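Differences Between Machine Learning Methods
Last time, we tackled a supervised classification problem with the decision tree, the most intuitive method to understand. By building a supervised model and touching on topics such as overfitting, you should now have a picture of the overall workflow in supervised learning.
This time we go one step further and build an intuitive understanding of how methods differ. So far we have covered unsupervised learning (K-Means), supervised learning for regression (linear regression), and supervised learning for classification (decision trees), but in practice many machine learning methods exist for each kind of learning. Even for the same supervised classification task, practitioners usually try several methods side by side and then pick the one that fits best.
So in this installment we stay with supervised classification, as last time, and introduce logistic regression as an alternative to the decision tree in order to see how the two methods differ.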
Preparing the Data
As before, we use "consumerPrices_tree.csv". The price data for each prefecture serve as the explanatory variables, and the target variable is whether or not the prefecture contains a major metropolitan area.
As in previous installments, prepare the data and start Jupyter Notebook. If you already downloaded "consumerPrices_tree.csv" last time, no further data preparation is needed. Once Jupyter Notebook is running, open a new Notebook from New in the upper right and rename it. Here we named it ClassificationModels. First, load the data as usual.
import pandas as pd
data = pd.read_csv('consumerPrices_tree.csv')
data.head()
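Because this is the same data as last time, we display only the first five rows and simply confirm that no error occurs. If everything looks fine, create the explanatory-variable and target-variable data for machine learning.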
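# Explanatory variables: every column except the prefecture name and the target
X = data.drop(['都道府県', '大都市圏分類'], axis=1)
# Target variable: 1 if the prefecture contains a major metropolitan area, otherwise 0
Y = data['大都市圏分類']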
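The data are now ready for machine learning.
Decision Tree vs Logistic Regression - Differences in Classification Results
Now let's look at how the methods differ. We will use the decision tree from last time and, newly, logistic regression, and see what differences appear in the classification results. As explained last time, a decision tree searches for the explanatory variable and condition that split the data most cleanly, and repeats that search while branching out into a tree structure. Logistic regression, on the other hand, draws a single line that cleanly separates the data across multiple explanatory variables. Despite having "regression" in its name, it is actually a method for solving classification problems, so do not let the name confuse you.
This time we ignore train/test splitting entirely and focus only on how the predictions differ.
First, we train a decision tree and display the result of classifying with the resulting model.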
from sklearn.tree import DecisionTreeClassifier
treeModel = DecisionTreeClassifier(max_depth=3, random_state=0)
treeModel.fit(X, Y)
predicted = pd.DataFrame({'TreePredicted':treeModel.predict(X)})
data_predicted = pd.concat([data, predicted], axis =1)
data_predicted.head()
Again, only the first five rows are displayed. Did you get the same result as last time? Line 1 imports the decision tree, and line 2 specifies the model settings. Line 3 trains on the training data and creates the machine learning model.
Running treeModel.predict(X) returns the predictions that the treeModel we just built makes for the explanatory variables X. To make them easier to read, we append them to the original data. Here we showed only the first rows, but if you want to inspect the data, remove head and look at all of it.
Next, let's make predictions with logistic regression and add them to the data_predicted frame from above.
from sklearn.linear_model import LogisticRegression
logisticModel = LogisticRegression()
logisticModel.fit(X, Y)
predicted = pd.DataFrame({'LogisPredicted':logisticModel.predict(X)})
data_predicted = pd.concat([data_predicted, predicted], axis =1)
data_predicted
The method is different, but the usage is exactly the same as before. By this point you probably feel that you are getting used to machine learning.
Line 1 imports LogisticRegression, that is, logistic regression, from the machine learning library scikit-learn (sklearn). Line 2 declares a logistic regression model under the name logisticModel. For the decision tree we specified model settings, but here we use all the defaults, so we define it without arguments. Line 3 passes the training data to that model, runs the training, and creates the machine learning model. As with the decision tree, predictions can be obtained with "model_name.predict(explanatory variables used for prediction)", so line 4 calls logisticModel.predict(X) and stores the result under the name LogisPredicted. Line 5 appends the LogisPredicted column to the earlier data.
Reading the data from the top, logistic regression gets the prediction for Hokkaido, the first row, wrong. For Miyagi Prefecture, however, the decision tree is wrong while logistic regression is right. So even though both models use the same training data, their predictions differ. Where does that difference come from? Let's start by examining the decision tree with the help of visualization.
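Scanning the table by eye works for a few rows, but the rows where the two models disagree can also be picked out programmatically. The following is a minimal sketch on a made-up toy frame; the column name `label` and all values are assumptions for illustration, and with the article's data you would compare against the 大都市圏分類 column of data_predicted instead.

```python
import pandas as pd

# Toy stand-in for data_predicted; every value below is made up for illustration.
toy = pd.DataFrame({
    'label':          [1, 0, 1, 0, 1],   # hypothetical true classes
    'TreePredicted':  [1, 1, 1, 0, 0],
    'LogisPredicted': [0, 0, 1, 0, 0],
})

# Rows where the two models give different answers
disagree = toy[toy['TreePredicted'] != toy['LogisPredicted']]

# Per-row flags showing which model matched the true label
toy['TreeCorrect'] = toy['TreePredicted'] == toy['label']
toy['LogisCorrect'] = toy['LogisPredicted'] == toy['label']

print(disagree)
print(toy[['TreeCorrect', 'LogisCorrect']].sum())
```

Filtering on disagreement first keeps attention on the rows where the choice of method actually matters.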
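Unpacking the Decision Tree
A decision tree finds the variable and condition that separate 1 and 0 most cleanly, and classifies by branching into a tree structure. In this example, last time's results showed that culture and recreation (教養娯楽) and medical care (保険医療) alone happen to classify the data reasonably well. So let's visualize those two variables.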
import matplotlib.pyplot as plt
%matplotlib inline
plt.scatter(data_predicted['保険医療'], data_predicted['教養娯楽'], c=data_predicted['大都市圏分類'])
plt.xlabel('Medical care')
plt.ylabel('Culture & recreation')
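Yellow marks a 1 in the 大都市圏分類 (metropolitan-area classification) column, that is, a prefecture containing a major metropolitan area. The plot shows that prefectures with a high culture and recreation value can almost all be classified as metropolitan. This is exactly the topmost split in the tree structure shown last time. Now let's walk through the decision tree using the tree structure we output last time.
A decision tree looks for the cleanest possible split. Here, the cleanest way to separate Class1 (metropolitan) from the rest is split 1: whether culture and recreation is 99.85 or less. Everything in the region above 99.85 is assigned to Class1. Next, the test of whether medical care is 99.35 or less separates the leftmost region of the graph as Class0. Then the region where medical care is greater than 100.25 is classified as Class0. The remaining region, where medical care is 100.25 or less, contains 4 Class0 points and 5 Class1 points, so the classification there cannot be called accurate. Adding more levels would let the tree split this region cleanly, but you then have to watch out for overfitting.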
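The thresholds above were read off a tree diagram, and scikit-learn's `export_text` can print the same kind of rules as plain text. The sketch below uses made-up two-feature data rather than the article's CSV, and the feature names `medical` and `culture` are hypothetical labels chosen for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Made-up data, NOT the article's CSV: class 1 whenever the second feature exceeds 100.
rng = np.random.RandomState(0)
X_toy = rng.uniform(97, 104, size=(40, 2))
y_toy = (X_toy[:, 1] > 100).astype(int)

tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X_toy, y_toy)

# Print each split as "feature <= threshold" together with the class at each leaf
rules = export_text(tree, feature_names=['medical', 'culture'])
print(rules)
```

Applied to the treeModel from the article, the same call with `feature_names=list(X.columns)` should list its splits, although the exact output depends on the data.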
Finally, let's visualize the predictions.
plt.scatter(data_predicted['保険医療'], data_predicted['教養娯楽'], c=data_predicted['TreePredicted'])
plt.xlabel('Medical care')
plt.ylabel('Culture & recreation')
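The plot is almost the same as before, except that TreePredicted is passed as the color variable. The points are therefore colored by the class the decision tree predicted. The figure shows that the points are classified according to the splits described above.
Does the machine's classification process make sense now? The decision tree's process is relatively intuitive: it divides the space into ever finer pieces through case-by-case conditions. As a result it can produce enclave-like conditions, such as "Class0 where medical care is 99.35 or less, or greater than 100.25".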
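Written out by hand, the case-by-case splits described above amount to a few nested conditions. The following sketch uses the thresholds quoted in the text; the function name is hypothetical, and labeling the mixed middle band as Class1 is an assumption based on its 5-to-4 majority, not something confirmed by the model output.

```python
def tree_like_rule(medical, culture):
    """Hand-written version of the splits described in the text."""
    if culture > 99.85:
        return 1   # upper region: all Class1
    if medical <= 99.35:
        return 0   # leftmost region: Class0
    if medical > 100.25:
        return 0   # right-hand region: Class0
    # Middle band (4 Class0 vs 5 Class1 in the text): assumed to follow the majority
    return 1

print(tree_like_rule(medical=99.0, culture=99.0))
print(tree_like_rule(medical=100.0, culture=99.0))
print(tree_like_rule(medical=101.0, culture=99.0))
```

The two Class0 branches sit on either side of a Class1 band, which is the enclave pattern that a single straight line cannot reproduce.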
Unpacking Logistic Regression
Next, let's unpack logistic regression. It is not as intuitive as the decision tree, but keep in mind that it, too, draws a straight line that separates the data. First, take the same scatter plot of medical care and culture and recreation used for the decision tree, and color it by the logistic regression classification.
plt.scatter(data_predicted['保険医療'], data_predicted['教養娯楽'], c=data_predicted['LogisPredicted'])
plt.xlabel('Medical care')
plt.ylabel('Culture & recreation')
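Apart from passing LogisPredicted as the color variable, this is exactly the same as before. The result looks very different from the decision tree's partition. In particular, the enclaves that the decision tree could express are gone, and the plot appears to be divided broadly into an upper and a lower part. Like the decision tree, logistic regression draws lines, but it derives a single line suited to the split by taking all explanatory variables into account at once. Linear regression is a good mental image. In fact, logistic regression takes the linear regression function Y=aX1+bX2+z, passes it through a function that squeezes the result into the range from 0 to 1, and uses that to classify into 0 or 1. So, as with linear regression, it can be drawn neatly with two variables but cannot be pictured with ten variables as in our case. Let's therefore run logistic regression on just two variables and visualize it.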
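As a rough illustration of the squeezing function mentioned above, here is a minimal sketch of the sigmoid applied to a linear expression. The coefficients a, b, z are hypothetical values picked for illustration, not fitted to the article's data, and cutting at a probability of 0.5 is the usual convention rather than a detail stated in the text.

```python
import numpy as np

def sigmoid(t):
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-t))

# Hypothetical coefficients, chosen only for illustration
a, b, z = 0.5, 1.5, -200.0

def predict_proba_class1(x1, x2):
    # Linear part aX1 + bX2 + z, then squashed into a probability
    return sigmoid(a * x1 + b * x2 + z)

def predict_label(x1, x2):
    # Class 1 when the probability is at least 0.5, i.e. when the linear part is >= 0
    return int(predict_proba_class1(x1, x2) >= 0.5)

print(predict_proba_class1(100, 102), predict_label(100, 102))
print(predict_proba_class1(100, 98), predict_label(100, 98))
```

Because the sigmoid is monotonic, the 0/1 decision depends only on the sign of the linear part, which is why the boundary between the classes is a straight line.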
Setting accuracy aside for now, we simply reuse medical care and culture and recreation.
X_logis = X[['保険医療', '教養娯楽']]
logisticModel2 = LogisticRegression()
logisticModel2.fit(X_logis, Y)
predicted = pd.DataFrame({'LogisPredicted2':logisticModel2.predict(X_logis)})
logis_predicted = pd.concat([X_logis, predicted], axis =1)
logis_predicted.head()
Line 1 extracts the two explanatory variables, medical care and culture and recreation. From line 2 onward we define and train the logistic regression model just as before. Finally, we arrange the data so that these predictions can be visualized.
Next comes the visualization. This time, in addition to the scatter plot, we draw the dividing line computed from the trained model.
import numpy as np
plt.scatter(logis_predicted['保険医療'], logis_predicted['教養娯楽'], c=logis_predicted['LogisPredicted2'])
plt.xlabel('Medical care')
plt.ylabel('Culture & recreation')
a = logisticModel2.coef_[0,0]
b = logisticModel2.coef_[0,1]
z = logisticModel2.intercept_[0]
x = np.arange(97,104,1)
plt.plot(x,(-a*x-z)/b)
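In the resulting plot, unlike the decision tree, there are no enclaves, and a single line divides the points into an upper and a lower part. Line 1 imports NumPy, a numerical computing library. Lines 2 through 4 draw the scatter plot, with the colors set to the predictions of the model trained on two variables. The rest draws the dividing line computed from the model. We skip the detailed mathematics, but if you picture the linear regression equation Y = aX1 + bX2 + z, set Y=0, and solve for X2, you get the expression on line 9. The values of a, b, and z are extracted from the trained model on lines 5 through 7. As with linear regression, a, b, and z are the numbers needed to draw the line, and computing them from the training data is what "drawing the line" (that is, learning) means.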
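The rearrangement can be checked numerically: points placed on X2 = (-aX1 - z)/b should make the linear part zero, which corresponds to a predicted probability of 0.5. The sketch below assumes made-up two-dimensional data instead of the article's CSV, so the fitted values of a, b, and z will differ from those in the article.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Made-up 2D data, NOT the article's CSV
rng = np.random.RandomState(0)
X_toy = rng.normal(loc=0.0, scale=1.0, size=(100, 2))
y_toy = (X_toy[:, 0] + 2 * X_toy[:, 1] > 0).astype(int)

model = LogisticRegression()
model.fit(X_toy, y_toy)

a, b = model.coef_[0]
z = model.intercept_[0]

# Solve a*x1 + b*x2 + z = 0 for x2 to get points on the dividing line
x1 = np.linspace(-2, 2, 5)
x2 = (-a * x1 - z) / b
on_line = np.column_stack([x1, x2])

# On the line the linear part is 0, so the probability of class 1 is 0.5
print(model.decision_function(on_line))
print(model.predict_proba(on_line)[:, 1])
```

Points on one side of the line get a probability above 0.5 and points on the other side get less, which is how the line separates the two colors in the plot.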
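Rather than drawing lines through conditional branches as a decision tree does, logistic regression draws its dividing line from a single multivariable expression. With two variables it is a straight line, and with three variables it is a dividing plane. Hearing only this, you might get the impression that the decision tree splits the data more cleverly, but logistic regression handles many variables well and is a very effective method. You cannot rank the two without splitting the data into training and test sets and validating model accuracy properly, but the simple accuracy of the decision tree and logistic regression models built here is almost the same.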
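For reference, a proper comparison along the lines mentioned above might look like the following sketch. It runs on synthetic data generated by `make_classification` rather than the article's CSV, and the 30% test size and `max_iter=1000` are arbitrary assumptions, so the numbers it prints say nothing about the article's models.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data: 10 explanatory variables, binary target
X_toy, y_toy = make_classification(n_samples=200, n_features=10,
                                   n_informative=4, random_state=0)

X_train, X_test, y_train, y_test = train_test_split(
    X_toy, y_toy, test_size=0.3, random_state=0)

models = {
    'tree': DecisionTreeClassifier(max_depth=3, random_state=0),
    'logistic': LogisticRegression(max_iter=1000),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # score() returns mean accuracy; keep both training and held-out values
    scores[name] = (model.score(X_train, y_train), model.score(X_test, y_test))

for name, (train_acc, test_acc) in scores.items():
    print(f'{name}: train={train_acc:.3f} test={test_acc:.3f}')
```

A large gap between training and test accuracy for one model would hint at overfitting, which the accuracy on the training data alone cannot reveal.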
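In this installment we went a step beyond last time and unpacked the differences between machine learning methods for supervised classification. We skipped the mathematics, but you should now have an intuitive picture of what those differences look like. Many methods exist besides the decision tree and logistic regression covered here, and each exists for a reason. What matters is understanding those differences and choosing a method that suits the data. Try out various methods little by little and deepen your understanding as you unpack how they differ.
Next time, we will build on what we have learned so far and take on machine learning with more complex data.
Author Profile
Terumasa Shimoyama (下山輝昌)
After working on hardware research and development at a major electronics manufacturer, he went independent. Since then he has gained practical experience in software, data analysis, and related fields, and has co-founded several companies. Among them, at 合同会社アイキュベータ, he researches the potential and direction of artificial intelligence, IoT, and related technologies. Recently he has focused on open data, and is working on launching web services for open data utilization and on creating value through open data × IoT as one of his themes.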








