Last time, I showed how to load a CSV file and draw a bar chart in a Jupyter notebook. This time, I will show how to pull arbitrary data out of a CSV file and draw various kinds of charts that make the data easy to understand.
Checking how the population has changed
Since last time, we have been using a library called Pandas to load and plot CSV data. Pandas is a very powerful data analysis library. As we saw last time, it is also appealingly easy to use: if all you need is to load a CSV file and draw a chart, a few lines of code are enough.
This time, let's look a little further at what Pandas can do. First, let's extract a specific column with Pandas. To do that, write the following.
import pandas as pd
df = pd.read_csv("population.csv", encoding="SHIFT_JIS")
df["平成28年"]
As the program shows, reading a CSV file with the pd.read_csv function returns a tabular object called a DataFrame, and writing df["column name"] extracts any column from it.
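To make the column lookup concrete without needing population.csv, here is a minimal sketch that builds a small DataFrame in memory. The column names follow the article, but the prefecture names "A" to "C" and all figures are made-up placeholders, not real population data.

```python
import pandas as pd

# Hypothetical stand-in for population.csv; the figures are made up.
df = pd.DataFrame({
    "都道府県": ["A", "B", "C"],
    "平成12年": [100, 200, 300],
    "平成28年": [110, 190, 330],
})

# Selecting one column by its label returns a pandas Series.
col = df["平成28年"]
print(type(col).__name__)  # Series
print(col.tolist())        # [110, 190, 330]
```

The same df["column name"] syntax should work on the DataFrame returned by pd.read_csv, as long as the label matches a header in the file.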
You can also perform element-wise arithmetic on several extracted columns. Next, let's calculate how much the population changed between Heisei 12 (2000) and Heisei 28 (2016).
import pandas as pd
df = pd.read_csv("population.csv", encoding="SHIFT_JIS")
df["平成28年"] - df["平成12年"]
When you subtract one extracted column from another with the minus operator (-) like this, Pandas performs the calculation element by element and outputs the result.
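The element-by-element behavior can be checked on a tiny example. This is only a sketch with made-up figures; the column names mirror the article, and the rows "A" to "C" are placeholders.

```python
import pandas as pd

# Made-up figures; only the column names follow the article.
df = pd.DataFrame({
    "都道府県": ["A", "B", "C"],
    "平成12年": [100, 200, 300],
    "平成28年": [110, 190, 330],
})

# Subtracting two columns pairs up the rows and subtracts each pair.
diff = df["平成28年"] - df["平成12年"]
print(diff.tolist())  # [10, -10, 30]
```

Each result lines up with the row it came from, so a negative value marks a row whose figure went down.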
That said, the raw output is hard to read, so let's add labels and draw it as a chart. Plotting every prefecture would be unwieldy, though, so let's extract and plot only the top 10 prefectures by population growth.
%matplotlib inline
import pandas as pd
df = pd.read_csv("population.csv", encoding="SHIFT_JIS")
# 墿žã調ã¹ã --- (*1)
df['墿ž'] = df["å¹³æ28幎"] - df["å¹³æ12幎"]
# äžŠã³æ¿ã --- (*2)
df = df.sort_values(by=["墿ž"], ascending=False)
# äžäœ10äœãåŸã --- (*3)
top10 = df[0:10]
# ã°ã©ãã§æç» --- (*4)
top10.plot.bar(y=["墿ž"], x=["éœéåºç"])
top10
Sure enough, the chart shows that Tokyo and Kanagawa grew markedly. Now let's turn to the program itself.
At (*1), the program calculates the difference between Heisei 28 and Heisei 12 to find the change in population. As shown earlier, once you extract columns with Pandas, you can perform arithmetic on their elements. Here, the result is assigned to a new column named "増減" (change). This adds the column "増減" to the original data. Take a look at the table in the output: a column named 増減 has been added as the last column.
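A short sketch of the column assignment, again on made-up figures rather than the real CSV, shows the new column landing at the end of the table. The rows "A" to "C" are placeholders.

```python
import pandas as pd

# Made-up figures; only the column names follow the article.
df = pd.DataFrame({
    "都道府県": ["A", "B", "C"],
    "平成12年": [100, 200, 300],
    "平成28年": [110, 190, 330],
})

before = list(df.columns)
# Assigning to a label that does not exist yet appends a new column.
df["増減"] = df["平成28年"] - df["平成12年"]
after = list(df.columns)

print(before)  # ['都道府県', '平成12年', '平成28年']
print(after)   # ['都道府県', '平成12年', '平成28年', '増減']
```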
Then, at (*2), the rows are sorted in descending order of population growth. At (*3), the top 10 rows are extracted and assigned to the variable top10. To extract an arbitrary range of rows in Pandas, write df[0:10]. This means rows 0 up to 10 - 1, that is, through row 9. So, to extract rows 2 through 8 (counting from 0), write df[2:9]. As with Python list slicing, the end index is exclusive, so keep in mind that the last row returned is the one just before the end you specify.
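The sort-then-slice pattern and the exclusive end index can be tried on a small table. This is a sketch under assumptions: the column names "name" and "change" and all values are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical data: 12 rows with made-up change values.
df = pd.DataFrame({
    "name": [f"row{i}" for i in range(12)],
    "change": [5, -3, 12, 0, 7, -8, 20, 1, -1, 9, 3, 15],
})

# Largest change first.
ranked = df.sort_values(by="change", ascending=False)

# Positions 0, 1, 2: the end index 3 is not included.
top3 = ranked[0:3]
print(top3["name"].tolist())  # ['row6', 'row11', 'row2']

# Positions 2 through 8: seven rows, because position 9 is excluded.
middle = ranked[2:9]
print(len(middle))  # 7
```

Note that the slice counts positions in the sorted table, not the original index labels, which is why it picks out the largest values after sorting.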
Finally, (*4) draws the bar chart, and the last line displays the table. The plot.bar method draws the chart correctly once you explicitly specify which data columns to use for the x-axis and the y-axis.
By the way, which prefecture has lost the most population? If you are curious, try modifying the program to find out. As a hint, you can find the answer just by rewriting the program shown above.
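Making the bar chart easier to read
Earlier, Ishikawa Prefecture appeared among the top 10 for population growth. But if you look at the table rather than the chart, Ishikawa actually shows negative growth. So let's draw a chart that clearly marks the boundary between positive and negative growth.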
%matplotlib inline
import pandas as pd
df = pd.read_csv("population.csv", encoding="SHIFT_JIS")
# 墿žã調ã¹ãŠäžŠã³æ¿ãã
df['墿ž'] = df["å¹³æ28幎"] - df["å¹³æ12幎"]
df = df.sort_values(by=["墿ž"], ascending=False)
# ãã©ã¹ãšãã€ãã¹ã®äžéãæãåºã --- (*1)
mid = df[5:15]
# ã°ã©ãã®ã¹ã¿ã€ã«ã« ggplot ãå©çšãã --- (*2)
import matplotlib
matplotlib.style.use('ggplot')
# ã°ã©ãæç»
plt = mid.plot.bar(y=["墿ž"], x=["éœéåºç"])
# 0ã®ã©ã€ã³ã匷調 --- (*3)
plt.axhline(0, color='k')
Overall, this is not very different from the previous program, so let's focus only on the parts that changed. At (*1), rows 5 through 14 (counting from 0) are extracted.
The parts to focus on this time are (*2) and (*3). Writing (*2) applies a style called ggplot, which produces a more attractive chart. Then (*3) emphasizes the zero line on the y-axis.
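For readers who want to try the zero line outside a notebook, here is a minimal sketch. It assumes matplotlib is installed, selects the non-interactive Agg backend only so that it runs without a display, and uses made-up rows "A" to "D" instead of the real CSV.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no display available; not needed in Jupyter
import matplotlib.pyplot as plt
import pandas as pd

plt.style.use("ggplot")

# Made-up values that cross from positive to negative.
mid = pd.DataFrame({
    "都道府県": ["A", "B", "C", "D"],
    "増減": [30, 10, -5, -20],
})

# plot.bar returns the matplotlib Axes the bars were drawn on.
ax = mid.plot.bar(x="都道府県", y="増減")

# axhline draws a horizontal line across the Axes at y = 0.
line = ax.axhline(0, color="k")
print(len(ax.patches))  # 4 bars
```

Because plot.bar hands back an ordinary matplotlib Axes, other Axes methods can be applied to the same chart in the same way.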
Drawing the popular prefectures as a pie chart
Next, let's compare the two most recent years in this data, Heisei 27 (2015) and Heisei 28 (2016), and show the top 5 prefectures by population growth as a pie chart.
%matplotlib inline
import pandas as pd
import matplotlib.pyplot as plt
plt.style.use("ggplot")
df = pd.read_csv("population.csv", encoding="SHIFT_JIS")
# Compute the change and sort in descending order
df["増減"] = df["平成28年"] - df["平成27年"]
df = df.sort_values(by=["増減"], ascending=False)
# Extract the top 5
top = df[0:5]
# Draw the pie chart --- (*1)
top["増減"].plot.pie(labels=top["都道府県"], autopct="%.0f")
This shows how striking Tokyo's increase is, followed by Aichi and Saitama.
The pie chart is drawn at (*1). Where we previously wrote plot.bar, we now write plot.pie. To label the values, pass the labels argument. To show percentages as well, specify the autopct argument.
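The effect of labels and autopct can be seen on a three-slice example. As before, this is only a sketch: it assumes matplotlib is installed, uses the Agg backend so it runs without a display, and the rows "A" to "C" and their values are made up.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no display available; not needed in Jupyter
import pandas as pd

# Made-up values that sum to 100 so the percentages are easy to check.
top = pd.DataFrame({
    "都道府県": ["A", "B", "C"],
    "増減": [50, 30, 20],
})

# labels names each wedge; autopct formats each wedge's share of the total.
ax = top["増減"].plot.pie(labels=top["都道府県"], autopct="%.0f")

# The Axes now holds both the wedge labels and the percentage texts.
texts = [t.get_text() for t in ax.texts]
print("A" in texts, "50" in texts)  # True True
```

The format string "%.0f" rounds each share to a whole number; a string such as "%.1f%%" would instead show one decimal place followed by a percent sign.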
You can also draw other kinds of charts, such as line/barh/hist/box/kde/area/scatter/hexbin. For details, see the Pandas manual.
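Summary
Continuing from last time, we used the CSV of population data by prefecture to draw various kinds of charts.
Each program was only about 10 lines long, yet each produced a table or chart that pinpoints exactly what we wanted to know. A big advantage of the data analysis library Pandas, I think, is how easily it handles arithmetic across whole columns.
Using Pandas together with a Jupyter notebook lets you draw more meaningful charts, so I hope you will use this article as a reference and try your hand at data analysis and visualization.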
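Freestyle programmer. Works at くじらはんど (Kujirahand) to share the joy of programming. Best-known works include the Japanese programming language "なでしこ" (Nadesiko) and the text music software "サクラ" (Sakura). Prize winner in the 2001 Online Software Awards, FY2004 Mitou Youth Super Creator, and recipient of the 2010 OSS Contributor Award. Has also written many technical books.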




