Master Regular Expressions with English Words
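Have you ever heard the term "regular expression"? Regular expressions are so useful that they are worth mastering even if you are not a programmer. Many text editors offer search-and-replace with regular expressions, so once you learn them you can use them not only in Python but almost anywhere. This time, let's master regular expressions using data from an English word dictionary.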
What does it mean to learn regular expressions with English words?
A regular expression is a way of describing a set of strings using special symbols called metacharacters. Wildcards are a similar notation. With wildcards, a pattern such as "*.txt" stands for names like "abc.txt" or "documents.txt". Wildcards are mainly useful in file searches, for example to list every file with the ".txt" extension (text files). Regular expressions can be thought of as a far more powerful version of wildcards.
Two installments ago, we used an English word database to learn SQL, the language for working with databases. Using a free English dictionary made it fun to display all sorts of English words. This time, let's use English dictionary data to master regular expressions. Just as with SQL, this makes learning regular expressions a lot of fun. Unlike last time, though, we will use a dictionary in plain-text form rather than an SQL database.
First, download the dictionary data in text format. Unzipping the ZIP file produces a file named "ejdic-hand-utf8.txt". This is the dictionary data in text form. Let's load it from Python and use it.
With Python 3 installed, open a command line: Command Prompt or PowerShell on Windows, or Terminal.app on macOS. Then start Python's interactive environment (the REPL) from the command line.
# --- On Windows
python
# --- On macOS
python3
Once the Python interactive environment is running, run the following Python code to load the dictionary file. (The ">>>" is the prompt showing that Python is ready for input; you do not type it.) The whole dictionary is loaded into memory at once, so on an older PC this may take quite a while. In that case, you can try it in a cloud environment such as Colaboratory. Running the following line reads the dictionary file into memory and assigns its contents to the variable txt.
>>> txt = open("ejdic-hand-utf8.txt", "rt", encoding="utf-8").read()  # read as UTF-8 even if the OS default encoding differs
次ã«èªã¿èŸŒãã èŸæžããŒã¿ãæ¹è¡ã§åºåã£ãŠã¿ããã
>>> lines = txt.split("\n")
This dictionary data has one entry per line, so the len() function tells you how many entries there are. Running it shows that there are 46726 words.
>>> len(lines)
46726
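A note on that count: if the file ends with a newline (an assumption worth checking in your own copy), split("\n") leaves one empty string at the end of the list, so len(lines) counts one blank "entry". Here is a minimal sketch on a hypothetical two-entry sample, not the real dictionary, comparing it with splitlines():

```python
# Hypothetical two-entry sample in the "word<TAB>meaning" layout,
# ending with a newline as many text files do (not the real dictionary).
sample = "allow\tto permit\nglow\tto shine\n"

by_split = sample.split("\n")   # the final "\n" leaves an empty string at the end
by_lines = sample.splitlines()  # splitlines() does not add that trailing empty entry

print(len(by_split), by_split[-1] == "")  # 3 True
print(len(by_lines))                      # 2
```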
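Next, extract just the word part of each entry. Each line holds the English word, a tab, and then its meaning, so the following code takes the text before the first tab and builds a list, words, that contains only the English words.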
>>> words = list(map(lambda s: s.split("\t")[0], lines))
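If the map-and-lambda one-liner is hard to read, you can write the same step as a list comprehension. The sketch below uses a hypothetical sample instead of the real file. It also skips blank lines, so no empty string ends up in the word list:

```python
# Hypothetical sample in the "word<TAB>meaning" layout; not the real dictionary.
sample = "act\tdeed\nallow\tto permit\n\nglow\tto shine\n"

# Keep the text before the first tab, skipping empty lines
# (such as the one left by a trailing newline).
words = [line.split("\t")[0] for line in sample.splitlines() if line]
print(words)  # ['act', 'allow', 'glow']
```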
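Now define a function for testing regular expressions. It matters because we will use it to try every pattern in this article. The function returns only those words in the list words that match a given regular expression. Because it uses a lambda function and filter(), it may look a little complicated. All it does, though, is run a regular-expression search, re.search(), on each word and return the words for which the search succeeds.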
>>> import re
>>> m = lambda pat: list(filter(lambda w: 1 if re.search(pat, w) else 0, words))
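The same behaviour can also be written as an ordinary function. In this minimal sketch, match_words is a hypothetical name used only here. It takes the word list as a parameter instead of reading the global words, and it compiles the pattern once. Python's re module also caches recently used patterns, so the gain mostly comes from readability:

```python
import re

def match_words(pattern, words):
    """Return the words in which `pattern` is found (the same test m uses)."""
    regex = re.compile(pattern)  # compile the pattern once for all words
    return [w for w in words if regex.search(w)]

# Hypothetical sample list standing in for the dictionary words.
sample_words = ["allow", "below", "glow", "lowery", "slowly", "owl"]
print(match_words(r"low", sample_words))
# ['allow', 'below', 'glow', 'lowery', 'slowly']
```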
Now let's try out various regular expressions.
Searching for English words
Let's start by extracting particular words from the English dictionary. First, run the following code to pick out the words that contain "low".
>>> m(r"low")
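You should see a long list of words that contain "low". As this shows, re.search() reports a match whenever the pattern matches any part of the string being searched. Also, if you write a string as r"string" (a raw string), escape characters such as the backslash "\" are no longer treated as escapes. This is handy when writing regular expressions, so remember it.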
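Both points can be seen on a small scale. The sketch below compares a normal string with a raw string. It then contrasts re.search(), which accepts a match anywhere in the word, with re.fullmatch(), which requires the whole word to match. The word "allowance" is just an illustrative example:

```python
import re

# In a normal string "\t" is one tab character;
# the raw string r"\t" keeps the backslash, so it is two characters.
normal, raw = "\t", r"\t"
print(len(normal), len(raw))  # 1 2

# re.search() succeeds if the pattern appears anywhere in the word...
found = bool(re.search(r"low", "allowance"))
# ...whereas re.fullmatch() succeeds only if the whole word matches.
whole = bool(re.fullmatch(r"low", "allowance"))
print(found, whole)  # True False
```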
Use "$" to find words by their ending
In regular expressions, you can give a pattern special meaning by using special symbols called metacharacters. For example, "$" is a metacharacter that means "end of the string". If you specify "low$", you pick out only words that end in "low", such as "allow" and "glow". Let's try it.
>>> m(r"low$")
Running it lists only the words that end in "low".
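Here is the same idea on a hypothetical handful of words, which stand in for the dictionary list. The last line shows a Python-specific detail: without flags, "$" also matches just before a newline at the very end of the string:

```python
import re

# Hypothetical sample words standing in for the dictionary list.
sample_words = ["allow", "allowance", "glow", "lowery", "slowly"]

ends_with_low = [w for w in sample_words if re.search(r"low$", w)]
print(ends_with_low)  # ['allow', 'glow']

# "$" also matches right before a final newline.
print(bool(re.search(r"low$", "glow\n")))  # True
```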
Use "^" to find words by their beginning
Conversely, to find words that begin with "low", use "^" and write "^low". Try the following pattern.
>>> m(r"^low")
Running it displays words such as "low tide" and "lowery" (gloomy).
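A minimal sketch on hypothetical sample words follows. It also shows that re.match(), which only tries to match at the start of the string, gives the same result here without the "^":

```python
import re

# Hypothetical sample words standing in for the dictionary list.
sample_words = ["allow", "below", "low tide", "lowery", "slowly"]

starts_with_low = [w for w in sample_words if re.search(r"^low", w)]
print(starts_with_low)  # ['low tide', 'lowery']

# re.match() anchors at the beginning by itself, so "^" is not needed.
same = [w for w in sample_words if re.match(r"low", w)]
print(same == starts_with_low)  # True
```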
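Use "." to match any single character
Next, let's use ".", which stands for any single character. If you combine it with the start anchor "^" and the end anchor "$" introduced above and write "^a.t$", you extract three-letter words that start with "a" and end with "t".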
>>> m(r"^a.t$")
Running it lists words such as "ant" and "art", with some duplicates, as follows:
['act', 'aft', 'ant', 'ant', 'apt', 'art', 'art']
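"ant" and "art" each appear twice because they occur more than once in the word list. If you want each word only once, one option is dict.fromkeys(), which removes duplicates and keeps the order of first appearance (set() would also deduplicate, but does not preserve order). Here is a sketch on a hypothetical sample:

```python
import re

# Hypothetical sample with a duplicate entry, standing in for the dictionary list.
sample_words = ["act", "ant", "ant", "apt", "abut", "at", "art"]

three_letters = [w for w in sample_words if re.search(r"^a.t$", w)]
print(three_letters)  # ['act', 'ant', 'ant', 'apt', 'art']

# dict.fromkeys() keeps only the first occurrence of each word, in order.
unique = list(dict.fromkeys(three_letters))
print(unique)  # ['act', 'ant', 'apt', 'art']
```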
In exactly the same way, to list five-letter words that start with "a" and end with "t", write "^a...t$". This lists words such as "about" and "alert".
>>> m(r"^a...t$")
As you can see, knowing just the single-character wildcard "." already makes some interesting searches possible.
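One detail worth knowing is that "." matches any character except a newline, and that includes a space. The dictionary contains multi-word entries such as "low tide", so a phrase can slip into a result. The sketch below uses hypothetical sample words, where "a bit" is included only to show this:

```python
import re

# Hypothetical sample; "a bit" shows that "." also matches a space.
sample_words = ["about", "alert", "adult", "a bit", "abbot", "arrest", "at"]

five_chars = [w for w in sample_words if re.search(r"^a...t$", w)]
print(five_chars)  # ['about', 'alert', 'adult', 'a bit', 'abbot']
```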
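Repeating one or more times with "+", and zero or more times with "*"
Next, let's look at "+", which means one or more repetitions of the preceding character. For example, "a+" matches "a", "aa", "aaa", "aaaaa", and so on, and "f+" matches "f", "ff", "ffff", and so on.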
If you combine it with the single-character wildcard ".", the result ".+" means one or more arbitrary characters. So to find words that start with "o" and end with "w", specify the pattern "^o.+w$". Let's try it.
>>> m(r"^o.+w$")
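You should find words such as "overflow" and "overgrow".
A metacharacter similar to "+" is "*", which means zero or more repetitions. Written as below, the pattern lists all the words found by "^o.+w$" plus "ow" (an exclamation of pain). It catches "ow" because "*" also allows zero characters between "o" and "w".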
>>> m(r"^o.*w$")
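The difference between "+" and "*" is easiest to see side by side. This sketch uses hypothetical sample words in place of the dictionary list and prints the words that only the "*" version finds:

```python
import re

# Hypothetical sample words standing in for the dictionary list.
sample_words = ["overflow", "ow", "oxbow", "owl"]

plus = [w for w in sample_words if re.search(r"^o.+w$", w)]
star = [w for w in sample_words if re.search(r"^o.*w$", w)]
print(plus)  # ['overflow', 'oxbow']
print(star)  # ['overflow', 'ow', 'oxbow']

# Words found only by "*": nothing at all between "o" and "w".
print([w for w in star if w not in plus])  # ['ow']
```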
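Summary
As you can see, regular expressions let you specify very fine-grained search conditions. The ones introduced here are only some of the most common, and there are many other useful features. To understand what each metacharacter means, try a variety of patterns and check the results. This will give you a deeper understanding of regular expressions. Python's official documentation lists the regular-expression syntax you can use, so try out various patterns with it as your reference.
Freestyle programmer. Shares the joy of programming through Kujirahand. Notable works include the Japanese programming language "Nadesiko" and the text-based music software "Sakura". Prize winner at the 2001 Online Software Awards, certified Super Creator in the 2004 Mitou Youth program, and recipient of the OSS Contributor Award in 2010. Has also written many technical books.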



