When you use a computer every day, it is easy to end up with many copies of the same file without noticing. Files that differ only in name but have identical contents waste disk space, and they also make files harder to find, which hurts productivity. So this time, let's use Python to build a tool that quickly removes duplicate files.
How to find duplicate files
How should we find duplicate files? The obvious way is to read two files and compare their contents: if the contents are the same, the files are duplicates.
For example, with 100 files, we can take each file in turn and compare it with the other 99 to see whether the contents match. This is a very naive approach, but the mechanism is simple, so let's start by writing a program this way.
Before writing the program, deliberately create several duplicate files and copy them into a directory named check. Then copy that directory into the directory from which Jupyter Notebook is started.
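One way to prepare such a test set is to generate it with a short script. The following is a minimal sketch rather than part of the original procedure: the function name make_test_files, the file names, and the file contents are all made up for illustration, and it assumes the check directory should sit under the current working directory.

```python
import os

def make_test_files(target_dir="./check"):
    """Create a few small files, some of them with identical contents."""
    os.makedirs(target_dir, exist_ok=True)
    # File names and contents are hypothetical sample data
    samples = {
        "a.txt": b"hello",
        "a_copy.txt": b"hello",      # duplicate of a.txt
        "b.txt": b"world",
        "b_backup.txt": b"world",    # duplicate of b.txt
        "c.txt": b"unique",          # no duplicate
    }
    for name, body in samples.items():
        with open(os.path.join(target_dir, name), "wb") as f:
            f.write(body)
    # Return the created file names in sorted order
    return sorted(samples)
```

Calling make_test_files() once from a notebook cell would create ./check with two pairs of duplicates and one unique file.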
Then start Jupyter Notebook and enter the following program. Of course, even without preparing test files, you can try it by changing the path at (*1) to any suitable directory.
import os, glob
# Directory to check for duplicate files --- (*1)
target_dir = './check'
# Get the list of files --- (*2)
files = glob.glob(target_dir + "/*")
# Check each file for duplicates --- (*3)
for f1 in files:
    with open(f1, "rb") as f1p:
        f1body = f1p.read() # contents --- (*4)
    # Loop over the files again --- (*5)
    for f2 in files:
        # Do not compare a file with itself --- (*6)
        if f1 == f2: continue
        # Compare the file contents --- (*7)
        with open(f2, "rb") as f2p:
            f2body = f2p.read()
        if f1body == f2body:
            print(f1, "==", f2)
print("ok")
When you run the program and duplicate files exist, it prints a list of the files with identical contents, one line per match in the form f1 == f2.
Let's walk through the program. At (*1), we specify the directory to check for duplicate files. At (*2), the glob.glob() function gets the list of files in that directory.
At (*3), we go through the file list one file at a time, checking each for duplicates. At (*4), the file is opened and its contents are read into f1body.
From (*5) on, we loop over the same file list again to get the files to compare against. In other words, file f1 from the for statement at (*3) is compared with file f2 from the for statement at (*5). At (*6), we skip the case where f1 and f2 are the very same file, and at (*7) we compare the file contents to see whether they are identical. If they are, print() reports it.
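Because both loops above run over the full file list, every pair is compared twice, once as (f1, f2) and once as (f2, f1), so each match is printed in both directions. As one possible variation, not the article's program, the sketch below starts the inner loop after the current file so that each pair is visited once. The names read_bytes and find_duplicate_pairs are hypothetical, and filtering out subdirectories with os.path.isfile() is an added assumption.

```python
import glob
import os

def read_bytes(path):
    """Return the whole contents of a file as bytes."""
    with open(path, "rb") as fp:
        return fp.read()

def find_duplicate_pairs(target_dir):
    """Return (f1, f2) pairs of files in target_dir with identical contents."""
    # Keep regular files only; glob also returns subdirectories
    files = sorted(f for f in glob.glob(os.path.join(target_dir, "*"))
                   if os.path.isfile(f))
    pairs = []
    for i, f1 in enumerate(files):
        body1 = read_bytes(f1)
        # Compare only with files later in the list, so each pair is seen once
        for f2 in files[i + 1:]:
            if body1 == read_bytes(f2):
                pairs.append((f1, f2))
    return pairs
```

This roughly halves the number of comparisons, but the work still grows with the square of the number of files.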
Searching for duplicate files quickly
When the directory holds only a few files, the naive approach above is perfectly fine. With many files, however, it takes a very long time. For example, comparing 100 files means 100 passes of the outer for statement and, for each of those, 100 passes of the inner one, so the inner loop body runs 100 × 100 = 10,000 times in total.
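To see how quickly the nested loops grow, the following small sketch simply counts loop passes for a given number of files. It touches no files, and the function name count_loop_passes is made up for illustration.

```python
def count_loop_passes(n_files):
    """Count inner-loop passes and real comparisons for the nested-loop approach."""
    passes = 0       # how many times the inner for body starts
    comparisons = 0  # how many times two different files are compared
    for i in range(n_files):
        for j in range(n_files):
            passes += 1
            if i == j:
                continue  # same file, skipped as at (*6)
            comparisons += 1
    return passes, comparisons
```

For 100 files this returns (10000, 9900), and for 1,000 files the number of passes is already 1,000,000.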
So let's consider a way to avoid nesting for statements: using hash values and a dictionary. We still get the file list and scan the files one at a time, but when we open each file we compute a hash value, which is a summary of its contents, and remember it in a dictionary variable. If a file with the same hash value has been seen before, the current file is a duplicate.
Incidentally, a hash value (also called a digest value) is used to confirm that data has not been altered when it is transferred or stored. Given some data, the value that summarizes that data is called its hash value. Hash functions are designed so that different data almost always produce different hash values.
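The behaviour of hash values can be checked directly with Python's standard hashlib module. The sketch below uses SHA-256; the helper name digest and the sample byte strings are made up for illustration.

```python
import hashlib

def digest(data):
    """Return the SHA-256 hash of a bytes object as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()

same_1 = digest(b"duplicate file contents")
same_2 = digest(b"duplicate file contents")
different = digest(b"duplicate file contents!")

print(same_1 == same_2)     # identical data gives identical hash values
print(same_1 == different)  # changing one byte gives a different hash value
print(len(same_1))          # a SHA-256 hex digest is 64 characters long
```

However large the input is, the digest stays the same length, which is what makes it cheap to store and compare.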
With the hash-and-dictionary approach, some memory is needed to store the hash values, but because there is no need to nest for statements, duplicate files can be found quickly.
Now let's look at the actual program.
import os, glob, hashlib
# Directory to check for duplicate files
target_dir = './check'
body_dict = {}
# Function that returns the contents of a file --- (*1)
def get_body(fname):
    with open(fname, "rb") as f:
        return f.read()
# Get the list of files and check for duplicates --- (*2)
files = glob.glob(target_dir + "/*")
for f in files:
    # Open the file and compute its hash value --- (*3)
    body = get_body(f)
    v = hashlib.sha256(body).hexdigest()
    if v in body_dict: # Is it a duplicate? --- (*4)
        f2 = body_dict[v]
        # Just in case, check that the contents really match --- (*5)
        if body == get_body(f2):
            print(f, "==", f2)
            # To actually delete the file, uncomment the line below ---- (*5a)
            # os.remove(f)
    else:
        body_dict[v] = f # --- (*6)
print("ok")
The output is almost the same as before, so let's go through the program itself. At (*1), we define the get_body() function, which reads a file and returns its contents. At (*2), we get the file list and examine the files one by one. At (*3), we open the file and compute its hash value with the hashlib.sha256() function, which uses the SHA-256 algorithm. At (*4), we check whether a file with the same hash value has been seen before. If there is no match, (*6) stores the current hash value and file name in the dictionary variable body_dict.
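Note that get_body() reads an entire file into memory before it is hashed, which may become a problem with very large files. As a possible alternative, which the program above does not use, a hash object can be fed in pieces with update(). In the sketch below, the function name file_sha256 and the 1 MiB chunk size are illustrative choices.

```python
import hashlib

def file_sha256(fname, chunk_size=1024 * 1024):
    """Compute the SHA-256 hex digest of a file, reading it chunk by chunk."""
    h = hashlib.sha256()
    with open(fname, "rb") as f:
        # iter() with a sentinel stops when read() returns b"" at end of file
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

The result is the same digest that hashlib.sha256(get_body(fname)).hexdigest() produces, so it could stand in for the hash computation at (*3). The content check at (*5) would still need the file contents.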
At (*5), the program strictly checks whether the file contents really match. Although it is extremely rare, different data can have the same hash value, so the whole files are compared to confirm that they really are identical.
Finally, once you have run the program and confirmed that nothing is wrong, uncomment the line below (*5a) and run it again. The duplicate files will then actually be deleted.
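Because os.remove() deletes files permanently, it may be worth previewing the deletions rather than editing a comment. The sketch below wraps the same hash-and-dictionary logic in a function with a dry_run flag. The function name remove_duplicates and the flag are hypothetical additions. Sorting the file list, so that the first name in sorted order is the copy that is kept, is also an assumption, since glob.glob() does not guarantee any particular order.

```python
import glob
import hashlib
import os

def remove_duplicates(target_dir, dry_run=True):
    """Find duplicate files in target_dir and delete them only if dry_run is False.

    Returns the list of files that were (or would be) deleted.
    """
    def get_body(fname):
        with open(fname, "rb") as f:
            return f.read()

    seen = {}     # hash value -> first file seen with that hash
    removed = []
    # Regular files only, in sorted order so the kept copy is predictable
    files = sorted(f for f in glob.glob(os.path.join(target_dir, "*"))
                   if os.path.isfile(f))
    for f in files:
        body = get_body(f)
        v = hashlib.sha256(body).hexdigest()
        if v in seen:
            # Confirm byte for byte before treating the file as a duplicate
            if body == get_body(seen[v]):
                removed.append(f)
                if not dry_run:
                    os.remove(f)
        else:
            seen[v] = f
    return removed
```

Calling remove_duplicates('./check') only reports what would be deleted; passing dry_run=False performs the deletion.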
Summary
This time we used Python to write two programs for finding and removing duplicate files. The first takes longer to run but uses relatively little memory, while the second uses somewhat more memory but finishes quickly. How a program is written changes its speed and its memory usage. Running the programs and watching how they behave will deepen your understanding, so give them a try.
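About the author: a free-style programmer who, through Kujirahand, works to share the fun of programming. Representative works include the Japanese programming language "Nadeshiko" and the text music software "Sakura". Prize winner in the 2001 Online Software Awards, certified as a Super Creator in the fiscal 2004 Mitou Youth program, and recipient of the 2010 OSS Contributor Award. Also the author of many technical books.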



