I am trying to write a Python script that will compare two files which contain names (millions of them).

More specifically, I have two files (Files1.txt and Files2.txt). Files1.txt contains 180 thousand names and Files2.txt contains 34 million names. I have a script which reads these two files and stores the names in two lists (fileList1 and fileList2 respectively). I have imported the difflib library, and after the lists are created I use it to find the names that are similar between the two files. This works perfectly for hundreds of names, but it takes forever for millions of them, so it is not really efficient. Does anyone have an idea on how to make this more efficient (in terms of both time and RAM)? Any advice would be greatly appreciated. (NOTE: I have been trying to study multithreading, but have not really grasped the concept, so I may need some examples.)
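For reference, here is a minimal sketch of the kind of thing my script does (simplified; the exact difflib call, cutoff value, and loop structure in my real script may differ):

import difflib

# Read each file into a list, one name per line.
with open("Files1.txt") as f1:
    fileList1 = [line.strip() for line in f1]
with open("Files2.txt") as f2:
    fileList2 = [line.strip() for line in f2]

# For every name in the smaller list, ask difflib for its closest match
# in the larger list.  get_close_matches() scans all of fileList2 for
# each name, which is where the time goes at 180k x 34M names.
matches = {}
for name in fileList1:
    close = difflib.get_close_matches(name, fileList2, n=1, cutoff=0.9)
    if close:
        matches[name] = close[0]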
~~~~~~~~~~~~~~ S.C.M.

-- http://mail.python.org/mailman/listinfo/python-list