I am trying to write a Python script that will compare two files which contain names (millions of them).

More specifically, I have two files (Files1.txt and Files2.txt). Files1.txt contains 180 thousand names and Files2.txt contains 34 million names. I have a script which reads these two files and stores the names in two lists (fileList1 and fileList2 respectively). I have imported the difflib library, and after the lists are created I use it to find the names that are similar between the two files. This works perfectly for hundreds of names, but it takes forever for millions of them, so it is not really efficient. Does anyone have an idea on how to make this more efficient (in terms of both time and RAM)? Any advice would be greatly appreciated. (NOTE: I have been trying to study multithreading, but have not really grasped the concept, so I may need some examples.)
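For reference, here is a minimal sketch of the kind of thing my script does (simplified; the exact difflib call, cutoff value, and loop structure in my real script may differ):

import difflib

# Read each file into a list, one name per line.
with open("Files1.txt") as f1:
    fileList1 = [line.strip() for line in f1]
with open("Files2.txt") as f2:
    fileList2 = [line.strip() for line in f2]

# For every name in the smaller list, ask difflib for its closest match
# in the larger list.  get_close_matches() scans all of fileList2 for
# each name, which is where the time goes at 180k x 34M names.
matches = {}
for name in fileList1:
    close = difflib.get_close_matches(name, fileList2, n=1, cutoff=0.9)
    if close:
        matches[name] = close[0]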
~~~~~~~~~~~~~~ S.C.M.

-- http://mail.python.org/mailman/listinfo/python-list