On Wed, 07 Feb 2007 00:28:31 -0300, Sick Monkey <[EMAIL PROTECTED]> wrote:

> qualm after qualm.  Before you read this, my OS is Linux, up2date, and
> minimal RAM (512).

And Python 2.3 or earlier, I presume, else you would have the builtin set
type.

> The files that my script needs to read in and interpret can contain
> anywhere from 5 million lines to 65 million lines
>
> I have attached 2 versions of code for you to analyze.
> =================
> I am having issues with performance.
>
> Instance 1:  dict_compare.py {which is attached}
> Is awesome, in that I have read a file and stored it into a hash table,
> but if you run it, the program decides to stall after writing all of the
> date.
> <NOTE: once you receive the statement "finished comparing 2 lists." the
> file has actually finished processing within 1 minute, but the script
> continues to run for additional minutes (10 additional minutes actually).
> <I dont know why>

This version reads both files FULLY into memory; maybe the delay you see
is the deallocation of those two huge lists.

> Instance 2: dictNew.py
> Runs great but it is a little slower than Instance 1 (dict_compare.py).
> BUT WHEN IT FINISHES, IT STOPS THE APPLICATION.... no additional
> minutes.....
> <NOTE: I was not yelling with the capitalization, but I am frustrated>

This version processes both files one line at a time, so the memory
requirements are a lot lower. I think it's a bit slower because the Set
class is implemented in Python; set is a builtin type as of Python 2.4.

You could combine both versions: use the dict approach from version 1, and
process one line at a time as in version 2.
You can get the mails that are in both dictionaries like this:

for key in dict1:
    if key in dict2:
        print key

-- 
Gabriel Genellina
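
P.S. A minimal sketch of the combined approach, in case it helps. I have not seen the attached scripts, so everything here is an assumption: the function name common_keys is made up, I assume one address per line, and lines are compared after stripping surrounding whitespace (exact, case-sensitive match). Only the first file's keys are kept in a dict; the second file is streamed one line at a time.

```python
def common_keys(lines1, lines2):
    """Yield each stripped line of lines2 that also occurs in lines1.

    Both arguments are iterables of strings; open file objects work,
    so neither file has to be read into a list first.
    (Hypothetical helper, not taken from the attached scripts.)
    """
    seen = {}
    for line in lines1:
        # Only the keys matter; the value is a placeholder.
        seen[line.strip()] = True

    for line in lines2:
        key = line.strip()
        if key in seen:
            # Drop the key so a repeated line is reported only once.
            del seen[key]
            yield key


def print_common(path1, path2):
    """Print the lines shared by two files (paths are placeholders)."""
    f1 = open(path1)
    f2 = open(path2)
    try:
        for key in common_keys(f1, f2):
            print(key)
    finally:
        f1.close()
        f2.close()
```

Memory use is driven by the first argument only, so pass the smaller file first. Generators need Python 2.3 or later (on 2.2, add "from __future__ import generators").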