Hi, Chris.

I wrote a trivial testing framework for this cute problem and tried a couple of modifications. I also added the 10% of non-ELEMENT lines you mentioned.

First, your updated algorithm didn't give me much faster results than the original. I guess my disk array largely hides the penalty of multiple writes. Still, I experimented with several variants. Here's the code in its entirety:

http://www.rafb.net/paste/results/ZuW4fK85.html

My results (Python 2.4, 32-bit Fedora Core) were:

[EMAIL PROTECTED] tmp]# python test.py
Preparing data...
[write_data1] Preparing output file...
[write_data1] Writing...
[write_data1] Done in 10.73 seconds.
[write_data4] Preparing output file...
[write_data4] Writing...
[write_data4] Done in 10.46 seconds.
[write_data_flush] Preparing output file...
[write_data_flush] Writing...
[write_data_flush] Done in 9.09 seconds.
[write_data_per_line] Preparing output file...
[write_data_per_line] Writing...
[write_data_per_line] Done in 9.71 seconds.
[write_data_once] Preparing output file...
[write_data_once] Writing...
[write_data_once] Done in 7.82 seconds.

I'm pretty sure your measurements will differ (judging by your results, you seem to have a faster CPU but slower disks), so just take whatever works best for you.

I'm also fairly confident that you won't be able to catch up with C: as you can see, Python's data structures are far more flexible and thus carry more processing overhead.

Regards,
Łukasz
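
For illustration, here is a minimal sketch of the kind of comparison behind the write_data_per_line and write_data_once numbers above. It is not the code from the paste: the names make_lines, write_per_line, write_once and timed are hypothetical, the sample line format is invented, and it targets Python 3 rather than the Python 2.4 used for the timings.

```python
import os
import tempfile
import time


def make_lines(count=100000):
    """Build sample input; every tenth line is a non-ELEMENT line.

    The line contents are made up for this sketch.
    """
    lines = []
    for i in range(count):
        if i % 10 == 0:
            lines.append("# comment %d\n" % i)
        else:
            lines.append("ELEMENT %d\n" % i)
    return lines


def write_per_line(lines, path):
    """Call write() once per input line."""
    with open(path, "w") as out:
        for line in lines:
            out.write(line)


def write_once(lines, path):
    """Join everything in memory and call write() a single time."""
    with open(path, "w") as out:
        out.write("".join(lines))


def timed(func, lines, path):
    """Return the wall-clock seconds taken by func(lines, path)."""
    start = time.perf_counter()
    func(lines, path)
    return time.perf_counter() - start


if __name__ == "__main__":
    data = make_lines()
    for func in (write_per_line, write_once):
        # Each variant writes to its own temporary file.
        fd, path = tempfile.mkstemp()
        os.close(fd)
        try:
            elapsed = timed(func, data, path)
            print("[%s] Done in %.2f seconds." % (func.__name__, elapsed))
        finally:
            os.remove(path)
```

Timings from a sketch like this depend heavily on the machine, the file system and OS caching, so it is only meant to show the shape of the test, not to reproduce the figures above.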