Ben Finney wrote:
> What happens, then, when you make a smaller program that deals with only
> one file?
>
> What happens when you make a smaller program that only reads the file,
> and doesn't write any? Or a different program that only writes a file,
> and doesn't read any?
>
> It's these sort of reductions that will help narrow down exactly what
> the problem is. Do make sure that each example is also complete (i.e.
> can be run as is by someone who uses only that code with no additions).
>
The program reads one csv file of 9,293,271 lines.

869M wb.csv

It creates a set of files containing the same lines, but where each output
file in the set contains only those lines where the value of a particular
column is the same. The number of output files depends on the number of
distinct values in that column. In the example that results in 19 files:

74M tt_11696870405.txt
94M tt_18762175493.txt
15M tt_28668070915.txt
12M tt_28673313795.txt
15M tt_28678556675.txt
11M tt_28683799555.txt
12M tt_28689042435.txt
15M tt_28694285315.txt
7.3M tt_28835845125.txt
6.8M tt_28842136581.txt
12M tt_28848428037.txt
11M tt_28853670917.txt
12M tt_28858913797.txt
15M tt_28864156677.txt
11M tt_28869399557.txt
11M tt_28874642437.txt
283M tt_31002203141.txt
259M tt_33335282691.txt
45 2010-03-19 17:00 tt_taskid.txt

Changing

    with open(filename, 'rU') as tabfile:

to

    with codecs.open(filename, 'rU', 'utf-8', 'backslashreplace') as tabfile:

and

    with open(outfile, 'wt') as out_part:

to

    with codecs.open(outfile, 'w', 'utf-8') as out_part:

causes a program that runs in 43 seconds to take 4 minutes to process
the same data.

In this particular case that is not very important: any unicode strings
in the data are not worth troubling over, and I have already spent more
time satisfying curiosity than will ever be required to process the
dataset in future. But I have another project in hand where not only is
the unicode significant but the files are very much larger. Scale up the
problem and the difference between 4 hours and 24 becomes a matter worth
some attention.

-- 
David Clark, MSc, PhD.
UCL Centre for Publishing
Gower Str London WCIE 6BT
What sort of web animal are you?
<https://www.bbc.co.uk/labuk/experiments/webbehaviour>
-- 
http://mail.python.org/mailman/listinfo/python-list
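
For readers who want a complete, runnable reduction of the kind Ben asks
for, here is a minimal sketch of a splitter like the one described above.
It is not the original program, which was not posted. The function name
split_by_column, the column index, the comma delimiter and the "tt_" file
prefix are assumptions made for illustration, and it is written for
Python 3, where the built-in open() accepts encoding and errors directly
and codecs.open() is not needed. It makes no claim about timings.

```python
import os
import tempfile
from contextlib import ExitStack


def split_by_column(src_path, out_dir, column, delimiter=",", encoding="utf-8"):
    """Copy each line of src_path into out_dir/tt_<value>.txt, where <value>
    is the content of the given column on that line.

    Assumptions (not from the original post): fields are split naively on
    the delimiter (no CSV quoting), every line has at least column + 1
    fields, and the column values are safe to use in file names.
    Returns a dict mapping each distinct value to its line count.
    """
    counts = {}
    outputs = {}  # one open output file per distinct column value
    with ExitStack() as stack:  # closes the input and every output on exit
        src = stack.enter_context(
            open(src_path, "r", encoding=encoding,
                 errors="backslashreplace", newline="")
        )
        for line in src:
            key = line.rstrip("\r\n").split(delimiter)[column]
            out = outputs.get(key)
            if out is None:
                out_path = os.path.join(out_dir, "tt_%s.txt" % key)
                out = stack.enter_context(
                    open(out_path, "w", encoding=encoding, newline="")
                )
                outputs[key] = out
                counts[key] = 0
            out.write(line)  # the line is written unchanged, ending included
            counts[key] += 1
    return counts


if __name__ == "__main__":
    # Tiny self-contained demonstration on made-up data.
    with tempfile.TemporaryDirectory() as tmp:
        sample = os.path.join(tmp, "sample.csv")
        with open(sample, "w", encoding="utf-8", newline="") as f:
            f.write("1,a,x\n2,b,y\n3,a,z\n")
        print(split_by_column(sample, tmp, column=1))
```

Passing a single-byte encoding such as encoding="latin-1" gives a
comparison run that skips UTF-8 decoding, which is one way to separate
the cost of decoding from the cost of the file handling itself.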