You've hit the bullseye! ;) Thanks a lot!!!
> Oh, I forgot to mention: you have a bug in this function. You're already
> including the newline in the len(line), so there is no need to add one.
> The result is that you only generate 44MB instead of 50MB.

That's because I'm running on Windows. What's the fastest way to check
whether '\n' translates to 2 bytes on file?

> Here are the results of profiling the above on my computer. Including the
> overhead of the profiler, it takes just over 50 seconds to run your file
> on my computer.
>
> [steve@ando ~]$ python -m cProfile fastwrite5.py
>          17846645 function calls in 53.575 seconds

I didn't know the cProfile module. Thanks a lot!

>    Ordered by: standard name
>
>    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
>         1   30.561   30.561   53.575   53.575 fastwrite5.py:1(<module>)
>         1    0.000    0.000    0.000    0.000 {cStringIO.StringIO}
>   5948879    5.582    0.000    5.582    0.000 {len}
>         1    0.004    0.004    0.004    0.004 {method 'close' of 'cStringIO.StringO' objects}
>         1    0.000    0.000    0.000    0.000 {method 'close' of 'file' objects}
>         1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
>   5948879    9.979    0.000    9.979    0.000 {method 'format' of 'str' objects}
>         1    0.103    0.103    0.103    0.103 {method 'getvalue' of 'cStringIO.StringO' objects}
>   5948879    7.135    0.000    7.135    0.000 {method 'write' of 'cStringIO.StringO' objects}
>         1    0.211    0.211    0.211    0.211 {method 'write' of 'file' objects}
>         1    0.000    0.000    0.000    0.000 {open}
>
> As you can see, the time is dominated by repeatedly calling len(),
> str.format() and StringIO.write() methods. Actually writing the data to
> the file is quite a small percentage of the cumulative time.
>
> So, here's another version, this time using a pre-calculated limit. I
> cheated and just copied the result from the fastwrite5 output :-)
>
> # fasterwrite.py
> filename = 'fasterwrite.dat'
> with open(filename, 'w') as f:
>     for i in xrange(5948879):  # Actually only 44MB, not 50MB.
>         f.write('%d\n' % i)

I had the same idea but kept the original method, because I didn't want to
waste time writing a function to calculate the actual number of iterations
needed to deliver 50MB of data. ;)

> And the profile results are about twice as fast as fastwrite5 above, with
> only 8 seconds in total writing to my HDD.
>
> [steve@ando ~]$ python -m cProfile fasterwrite.py
>          5948882 function calls in 28.840 seconds
>
>    Ordered by: standard name
>
>    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
>         1   20.592   20.592   28.840   28.840 fasterwrite.py:1(<module>)
>         1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
>   5948879    8.229    0.000    8.229    0.000 {method 'write' of 'file' objects}
>         1    0.019    0.019    0.019    0.019 {open}

I thought there would be a call to the format method from "'%d\n' % i".
It seems the % operator is a lot faster than format. I had stopped using
it because I read it was going to be deprecated. :( Why replace such a
great and fast operator with a slow method? I mean, why is format being
preferred over %?

> Without the overhead of the profiler, it is a little faster:
>
> [steve@ando ~]$ time python fasterwrite.py
>
> real    0m16.187s
> user    0m13.553s
> sys     0m0.508s
>
> Although it is still slower than the heavily optimized dd command,
> it is not unreasonably slow for a high-level language:
>
> [steve@ando ~]$ time dd if=fasterwrite.dat of=copy.dat
> 90781+1 records in
> 90781+1 records out
> 46479922 bytes (46 MB) copied, 0.737009 seconds, 63.1 MB/s
>
> --
> Steven

--
http://mail.python.org/mailman/listinfo/python-list
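P.S. On the '\n' question above, one quick check I can think of (a sketch, not from the thread; the probe filename is made up) is to write a single newline in text mode and measure the resulting file size — text mode expands '\n' to the platform's line separator on write, so the size directly answers whether it becomes 2 bytes:

```python
import os

# Write one '\n' in text mode and see how many bytes land on disk.
# On Windows text mode writes '\r\n' (2 bytes); on Unix it stays 1 byte.
probe = 'newline_probe.tmp'  # hypothetical scratch filename
with open(probe, 'w') as f:
    f.write('\n')
size = os.path.getsize(probe)
os.remove(probe)

print(size)        # 2 on Windows, 1 on Unix
print(repr(os.linesep))  # the platform's native line separator
```

os.linesep gives the same answer without touching the disk, but the probe file shows what open() in text mode actually does.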
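P.P.S. The iteration-count calculation I skipped above turns out to be small if done per digit-length rather than per line: all d-digit numbers cost the same d + newline bytes, so whole blocks can be added at once. A sketch (my own, not from the thread; lines_for_size is a made-up name), with newline_bytes=2 modelling Windows text mode:

```python
def lines_for_size(target_bytes, newline_bytes=1):
    """Smallest n such that writing '0\\n', '1\\n', ..., str(n-1) + '\\n'
    totals at least target_bytes bytes on disk."""
    total = n = 0
    width, block_end = 1, 10        # numbers 0..9 have 1 digit
    while True:
        per_line = width + newline_bytes
        block = block_end - n       # numbers remaining with this width
        if total + block * per_line >= target_bytes:
            needed = -(-(target_bytes - total) // per_line)  # ceiling division
            return n + needed
        total += block * per_line   # consume the whole same-width block
        n = block_end
        width += 1
        block_end *= 10

# Reproduces the figure from the thread: 5948879 lines make 46479922 bytes.
print(lines_for_size(46479922))     # -> 5948879
print(lines_for_size(50 * 1024 * 1024))  # lines needed for a true 50MB
```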
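On the deprecation worry: as far as I know, %-formatting was never actually removed or deprecated — it still works in current Python. The speed difference is easy to measure with timeit (numbers vary by machine and Python version; this is just a sketch of how to compare them):

```python
import timeit

# Time the same formatting job done with the % operator and with str.format.
n = 100000
t_percent = timeit.timeit("'%d' % 12345", number=n)
t_format = timeit.timeit("'{0}'.format(12345)", number=n)

print('%%-operator: %.4fs   str.format: %.4fs' % (t_percent, t_format))
```

On the interpreters I have tried, the % operator comes out ahead, consistent with the profile above where str.format accounted for almost 10 of the 53 seconds.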