On Fri, 17 May 2013 18:20:33 +0300, Carlos Nepomuceno wrote:

> ### fastwrite5.py ###
> import cStringIO
> size = 50*1024*1024
> value = 0
> filename = 'fastwrite5.dat'
> x = 0
> b = cStringIO.StringIO()
> while x < size:
>     line = '{0}\n'.format(value)
>     b.write(line)
>     value += 1
>     x += len(line)+1
Oh, I forgot to mention: you have a bug in this function. You're already
including the newline in len(line), so there is no need to add one. The
result is that you only generate 44MB instead of 50MB. (A corrected sketch
of the loop appears at the end of this post.)

> f = open(filename, 'w')
> f.write(b.getvalue())
> f.close()
> b.close()

Here are the results of profiling the above on my computer. Including the
overhead of the profiler, it takes just over 50 seconds to run your file.

[steve@ando ~]$ python -m cProfile fastwrite5.py
         17846645 function calls in 53.575 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1   30.561   30.561   53.575   53.575 fastwrite5.py:1(<module>)
        1    0.000    0.000    0.000    0.000 {cStringIO.StringIO}
  5948879    5.582    0.000    5.582    0.000 {len}
        1    0.004    0.004    0.004    0.004 {method 'close' of 'cStringIO.StringO' objects}
        1    0.000    0.000    0.000    0.000 {method 'close' of 'file' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
  5948879    9.979    0.000    9.979    0.000 {method 'format' of 'str' objects}
        1    0.103    0.103    0.103    0.103 {method 'getvalue' of 'cStringIO.StringO' objects}
  5948879    7.135    0.000    7.135    0.000 {method 'write' of 'cStringIO.StringO' objects}
        1    0.211    0.211    0.211    0.211 {method 'write' of 'file' objects}
        1    0.000    0.000    0.000    0.000 {open}

As you can see, the time is dominated by repeatedly calling len(),
str.format() and StringIO.write(). Actually writing the data to the file
is quite a small percentage of the cumulative time.

So, here's another version, this time using a pre-calculated limit. I
cheated and just copied the result from the fastwrite5 output :-)

# fasterwrite.py
filename = 'fasterwrite.dat'
with open(filename, 'w') as f:
    for i in xrange(5948879):  # Actually only 44MB, not 50MB.
        f.write('%d\n' % i)

The profiled run is about twice as fast as fastwrite5 above, with only
about 8 seconds in total spent writing to my HDD.

[steve@ando ~]$ python -m cProfile fasterwrite.py
         5948882 function calls in 28.840 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1   20.592   20.592   28.840   28.840 fasterwrite.py:1(<module>)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
  5948879    8.229    0.000    8.229    0.000 {method 'write' of 'file' objects}
        1    0.019    0.019    0.019    0.019 {open}

Without the overhead of the profiler, it is a little faster:

[steve@ando ~]$ time python fasterwrite.py

real    0m16.187s
user    0m13.553s
sys     0m0.508s

It is still slower than the heavily optimized dd command, but not
unreasonably slow for a high-level language:

[steve@ando ~]$ time dd if=fasterwrite.dat of=copy.dat
90781+1 records in
90781+1 records out
46479922 bytes (46 MB) copied, 0.737009 seconds, 63.1 MB/s

real    0m0.786s
user    0m0.071s
sys     0m0.595s

-- 
Steven
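
For reference, here is a minimal corrected sketch of the quoted script with
the spurious "+1" dropped, so the counter matches the bytes actually
buffered and the output really reaches the 50MB target. This is Python 2,
like the original; the output filename fastwrite5_fixed.dat is made up for
illustration, everything else follows the quoted code.

# fastwrite5_fixed.py -- sketch of the quoted script with the
# byte-counting bug removed; len(line) already includes the '\n'.
import cStringIO

size = 50*1024*1024                  # target: 50MB
value = 0
filename = 'fastwrite5_fixed.dat'    # hypothetical output name
x = 0                                # bytes buffered so far
b = cStringIO.StringIO()
while x < size:
    line = '{0}\n'.format(value)
    b.write(line)
    value += 1
    x += len(line)                   # no "+1": the newline is already counted

f = open(filename, 'w')
f.write(b.getvalue())
f.close()
b.close()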