On 1/4/2010 5:35 PM, wiso wrote:
I'm trying the fileinput module, and I like it, but I don't understand why
it's so slow... look:

from time import time
from fileinput import FileInput

file = ['r1_200907.log', 'r1_200908.log', 'r1_200909.log', 'r1_200910.log',
        'r1_200911.log']

def f1():
    n = 0
    for f in file:
        print "new file: %s" % f
        ff = open(f)
        for line in ff:
            n += 1
        ff.close()
    return n

def f2():
    f = FileInput(file)
    for line in f:
        if f.isfirstline(): print "new file: %s" % f.filename()
    return f.lineno()

def f3(): # f2 simpler
    f = FileInput(file)
    for line in f:
        pass
    return f.lineno()

t = time(); f1(); print time()-t # 1.0
t = time(); f2(); print time()-t # 7.0 !!!
t = time(); f3(); print time()-t # 5.5

I'm using text files, there are 2563150 lines in total.
1. Timings should include platform and Python version.
2. fileinput executes a lot of Python code on top of the underlying file
methods. The n += 1 in your f1 is nowhere near enough to compensate for
that overhead.
fileinput does at least the following for each line:

    try:
        line = self._buffer[self._bufindex]
    except IndexError:
        pass
    else:
        self._bufindex += 1
        self._lineno += 1
        self._filelineno += 1

That is 5 attribute accesses, an indexing, and 3 additions.
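
To get a feel for what that per-line bookkeeping costs, here is a rough
sketch (Python 3 syntax, unlike the code quoted above). The Counter class
and both function names are hypothetical stand-ins written for this
illustration; they only mimic the quoted excerpt and are not the real
fileinput code. The list of strings replaces real files so that only the
interpreter overhead is measured.

```python
import timeit


class Counter:
    """Hypothetical stand-in that mimics the bookkeeping quoted above."""

    def __init__(self, lines):
        self._buffer = lines
        self._bufindex = 0
        self._lineno = 0
        self._filelineno = 0

    def next_line(self):
        # Same shape as the excerpt: index the buffer, then bump 3 counters.
        try:
            line = self._buffer[self._bufindex]
        except IndexError:
            return None
        self._bufindex += 1
        self._lineno += 1
        self._filelineno += 1
        return line


def plain_count(lines):
    # Equivalent of f1's inner loop: one addition per line.
    n = 0
    for line in lines:
        n += 1
    return n


def bookkeeping_count(lines):
    # One method call plus the attribute updates per line.
    c = Counter(lines)
    while c.next_line() is not None:
        pass
    return c._lineno


LINES = ["x\n"] * 100000  # in-memory data, so no disk I/O is timed

if __name__ == "__main__":
    t_plain = timeit.timeit(lambda: plain_count(LINES), number=5)
    t_book = timeit.timeit(lambda: bookkeeping_count(LINES), number=5)
    print("plain loop:       %.3f s" % t_plain)
    print("with bookkeeping: %.3f s" % t_book)
```

The absolute numbers will differ by machine and Python version; only the
ratio between the two loops is of interest.
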
3. You are welcome to read the Python source in
.../pythonxy/Lib/fileinput.py
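
The exact location depends on the installation. A small sketch (Python 3
syntax) that asks the interpreter where its own copy lives; the helper
name fileinput_source_path is made up for this example.

```python
import fileinput
import inspect


def fileinput_source_path():
    # Returns the path of the .py file the module was loaded from,
    # or None if no source file can be found.
    return inspect.getsourcefile(fileinput)


if __name__ == "__main__":
    print(fileinput_source_path())
```
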
4. Doc string for 3.1 version says
"Performance: this module is unfortunately one of the slower ways of
processing large numbers of input lines. Nevertheless, a significant
speed-up has been obtained by using readlines(bufsize) instead of
readline(). A new keyword argument, bufsize=N, is present on the
input() function and the FileInput() class to override the default
buffer size."
If your version has bufsize, try something larger than the default of
8*1024, say 1024*1024.
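
Following point 1, a comparison is easier to interpret when the script
prints the platform and Python version itself. The sketch below (Python 3
syntax) does that and times both approaches on throwaway files. The file
names, sizes, and helper names are all assumptions made for the example,
not the original r1_*.log data. It deliberately does not pass bufsize:
recent Python 3 releases are understood to no longer accept that
argument, so check the docs for your version before adding it.

```python
import fileinput
import os
import platform
import sys
import tempfile
import time


def make_sample_files(directory, n_files=3, lines_per_file=50000):
    # Write small throwaway text files; sizes are arbitrary assumptions.
    paths = []
    for i in range(n_files):
        path = os.path.join(directory, "sample_%d.log" % i)
        with open(path, "w") as fh:
            for j in range(lines_per_file):
                fh.write("line %d of file %d\n" % (j, i))
        paths.append(path)
    return paths


def count_with_open(paths):
    # Same idea as f1: iterate each file object directly.
    n = 0
    for path in paths:
        with open(path) as fh:
            for line in fh:
                n += 1
    return n


def count_with_fileinput(paths):
    # Same idea as f3: let FileInput chain the files together.
    with fileinput.FileInput(paths) as f:
        for line in f:
            pass
        return f.lineno()


def timed(func, *args):
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


def main():
    print("Python %s on %s" % (sys.version.split()[0], platform.platform()))
    with tempfile.TemporaryDirectory() as tmp:
        paths = make_sample_files(tmp)
        for func in (count_with_open, count_with_fileinput):
            n, elapsed = timed(func, paths)
            print("%-22s %d lines in %.3f s" % (func.__name__, n, elapsed))


if __name__ == "__main__":
    main()
```
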
Terry Jan Reedy