Antoine Pitrou added the comment:

I second Serhiy here. Wrapping the LZMAFile in a BufferedReader is the simple 
solution to the performance problem:

$ ./python -m timeit -s "import lzma, io" "f=lzma.LZMAFile('words.xz', 'r')" 
"for line in f: pass"
10 loops, best of 3: 148 msec per loop

$ ./python -m timeit -s "import lzma, io" 
"f=io.BufferedReader(lzma.LZMAFile('words.xz', 'r'))" "for line in f: pass"
10 loops, best of 3: 44.3 msec per loop

$ time xzcat words.xz | wc -l

real    0m0.021s
user    0m0.016s
sys     0m0.004s

Perhaps the top-level lzma.open() should do the wrapping for you, though.
Interestingly, opening in text (i.e. unicode) mode is almost as fast as with a 
BufferedReader:

$ ./python -m timeit -s "import lzma, io" "f=lzma.open('words.xz', 'rt')" 
"for line in f: pass"
10 loops, best of 3: 51.1 msec per loop
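
For anyone who wants to reproduce the comparison outside of timeit, here is a 
minimal sketch of the three ways of iterating over lines measured above. The 
helper names and the generated sample file are made up for illustration (they 
stand in for words.xz), and utf-8 is an assumed encoding for the text-mode 
case:

```python
import io
import lzma
import os
import tempfile


def count_lines_unbuffered(path):
    # Iterate directly over the LZMAFile (the slow case in the first timing).
    with lzma.LZMAFile(path, "r") as f:
        return sum(1 for _ in f)


def count_lines_buffered(path):
    # Wrap the LZMAFile in a BufferedReader so line iteration is served
    # from the buffer. Closing the wrapper also closes the LZMAFile.
    with io.BufferedReader(lzma.LZMAFile(path, "r")) as f:
        return sum(1 for _ in f)


def count_lines_text(path, encoding="utf-8"):
    # Text mode via the top-level lzma.open(); the encoding is an assumption.
    with lzma.open(path, "rt", encoding=encoding) as f:
        return sum(1 for _ in f)


def make_sample(n_lines=1000):
    # Hypothetical stand-in for words.xz: one short word per line.
    fd, path = tempfile.mkstemp(suffix=".xz")
    os.close(fd)
    with lzma.open(path, "wb") as f:
        for i in range(n_lines):
            f.write(b"word%d\n" % i)
    return path
```

All three functions should return the same line count for a given file; only 
the time they take is expected to differ, and the timings will depend on the 
Python build and the input.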

nosy: +pitrou
