Martin Panter added the comment:
This bug was originally raised against Python 3.3, and the speed has improved a
lot since then. Perhaps this bug can be closed as it is, or maybe people would
like to consider my decomp-optim.patch which squeezes a bit more speed out. I
don’t actually have a strong opinion either way.
Python 3.4 was apparently much faster than 3.3 courtesy of Issue 16034. In
Python 3.5, all three decompression modules (lzma, gzip and bz2) now use a
BufferedReader internally, due to my work in Issue 23529. The modules delegate
method calls to the internal BufferedReader, rather than returning an instance
directly, for backwards compatibility.
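To illustrate the delegation described above, here is a minimal sketch. It is not the actual lzma, gzip or bz2 code: DelegatingReader and DirectIterReader are hypothetical names, a BytesIO stands in for the decompressing raw reader, and the "calls" counter exists only to show how often the Python-level readline() is entered.

```python
import io


class DelegatingReader(io.BufferedIOBase):
    """Hypothetical stand-in for a file class such as LZMAFile."""

    def __init__(self, raw_bytes):
        # The real modules wrap a decompressing raw reader; a BytesIO
        # stands in for it here (assumption for illustration only).
        self._buffer = io.BufferedReader(io.BytesIO(raw_bytes))
        self.calls = 0  # number of Python-level readline() calls

    def readable(self):
        return True

    def readline(self, size=-1):
        # Default iteration (IOBase.__next__) calls this once per line.
        self.calls += 1
        return self._buffer.readline(size)


class DirectIterReader(DelegatingReader):
    def __iter__(self):
        # Hand iteration straight to the BufferedReader, so the
        # readline() wrapper above is never entered.
        return iter(self._buffer)


data = b"one\ntwo\nthree\n"

delegating = DelegatingReader(data)
delegated_lines = list(delegating)

direct = DirectIterReader(data)
direct_lines = list(direct)

print(delegated_lines == direct_lines)  # same lines either way
print(delegating.calls, direct.calls)   # wrapper calls per approach
```

Both classes yield the same lines; the difference is only in how many times the interpreter has to go through the wrapper method, which is the overhead the __iter__() optimization tries to avoid.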
I found that bypassing the readline() delegation speeds things up
significantly, and adding a custom “closed” property on the underlying raw
reader class also helps. However, I did not think it would be wise to bypass
the locking in the “bz2” module, so I didn’t bypass BZ2File.readline() in the
patch. Timing results, and the test script I used to investigate the different
options, are below:
                         lzma     gzip      bz2
                         =======  ========  ========
Unpatched                3.2 s    2.513 s   5.180 s
Custom __iter__()        1.31 s   1.317 s   2.433 s
__iter__() and closed    0.53 s*  0.543 s*  1.650 s
closed change only                          4.047 s*
External BufferedReader  0.64 s   0.597 s   1.750 s
Direct from BytesIO      0.33 s   0.370 s   1.280 s
Command-line tool        0.063 s  0.053 s   0.993 s

* Option implemented in decomp-optim.patch
---
import lzma, io
filename = "pacman.log.xz"  # 256206 lines; 389 kB -> 13 MB

# Basic case
reader = lzma.LZMAFile(filename)  # 3.2 s

# Add __iter__() optimization
def lzma_iter(self):
    self._check_can_read()
    return iter(self._buffer)
lzma.LZMAFile.__iter__ = lzma_iter  # 1.31 s

# Add “closed” optimization
def decompressor_closed(self):
    return self._decompressor is None
import _compression
_compression.DecompressReader.closed = property(decompressor_closed)  # 0.53 s

#~ # External BufferedReader baseline
#~ reader = io.BufferedReader(lzma.LZMAFile(filename))  # 0.64 s

#~ # Direct from BytesIO baseline
#~ with open(filename, "rb") as file:
#~     data = file.read()
#~ reader = io.BytesIO(lzma.decompress(data))  # 0.33 s

for line in reader:
    pass
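
For anyone without a suitable log file to hand, the comparison can be approximated with a self-contained harness along the following lines. This is only a sketch: the synthetic data, the time_case() helper and the line count are assumptions of mine, not part of the script above, and the numbers it prints will not match the table (they depend on the Python version and machine).

```python
import io
import lzma
import time

# Synthetic stand-in for the log file: many short lines, compressed in memory.
LINE_COUNT = 20000
payload = b"".join(b"line %d of a synthetic log\n" % i for i in range(LINE_COUNT))
compressed = lzma.compress(payload)


def time_case(make_reader):
    """Build a reader, iterate over its lines, return (line count, seconds)."""
    start = time.perf_counter()
    with make_reader() as reader:
        count = sum(1 for _ in reader)
    return count, time.perf_counter() - start


# Each entry builds a fresh reader, mirroring the cases in the table.
cases = {
    "LZMAFile": lambda: lzma.LZMAFile(io.BytesIO(compressed)),
    "External BufferedReader": lambda: io.BufferedReader(
        lzma.LZMAFile(io.BytesIO(compressed))),
    "Direct from BytesIO": lambda: io.BytesIO(lzma.decompress(compressed)),
}

results = {}
for name, make_reader in cases.items():
    results[name] = time_case(make_reader)
    count, seconds = results[name]
    print("%-24s %6d lines  %.3f s" % (name, count, seconds))
```

The timer covers reader construction as well as iteration, so the "Direct from BytesIO" case includes the cost of decompressing everything up front.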
----------
keywords: +patch
versions: +Python 3.5, Python 3.6 -Python 3.4
Added file: http://bugs.python.org/file39586/decomp-optim.patch
_______________________________________
Python tracker
<http://bugs.python.org/issue18003>
_______________________________________