Martin Panter added the comment:

This bug was originally raised against Python 3.3, and the speed has improved a lot since then. Perhaps this bug can be closed as it is, or maybe people would like to consider my decomp-optim.patch, which squeezes a bit more speed out. I don’t actually have a strong opinion either way.
Python 3.4 was apparently much faster than 3.3 courtesy of Issue 16034. In Python 3.5, all three decompression modules (lzma, gzip and bz2) now use a BufferedReader internally, due to my work in Issue 23529. For backwards compatibility, the modules delegate method calls to the internal BufferedReader rather than returning an instance directly.

I found that bypassing the readline() delegation speeds things up significantly, and adding a custom “closed” property on the underlying raw reader class also helps. However, I did not think it would be wise to bypass the locking in the “bz2” module, so I didn’t bypass BZ2File.readline() in the patch.

Timing results, and the test script I used to investigate the different options, are below:

                         lzma     gzip      bz2
                         =======  ========  ========
Unpatched                3.2 s    2.513 s   5.180 s
Custom __iter__()        1.31 s   1.317 s   2.433 s
__iter__() and closed    0.53 s*  0.543 s*  1.650 s
closed change only                          4.047 s*
External BufferedReader  0.64 s   0.597 s   1.750 s
Direct from BytesIO      0.33 s   0.370 s   1.280 s
Command-line tool        0.063 s  0.053 s   0.993 s

* Option implemented in decomp-optim.patch

---
import lzma, io
filename = "pacman.log.xz"  # 256206 lines; 389 kB -> 13 MB

# Basic case
reader = lzma.LZMAFile(filename)  # 3.2 s

# Add __iter__() optimization
def lzma_iter(self):
    self._check_can_read()
    return iter(self._buffer)
lzma.LZMAFile.__iter__ = lzma_iter  # 1.31 s

# Add “closed” optimization
def decompressor_closed(self):
    return self._decompressor is None
import _compression
_compression.DecompressReader.closed = property(decompressor_closed)  # 0.53 s

#~ # External BufferedReader baseline
#~ reader = io.BufferedReader(lzma.LZMAFile(filename))  # 0.64 s

#~ # Direct from BytesIO baseline
#~ with open(filename, "rb") as file:
#~     data = file.read()
#~ reader = io.BytesIO(lzma.decompress(data))  # 0.33 s

for line in reader:
    pass

----------
keywords: +patch
versions: +Python 3.5, Python 3.6 -Python 3.4
Added file: http://bugs.python.org/file39586/decomp-optim.patch

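The comparison above depends on a local pacman.log.xz file. A self-contained way to try the same kind of measurement might look like the sketch below. It is only an approximation under stated assumptions: the synthetic log lines, the line count, and the helper names make_compressed_log and time_line_iteration are illustrative and not part of the original test, and it covers only the unpatched LZMAFile case plus the two baselines, without the monkey-patching. Absolute timings will differ from the figures in the table.

```python
import io
import lzma
import time


def make_compressed_log(line_count=20000):
    """Build synthetic log data in memory (hypothetical stand-in for pacman.log.xz)."""
    lines = [b"[2015-06-01 12:00] entry number %d\n" % i for i in range(line_count)]
    return lzma.compress(b"".join(lines))


def time_line_iteration(make_reader):
    """Return (line count, elapsed seconds) for iterating over a fresh reader."""
    reader = make_reader()
    start = time.perf_counter()
    count = sum(1 for _ in reader)
    elapsed = time.perf_counter() - start
    reader.close()
    return count, elapsed


compressed = make_compressed_log()

# Each factory builds a new reader over the same compressed bytes.
cases = {
    "LZMAFile": lambda: lzma.LZMAFile(io.BytesIO(compressed)),
    "External BufferedReader": lambda: io.BufferedReader(
        lzma.LZMAFile(io.BytesIO(compressed))
    ),
    "Direct from BytesIO": lambda: io.BytesIO(lzma.decompress(compressed)),
}

results = {name: time_line_iteration(factory) for name, factory in cases.items()}
for name, (count, elapsed) in results.items():
    print("%-24s %6d lines  %.3f s" % (name, count, elapsed))
```

Every case should report the same number of lines; only the elapsed times are expected to differ, and they depend on the Python version and machine.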
_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue18003>
_______________________________________