New submission from Wolfgang Maier:

The current documentation of the gzip module should have its section "12.2.1. Examples of usage" updated to reflect the changes made to the module in Python 3.2 (https://docs.python.org/3.2/whatsnew/3.2.html#gzip-and-zipfile).

Currently, the recipe given for gz-compressing a file is:

    import gzip
    with open('/home/joe/file.txt', 'rb') as f_in:
        with gzip.open('/home/joe/file.txt.gz', 'wb') as f_out:
            f_out.writelines(f_in)

which is clearly sub-optimal because it is line-based. An equally simple, but more efficient recipe would be:

    chunk_size = 1024
    with open('/home/joe/file.txt', 'rb') as f_in:
        with gzip.open('/home/joe/file.txt.gz', 'wb') as f_out:
            while True:
                c = f_in.read(chunk_size)
                if not c:
                    break
                d = f_out.write(c)

Comparing the two examples, I find a >= 2x performance gain (both in terms of CPU time and wall time).

In the inverse scenario of file *de*-compression (which is not part of the docs though), the performance increase of substituting:

    with gzip.open('/home/joe/file.txt.gz', 'rb') as f_in:
        with open('/home/joe/file.txt', 'wb') as f_out:
            f_out.writelines(f_in)

with:

    with gzip.open('/home/joe/file.txt.gz', 'rb') as f_in:
        with open('/home/joe/file.txt', 'wb') as f_out:
            while True:
                c = f_in.read(chunk_size)
                if not c:
                    break
                d = f_out.write(c)

is even higher (4-5x speed-ups).

In the de-compression case, another >= 2x speed-up can be achieved by avoiding the gzip module completely and going through a zlib.decompressobj instead, but of course this is a bit more complicated and should be documented in the zlib docs rather than the gzip docs (if you're interested, I could provide my code for it though). Using the zlib library, compression/decompression speed becomes comparable to Linux gzip/gunzip.
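
For illustration, here is a minimal sketch of the two chunk-based approaches discussed above. It is not the code offered in this report. The helper names gzip_file and gunzip_file_zlib and the 64 KiB buffer size are assumptions made for the example. The first helper uses shutil.copyfileobj, which performs the same read/write loop as the hand-written recipe. The second feeds raw chunks to a zlib.decompressobj and, as written, assumes a single-member .gz file.

```python
import gzip
import shutil
import zlib

CHUNK_SIZE = 64 * 1024  # assumed buffer size; the report uses 1024


def gzip_file(src_path, dst_path, chunk_size=CHUNK_SIZE):
    """Compress src_path to dst_path in fixed-size chunks, not line by line."""
    with open(src_path, 'rb') as f_in:
        with gzip.open(dst_path, 'wb') as f_out:
            # copyfileobj reads and writes chunk_size bytes at a time,
            # like the while-loop recipe in the report.
            shutil.copyfileobj(f_in, f_out, chunk_size)


def gunzip_file_zlib(src_path, dst_path, chunk_size=CHUNK_SIZE):
    """Decompress a single-member .gz file through zlib.decompressobj."""
    # wbits = MAX_WBITS | 16 tells zlib to expect a gzip header and trailer.
    decomp = zlib.decompressobj(zlib.MAX_WBITS | 16)
    with open(src_path, 'rb') as f_in:
        with open(dst_path, 'wb') as f_out:
            while True:
                chunk = f_in.read(chunk_size)
                if not chunk:
                    break
                f_out.write(decomp.decompress(chunk))
            # Write whatever is still buffered inside the decompressor.
            f_out.write(decomp.flush())
```

A call such as gzip_file('/home/joe/file.txt', '/home/joe/file.txt.gz') followed by gunzip_file_zlib('/home/joe/file.txt.gz', '/home/joe/file.txt') should round-trip the data. Files made of several concatenated gzip members would need extra handling of the decompressor's unused_data attribute, which this sketch omits.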

----------
assignee: docs@python
components: Documentation
messages: 215440
nosy: docs@python, wolma
priority: normal
severity: normal
status: open
title: update gzip usage examples in docs
type: performance
versions: Python 3.2, Python 3.3, Python 3.4

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue21146>
_______________________________________