New submission from Wolfgang Maier:

The current documentation of the gzip module should have its section "12.2.1. 
Examples of usage" updated to reflect the changes made to the module in 
Python 3.2 (https://docs.python.org/3.2/whatsnew/3.2.html#gzip-and-zipfile).

Currently, the recipe given for gz-compressing a file is:

import gzip
with open('/home/joe/file.txt', 'rb') as f_in:
    with gzip.open('/home/joe/file.txt.gz', 'wb') as f_out:
        f_out.writelines(f_in)

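which is clearly sub-optimal because it is line-based: writelines() iterates 
over the binary input one line at a time, so the size and number of writes 
depend on where newline bytes happen to fall in the data.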

An equally simple, but more efficient recipe would be:

import gzip

chunk_size = 1024  # bytes read per iteration
with open('/home/joe/file.txt', 'rb') as f_in:
    with gzip.open('/home/joe/file.txt.gz', 'wb') as f_out:
        while True:
            c = f_in.read(chunk_size)
            if not c:  # empty bytes object signals end of file
                break
            f_out.write(c)

Comparing the two examples, I find a >= 2x performance gain (both in terms of 
CPU time and wall time).
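
The chunked copy above can also be expressed with shutil.copyfileobj from the 
standard library, which runs an equivalent read/write loop internally. A 
minimal sketch follows; the function name gzip_file and the default chunk 
size are illustrative assumptions, not part of the original recipe:

```python
import gzip
import shutil

def gzip_file(src_path, dst_path, chunk_size=64 * 1024):
    """Compress src_path into dst_path, copying fixed-size chunks.

    gzip_file and its default chunk_size are hypothetical names/values
    chosen for this sketch.
    """
    with open(src_path, 'rb') as f_in:
        with gzip.open(dst_path, 'wb') as f_out:
            # copyfileobj reads up to chunk_size bytes at a time until EOF
            shutil.copyfileobj(f_in, f_out, chunk_size)
```

It would be called as gzip_file('/home/joe/file.txt', '/home/joe/file.txt.gz').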

In the inverse scenario of file *de*-compression (which is not part of the docs 
though), the performance increase of substituting:

with gzip.open('/home/joe/file.txt.gz', 'rb') as f_in:
    with open('/home/joe/file.txt', 'wb') as f_out:
        f_out.writelines(f_in)

with:

with gzip.open('/home/joe/file.txt.gz', 'rb') as f_in:
    with open('/home/joe/file.txt', 'wb') as f_out:
        while True:
            c = f_in.read(chunk_size)  # chunk_size as defined above
            if not c:
                break
            f_out.write(c)

is even higher (4-5x speed-ups).
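
The speed-up will vary with the input file and Python version, so it is 
worth measuring locally. A rough sketch of such a comparison is given here; 
the helper names decompress_by_lines, decompress_by_chunks and time_call are 
made up for illustration, and time.perf_counter requires Python 3.3 or later:

```python
import gzip
import time

def decompress_by_lines(src_path, dst_path):
    """Line-based decompression, as in the first variant above."""
    with gzip.open(src_path, 'rb') as f_in:
        with open(dst_path, 'wb') as f_out:
            f_out.writelines(f_in)

def decompress_by_chunks(src_path, dst_path, chunk_size=1024):
    """Chunk-based decompression, as in the second variant above."""
    with gzip.open(src_path, 'rb') as f_in:
        with open(dst_path, 'wb') as f_out:
            while True:
                c = f_in.read(chunk_size)
                if not c:
                    break
                f_out.write(c)

def time_call(func, *args):
    """Return the wall-clock seconds taken by one call to func(*args)."""
    start = time.perf_counter()
    func(*args)
    return time.perf_counter() - start
```

For example, time_call(decompress_by_lines, 'file.txt.gz', 'out_lines.txt') 
can be compared against time_call(decompress_by_chunks, 'file.txt.gz', 
'out_chunks.txt') on the same input.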

In the de-compression case, another >= 2x speed-up can be achieved by avoiding 
the gzip module completely and going through a zlib.decompressobj instead, but 
of course this is a bit more complicated and should be documented in the zlib 
docs rather than the gzip docs (if you're interested, I could provide my code 
for it though).
Using the zlib library directly, compression/decompression speed becomes 
comparable to that of Linux gzip/gunzip.
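
To give an idea of what the zlib.decompressobj route might look like, here is 
a minimal sketch. It is not the code referred to above: the function name 
gunzip_with_zlib and the chunk size are assumptions, and it only handles a 
gzip file consisting of a single member (any data after the first member 
ends up in unused_data and is ignored):

```python
import zlib

def gunzip_with_zlib(src_path, dst_path, chunk_size=64 * 1024):
    """Decompress a single-member gzip file using zlib directly.

    Hypothetical helper for illustration only.
    """
    # wbits of 16 + MAX_WBITS makes zlib expect a gzip header and trailer
    decomp = zlib.decompressobj(16 + zlib.MAX_WBITS)
    with open(src_path, 'rb') as f_in:
        with open(dst_path, 'wb') as f_out:
            while True:
                c = f_in.read(chunk_size)  # raw compressed bytes
                if not c:
                    break
                f_out.write(decomp.decompress(c))
            # write out anything still buffered inside the decompressor
            f_out.write(decomp.flush())
```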

----------
assignee: docs@python
components: Documentation
messages: 215440
nosy: docs@python, wolma
priority: normal
severity: normal
status: open
title: update gzip usage examples in docs
type: performance
versions: Python 3.2, Python 3.3, Python 3.4

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue21146>
_______________________________________