I have a large ASCII data set that is zipped to a reasonable size. Can I access the data without decompressing the whole file first? I would like to run through the data to produce a much smaller extract and some summary statistics, but without unzipping it (if that is even possible).
Yes, though if you're running pre-2.6 Python you'll need to hack your install slightly.
I had the same question[1], and Gabriel suggested[2] that I try dropping the 2.6 version of zipfile.py into my $PYTHONPATH so that it's found before the existing version.
Once that's available, you can use the ZipFile.open() method, which returns a file-like object you can iterate over line by line rather than reading the entire content into memory. You can read through the thread for further details.
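For what it's worth, here is a minimal sketch of that approach as it might look on a current Python 3, where ZipFile.open() is in the standard library. The function name scan_zip_member, the keep callback, and the file names are made up for illustration, and I'm assuming the member really is plain ASCII with one record per line.

```python
import io
import zipfile


def scan_zip_member(zip_path, member_name, keep=lambda line: True):
    """Stream one member of a zip archive line by line.

    Returns (line_count, kept_lines) without extracting the archive to
    disk or reading the whole member into memory at once.
    (Hypothetical helper, not part of the zipfile module.)
    """
    line_count = 0
    kept_lines = []
    with zipfile.ZipFile(zip_path) as archive:
        # ZipFile.open() gives back a binary file-like object that
        # decompresses the member incrementally as it is read.
        with archive.open(member_name) as raw:
            # Wrap it so iteration yields text lines (ASCII assumed).
            for line in io.TextIOWrapper(raw, encoding="ascii"):
                line_count += 1
                line = line.rstrip("\n")
                if keep(line):
                    kept_lines.append(line)
    return line_count, kept_lines
```

Something like scan_zip_member("data.zip", "data.txt", keep=lambda l: l.startswith("alpha")) would then give you the total line count plus only the lines you care about, which covers the "smaller extract and some summary statistics" case. For a really large extract you'd probably want to write the kept lines to a file as you go instead of collecting them in a list.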
Works on My Machine(tm)[3]

-tkc

[1] http://mail.python.org/pipermail/python-list/2007-December/469254.html
[2] http://mail.python.org/pipermail/python-list/2007-December/469320.html
[3] http://www.codinghorror.com/blog/archives/000818.html