Nadeem Vawda <nadeem.va...@gmail.com> added the comment:

Here is a quick-and-dirty reimplementation of BZ2File in Python, on top of the existing C implementation of BZ2Compressor and BZ2Decompressor.
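The general idea of layering a file-like reader over BZ2Decompressor can be illustrated with a minimal read-only sketch. This is not the code from the attached patch. The class name MiniBZ2Reader is hypothetical. The sketch assumes Python 3.3 or later, because it relies on BZ2Decompressor.eof, which did not exist when this comment was written. It also handles only a single bzip2 stream.

```python
import bz2
import io


class MiniBZ2Reader:
    """Hypothetical read-only BZ2File-like wrapper around bz2.BZ2Decompressor.

    Illustrative sketch only: handles a single bzip2 stream and assumes
    Python 3.3+ (for BZ2Decompressor.eof).
    """

    def __init__(self, fileobj, chunk_size=8192):
        self._fp = fileobj
        self._chunk_size = chunk_size
        self._decomp = bz2.BZ2Decompressor()
        self._buffer = b""

    def _fill(self):
        """Decompress more input into the buffer.

        Returns True if new data was added, False once the stream has ended
        or the underlying file is exhausted.
        """
        while not self._decomp.eof:
            raw = self._fp.read(self._chunk_size)
            if not raw:
                return False
            data = self._decomp.decompress(raw)
            if data:
                self._buffer += data
                return True
        return False

    def read(self, size=-1):
        if size < 0:
            # Collect every decompressed piece and join once at the end,
            # instead of concatenating bytes objects repeatedly.
            chunks = [self._buffer]
            self._buffer = b""
            while self._fill():
                chunks.append(self._buffer)
                self._buffer = b""
            return b"".join(chunks)
        # Decompress until enough data is buffered (or input runs out).
        while len(self._buffer) < size and self._fill():
            pass
        result, self._buffer = self._buffer[:size], self._buffer[size:]
        return result
```

Even in this small version, each call to decompress() returns a fresh bytes object. read(-1) therefore gathers the pieces in a list and joins them once, and read(size) still copies when it slices the buffer. A thin wrapper over the C library would avoid these extra allocations.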
There are a couple of issues with this reimplementation that need to be fixed:

* BZ2Decompressor doesn't signal when it reaches the EOS marker, so it doesn't seem possible to detect a premature end-of-file. This was easy in the C implementation, which used bzDecompress() directly.
* The read*() methods are implemented very inefficiently. Since they have to deal with the bytes objects returned by BZ2Decompressor.decompress(), a large read results in many allocations that weren't necessary in the C implementation.

I hope to resolve both of these issues (and do a general code cleanup) by writing a C extension module that provides a thin wrapper around bzCompress()/bzDecompress(), and then reimplementing the module's public interface in Python on top of it. This should reduce the size of the code by nearly half, and make it easier to read and maintain. I'm not sure when I'll be able to get around to it, though, so I thought I should post what I've done so far.

Other changes in the patch:

* write(), writelines() and seek() now return meaningful values instead of None, in line with the behaviour of other file-like objects.
* Fixed a typo in test_bz2's testReadChunk10() that caused the test to pass regardless of whether the data read was correct (self.assertEqual(text, text) -> self.assertEqual(text, self.TEXT)). This one might be worth committing now, since it isn't dependent on the rewrite.

----------
Added file: http://bugs.python.org/file20521/bz2module-v2.diff

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue5863>
_______________________________________
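The first issue listed above is detecting a premature end-of-file. A sketch of what that check can look like follows. It assumes Python 3.3 or later, where BZ2Decompressor exposes an eof attribute; that attribute was not available when this comment was written. The helper name decompress_checked is hypothetical and is not part of the patch or the bz2 module.

```python
import bz2
import io


def decompress_checked(fileobj, chunk_size=8192):
    """Decompress one bzip2 stream from fileobj, rejecting truncated input.

    Hypothetical helper; relies on BZ2Decompressor.eof (Python 3.3+).
    Any data after the end-of-stream marker is left in decomp.unused_data
    and ignored here.
    """
    decomp = bz2.BZ2Decompressor()
    chunks = []
    while not decomp.eof:
        raw = fileobj.read(chunk_size)
        if not raw:
            # The input ran out before the end-of-stream marker was seen.
            raise EOFError("compressed data ended before the end-of-stream marker")
        chunks.append(decomp.decompress(raw))
    return b"".join(chunks)
```

Truncated input never sets eof, so the loop reaches the end of the file and raises EOFError. It does not silently return partial data.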