New submission from James Dominy:

bz2.BZ2File does not correctly decompress a file (see attached). The same file can be decompressed and recompressed with the standard Unix tools (bunzip2 and bzip2) without change.
Consider ...

$ python
Python 2.7.6 (default, Dec 7 2013, 22:49:16)
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import bz2
>>> import hashlib
>>> len(bz2.BZ2File("example-file.csv.bz2", "r", 0).read())
900000
>>> hashlib.md5(bz2.BZ2File("example-file.csv.bz2", "r", 0).read()).hexdigest()
'e2d4ce212a040c879cb256f88c9faab9'
>>> len(bz2.BZ2File("example-file.csv.bz2", "rb", 0).read())
900000
>>> hashlib.md5(bz2.BZ2File("example-file.csv.bz2", "rb", 0).read()).hexdigest()
'e2d4ce212a040c879cb256f88c9faab9'
>>>

It looks like bz2 is not dealing with the second block. This is not the first file I've come across that has this problem, and initially I thought the fault was in the file, not the module. I've attached a copy of the file. I use Gentoo on a 64-bit Intel Core i5.

----------
components: IO
files: example-file.csv.bz2
messages: 212250
nosy: James.Dominy
priority: normal
severity: normal
status: open
title: BZ2File does not decompress some .bz2 files correctly
versions: Python 2.7

Added file: http://bugs.python.org/file34230/example-file.csv.bz2

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue20781>
_______________________________________
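
One possible explanation for the 900000-byte result, which the report itself does not confirm, is that the attached file consists of several bzip2 streams concatenated together, so that reading stops at the end of the first stream. Under that assumption, a minimal workaround sketch is to feed the leftover bytes to a fresh bz2.BZ2Decompressor until nothing remains. The helper name decompress_all_streams is hypothetical and not part of the bz2 module.

```python
import bz2


def decompress_all_streams(data):
    """Decompress every bzip2 stream concatenated in `data` (a bytes object).

    Hypothetical helper: assumes the input is one or more complete bzip2
    streams back to back, with no unrelated trailing bytes.
    """
    chunks = []
    while data:
        # A BZ2Decompressor handles exactly one stream, so start a new
        # one for each stream found in the input.
        decompressor = bz2.BZ2Decompressor()
        chunks.append(decompressor.decompress(data))
        # Bytes that follow the end of the current stream, if any.
        data = decompressor.unused_data
    return b"".join(chunks)
```

It could be used on the raw file contents, for example decompress_all_streams(open("example-file.csv.bz2", "rb").read()). This reads the whole file into memory, and trailing bytes that are not a valid bzip2 stream will raise an error from the decompressor, so it is only a starting point for checking whether the multi-stream assumption holds.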