[issue10900] bz2 module fails to uncompress large files

2011-03-01 Thread Charles-Francois Natali
Charles-Francois Natali added the comment:
2011/3/2 Eric Wolf:
> Eric Wolf added the comment:
>
> I just got confirmation that OSM is using pbzip2 to generate these files. So they are multi-stream. At least that gives a final answer but doesn't solve my problem.
At least on Unix, yo

2011-03-01 Thread Charles-Francois Natali
Charles-Francois Natali added the comment:
> Antoine Pitrou added the comment:
>
> Perhaps your bz2 files are simply multi-stream files? The bz2 module currently doesn't support them (it only decompresses the first stream); see issue1625 for a patch.
That explains why it was seeing an end-

2011-03-01 Thread Eric Wolf
Eric Wolf added the comment: I just got confirmation that OSM is using pbzip2 to generate these files. So they are multi-stream. At least that gives a final answer but doesn't solve my problem. I saw this: http://bugs.python.org/issue1625 Does anyone know the current status of the patch supp
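Until a patched module is available, one workaround that needs no change to the C code is to drive `bz2.BZ2Decompressor` by hand: whenever a decompressor reports the end of a stream, start a fresh one and feed it the leftover `unused_data`. This is a minimal sketch, not the issue1625 patch; it assumes Python 3.3 or later (for the `eof` attribute), and `iter_multistream` and its `chunk_size` parameter are hypothetical names.

```python
import bz2
from typing import BinaryIO, Iterator


def iter_multistream(fileobj: BinaryIO, chunk_size: int = 1 << 20) -> Iterator[bytes]:
    """Yield the decompressed bytes of every bzip2 stream in fileobj, in order.

    Hypothetical helper; assumes Python 3.3+ (BZ2Decompressor.eof).
    """
    decomp = bz2.BZ2Decompressor()
    fed = False  # has the current decompressor received any input yet?
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        while chunk:
            fed = True
            data = decomp.decompress(chunk)
            if data:
                yield data
            if decomp.eof:
                # One stream ended; any bytes after it start the next stream.
                chunk = decomp.unused_data
                decomp = bz2.BZ2Decompressor()
                fed = False
            else:
                chunk = b""  # everything consumed; read more from the file
    if fed and not decomp.eof:
        raise EOFError("input ended in the middle of a bzip2 stream")
```

Usage would look like `with open("planet.osm.bz2", "rb") as f: for data in iter_multistream(f): ...` (the file name is a placeholder). Reading in fixed-size chunks keeps memory use flat regardless of how large the file is.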

2011-03-01 Thread Antoine Pitrou
Antoine Pitrou added the comment: Perhaps your bz2 files are simply multi-stream files? The bz2 module currently doesn't support them (it only decompresses the first stream); see issue1625 for a patch. I'm not an expert on this, but it seems you can do:
    $ bzip2 -tvvv foo.bz2
    foo.bz2:
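A multi-stream file is several complete bzip2 streams written back to back. The snippet below is a minimal sketch using only the documented `bz2.BZ2Decompressor` API (Python 3.3+ for `eof`); it shows the behaviour described above, where a single decompressor stops at the end of the first stream and leaves the remaining bytes in `unused_data`.

```python
import bz2

first = bz2.compress(b"first stream\n")
second = bz2.compress(b"second stream\n")
multi = first + second  # two complete streams, back to back

decomp = bz2.BZ2Decompressor()
out = decomp.decompress(multi)

print(out)                           # b'first stream\n'
print(decomp.eof)                    # True: the first stream is finished
print(decomp.unused_data == second)  # True: the second stream was never decoded
```

Code that treats this end-of-stream as end of file silently drops every later stream, which matches the symptom reported in this thread.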

2011-03-01 Thread Charles-Francois Natali
Charles-Francois Natali added the comment:
> Stupid questions are always worth asking. I did check the MD5 sum earlier and just checked it again (since I copied the file from one machine to another):
>
> ebwolf@ubuntu:/opt$ md5sum /host/full-planet-110115-1800.osm.bz2
> 0e3f81ef0dd415d8f90f1

2011-03-01 Thread Eric Wolf
Eric Wolf added the comment: The only problem with the theory that the file is corrupt is that at least three people have encountered exactly the same problem with three files: http://mail.python.org/pipermail/tutor/2010-June/076343.html Colin was using an OSM planet file from some time last

2011-03-01 Thread Eric Wolf
Eric Wolf added the comment: Stupid questions are always worth asking. I did check the MD5 sum earlier and just checked it again (since I copied the file from one machine to another):
    ebwolf@ubuntu:/opt$ md5sum /host/full-planet-110115-1800.osm.bz2
    0e3f81ef0dd415d8f90f1378666a400c /host/full
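For a file of this size, the same checksum can be computed from Python without loading the file into memory by feeding `hashlib.md5` fixed-size chunks. A minimal sketch; `md5_of_file` and its 1 MiB default chunk size are illustrative choices, and the returned hex string should match the first field printed by `md5sum`.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading chunk_size bytes at a time."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns b"" at end of file.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

For example, `md5_of_file("/host/full-planet-110115-1800.osm.bz2")` would hash the file checked above.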

2011-03-01 Thread Charles-Francois Natali
Charles-Francois Natali added the comment: After running this under gdb, it turns out that it's actually bzlib's bzRead that's returning a BZ_STREAM_END after only 900k bytes. So it confirms what I've been suspecting, i.e. that the file is corrupt (I got the error at exactly the same offset as
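A clean end-of-stream after exactly 900,000 bytes is also what an intact multi-stream file would give if every stream held one 900,000-byte piece of the input. One hedged way to tell the two cases apart is to keep decoding after each end-of-stream marker and record how much each stream yields: an intact multi-stream file keeps producing data, while a damaged stream would normally make the decompressor raise OSError. The sketch works on an in-memory bytes object for brevity (a 22GB file would need chunked reads) and assumes Python 3.3+; `stream_sizes` is a hypothetical helper, and the 900,000-byte chunking below only imitates a pbzip2-style file rather than reproducing pbzip2's exact output.

```python
import bz2
from typing import List


def stream_sizes(data: bytes) -> List[int]:
    """Return the decompressed size of each bzip2 stream found in data, in order."""
    sizes = []
    while data:
        decomp = bz2.BZ2Decompressor()
        size = len(decomp.decompress(data))
        if not decomp.eof:
            raise EOFError("data ends in the middle of a bzip2 stream")
        sizes.append(size)
        data = decomp.unused_data  # bytes after this stream: the next stream, if any
    return sizes


# Imitate a pbzip2-style file: compress 900,000-byte pieces independently.
payload = b"0123456789" * 200_000  # 2,000,000 bytes
pieces = [payload[i:i + 900_000] for i in range(0, len(payload), 900_000)]
multi = b"".join(bz2.compress(p) for p in pieces)

print(stream_sizes(multi))  # [900000, 900000, 200000]
```

A reader that stops at the first end-of-stream would report only the first 900,000 bytes of such a file.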

2011-03-01 Thread Charles-Francois Natali
Charles-Francois Natali added the comment:
> I've attached the strace output. I was getting an error with the sbrk parameter, so I left it out.
Yeah, sbrk is not a syscall ;-)
> Let me know if there's anything else I can provide.
Stupid questions:
- have you checked the file's md5sum?
- w

2011-03-01 Thread Eric Wolf
Eric Wolf added the comment: I tried the change you suggested. It still fails but now at 572,320 bytes instead of 900,000. I'm not sure why the difference in bytes read. I'll explore this more in a bit. I also converted the BZ2 to GZ and used the gzip module. It's failing after reading 46628

2011-03-01 Thread Charles-Francois Natali
Charles-Francois Natali added the comment: @Eric.Wolf Could you try with this:
     # Read in anohter chunk of the file
     # NOTE: It's possible that an XML tag will be greater than buffsize
     # This will break in that situation
    -newb = self.fp.read(
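The NOTE in that snippet points at a general hazard of reading fixed-size chunks: a tag or line can straddle two reads. One common remedy, sketched below under the assumption that records are newline-terminated, is to keep the incomplete tail of each chunk and prepend it to the next read. `iter_lines` is a hypothetical helper; `buffsize` is borrowed from the quoted comment. Note that with a reader that stops after the first bzip2 stream, the empty `read()` at that point still looks like end of file, so this only helps once the underlying reader handles multi-stream input.

```python
import io
from typing import BinaryIO, Iterator


def iter_lines(fileobj: BinaryIO, buffsize: int = 1 << 16) -> Iterator[bytes]:
    """Yield complete newline-terminated records, however they fall across reads."""
    pending = b""  # incomplete tail left over from the previous read
    while True:
        chunk = fileobj.read(buffsize)
        if not chunk:  # treat an empty read as end of input
            break
        pending += chunk
        *complete, pending = pending.split(b"\n")
        yield from complete
    if pending:  # last record had no trailing newline
        yield pending


# A tiny buffsize forces tags to straddle chunk boundaries.
sample = io.BytesIO(b"<node id='1'/>\n<way id='2'>\n</way>\n")
print(list(iter_lines(sample, buffsize=4)))
# [b"<node id='1'/>", b"<way id='2'>", b'</way>']
```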

2011-02-28 Thread Eric Wolf
Eric Wolf added the comment: I'm experiencing the same thing. My script works perfectly on a 165MB file but fails after reading 900,000 bytes on a 22GB file. My script uses a buffered bz2file.read and is agnostic about end-of-lines. Opening with "rb" does not help. It is specifically written

2011-01-13 Thread wrobell
wrobell added the comment: Forgot to mention the real number of lines!
    bzip2 -dc < planet-110105.osm.bz2 | wc -l
    2783595867
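For comparison with that `wc -l` figure, the same count can be taken in Python. Since Python 3.3, `bz2.open` reads multi-stream files such as pbzip2 output transparently, so a chunked count of newline bytes should agree with `bzip2 -dc | wc -l`. A minimal sketch; `count_lines` and the 1 MiB chunk size are illustrative choices.

```python
import bz2


def count_lines(path: str, chunk_size: int = 1 << 20) -> int:
    """Count newline bytes in the decompressed data, like `bzip2 -dc | wc -l`."""
    total = 0
    # Python 3.3+: bz2.open keeps reading after each end-of-stream marker.
    with bz2.open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            total += chunk.count(b"\n")
    return total
```

On an interpreter whose bz2 module stops after the first stream, the same function would undercount on pbzip2 files, which is the discrepancy this issue reports.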

2011-01-13 Thread SilentGhost
Changes by SilentGhost: nosy: +niemeyer -gustavo

2011-01-13 Thread SilentGhost
Changes by SilentGhost: nosy: +gustavo

2011-01-13 Thread wrobell
New submission from wrobell: There is a problem uncompressing large files with the bz2 module. For example, please download the 13GB OpenStreetMap file using the following torrent: http://osm-torrent.torres.voyager.hr/files/planet-latest.osm.bz2.torrent. Try to count lines in the compressed file with the command