Re: Problem with zipfile and newlines

2008-03-11 Thread neilcrighton
Sorry my initial post was muddled. Let me try again.

I've got a zipped archive that I can extract files from with my
standard archive unzipping program, 7-zip. I'd like to extract the
files in python via the zipfile module.  However, when I extract the
file from the archive with ZipFile.read(), it isn't the same as the
7-zip-extracted file. For text files, the zipfile-extracted version has
'\r\n' everywhere the 7-zip-extracted file only has '\n'. I haven't
tried comparing binary files via the two extraction methods yet.
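
One way to check which line endings each extracted copy really contains is to read it in binary mode and count them. This is only a rough sketch: newline_counts is a made-up helper name, and it assumes Python 3 bytes objects rather than the Python 2.5 strings used in this thread.

```python
def newline_counts(data: bytes) -> dict:
    # Count CRLF pairs first, then subtract them from the raw CR and LF
    # totals so that "bare" means a CR or LF that is not part of a pair.
    crlf = data.count(b"\r\n")
    return {
        "crlf": crlf,
        "bare_lf": data.count(b"\n") - crlf,
        "bare_cr": data.count(b"\r") - crlf,
    }

# A doubled carriage return ('\r\r\n') shows up as one CRLF plus one bare CR.
sample_counts = newline_counts(b"a\r\r\nb\n")
```

Running it on open(path, 'rb').read() for both the zipfile-extracted and the 7-zip-extracted copy should show where the two differ.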

Regarding the code I posted: I was writing it from memory, and made a
mistake. I didn't use:

z = zipfile.ZipFile(open('foo.zip', 'r'))

I used this:

z = zipfile.ZipFile('foo.zip')

But Duncan's comment was useful, as I generally only ever work with
text files, and I didn't realise you have to use 'rb' or 'wb' options
when reading and writing binary files.

To answer John's questions - I was calling '\r' a newline. I should
have said carriage return. I'm not sure what operating system the
original zip file was created on. I didn't fiddle with the extracted
file contents, other than replacing '\r' with ''.  I wrote out all the
files with open('outputfile','w') - it seems that I should have been
using 'wb' when writing out the binary files.

Thanks for the quick responses - any ideas why the zipfile-extracted
files and 7-zip-extracted files are different?

On Mar 10, 9:37 pm, John Machin <[EMAIL PROTECTED]> wrote:
> On Mar 10, 11:14 pm, Duncan Booth <[EMAIL PROTECTED]>
> wrote:
>
>
>
> > "Neil Crighton" <[EMAIL PROTECTED]> wrote:
> > > I'm using the zipfile library to read a zip file in Windows, and it
> > > seems to be adding too many newlines to extracted files. I've found
> > > that for extracted text-encoded files, removing all instances of '\r'
> > > in the extracted file seems to fix the problem, but I can't find an
> > > easy solution for binary files.
>
> > > The code I'm using is something like:
>
> > > from zipfile import Zipfile
> > > z = Zipfile(open('zippedfile.zip'))
> > > extractedfile = z.read('filename_in_zippedfile')
>
> > > I'm using Python version 2.5.  Has anyone else had this problem
> > > before, or know how to fix it?
>
> > > Thanks,
>
> > Zip files aren't text. Try opening the zipfile file in binary mode:
>
> >open('zippedfile.zip', 'rb')
>
> Good pickup, but that indicates that the OP may have *TWO* problems,
> the first of which is not posting the code that was actually executed.
>
> If the OP actually executed the code that he posted, it is highly
> likely to have died in a hole long before it got to the z.read()
> stage, e.g.
>
> >>> import zipfile
> >>> z = zipfile.ZipFile(open('foo.zip'))
>
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "C:\python25\lib\zipfile.py", line 346, in __init__
>     self._GetContents()
>   File "C:\python25\lib\zipfile.py", line 366, in _GetContents
>     self._RealGetContents()
>   File "C:\python25\lib\zipfile.py", line 404, in _RealGetContents
>     centdir = struct.unpack(structCentralDir, centdir)
>   File "C:\python25\lib\struct.py", line 87, in unpack
>     return o.unpack(s)
> struct.error: unpack requires a string argument of length 46
>
> >>> z = zipfile.ZipFile(open('foo.zip', 'rb')) # OK
> >>> z = zipfile.ZipFile('foo.zip', 'r') # OK
>
> If it somehow made it through the open stage, it surely would have
> blown up at the read stage, when trying to decompress a contained
> file.
>
> Cheers,
> John

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Problem with zipfile and newlines

2008-03-11 Thread neilcrighton
I think I've worked it out after reading the 'Binary mode for files'
section of http://zephyrfalcon.org/labs/python_pitfalls.html

zipfile extracts a file as a binary string of bytes, and I was
writing out this binary data as a text file with open('foo','w').
In text mode, Python converts each '\n' it writes to the platform's
line ending ('\n' on Unix, '\r\n' on Windows, '\r' on classic Mac
OS).  So on Windows it sees the '\r\n' in the binary data and writes
it out as '\r\r\n'.
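
The translation can be reproduced on any platform with a small sketch. It assumes Python 3 (the thread is about 2.5) and uses io.TextIOWrapper with newline='\r\n' to stand in for a Windows text-mode file; write_as_windows_text is a made-up helper name.

```python
import io

def write_as_windows_text(data: str) -> bytes:
    # Simulate a text-mode write on Windows: newline='\r\n' makes the
    # wrapper turn every '\n' it writes into '\r\n', whatever the host OS.
    raw = io.BytesIO()
    text = io.TextIOWrapper(raw, encoding="latin-1", newline="\r\n")
    text.write(data)
    text.flush()
    return raw.getvalue()

# Data that already contains '\r\n' gets an extra '\r' before each '\n'.
doubled = write_as_windows_text("a\r\nb\r\n")
```

Here doubled comes out as b'a\r\r\nb\r\r\n', which matches the extra carriage returns I was seeing.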

The upshot of this is that writing out the zipfile-extracted files
with open('foo','wb') instead of open('foo','w') solves my problem.
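
For reference, a minimal sketch of the working approach, again assuming Python 3, where ZipFile.read() returns bytes. The archive is built in memory so the example is self-contained; extract_member, notes.txt and the temporary path are made-up names for illustration.

```python
import io
import os
import tempfile
import zipfile

def extract_member(archive, member, dest_path):
    # Read the member's bytes exactly as stored in the archive.
    with zipfile.ZipFile(archive) as z:
        data = z.read(member)
    # 'wb' writes the bytes untouched: no newline translation happens.
    with open(dest_path, "wb") as out:
        out.write(data)
    return data

# Build a small archive whose member already uses '\r\n' line endings.
payload = b"first line\r\nsecond line\r\n"
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("notes.txt", payload)
buf.seek(0)

dest = os.path.join(tempfile.mkdtemp(), "notes.txt")
extract_member(buf, "notes.txt", dest)

# Read the result back in binary mode to compare byte for byte.
with open(dest, "rb") as f:
    roundtrip = f.read()
```

With 'wb' the file on disk is identical to the bytes returned by z.read(), so it should also match what 7-zip extracts.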

