[issue15602] zipfile: wrong encoding charset of member filename

monson Wed, 08 Aug 2012 23:20:52 -0700

New submission from monson:

In /cpython/Lib/zipfile.py, there is code like this:


    if flags & 0x800:
        # UTF-8 file names extension
        filename = filename.decode('utf-8')
    else:
        # Historical ZIP filename encoding
        filename = filename.decode('cp437')

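The branch above keys off bit 11 (0x800) of the member's general-purpose flag field. The following is a minimal standalone sketch of the same decision, not zipfile's own code: `decode_member_name` is a hypothetical helper, and the sample name "中文.txt" is only illustrative. It shows that a name without the flag is always decoded as cp437, whatever encoding the archiver actually used.

```python
# Bit 11 of the ZIP general-purpose flag marks the stored name as UTF-8.
UTF8_FLAG = 0x800


def decode_member_name(raw_name, flag_bits):
    """Decode a raw member name the way the quoted zipfile code does.

    Hypothetical helper for illustration only; not part of zipfile.
    """
    if flag_bits & UTF8_FLAG:
        # The archiver declared the name to be UTF-8.
        return raw_name.decode("utf-8")
    # No flag: fall back to cp437 regardless of the real encoding.
    return raw_name.decode("cp437")


print(decode_member_name("中文.txt".encode("utf-8"), UTF8_FLAG))  # 中文.txt
# gb18030 bytes without the flag come out as cp437 box-drawing characters.
print(decode_member_name("中文.txt".encode("gb18030"), 0))
```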

But in practice there is no such thing as a "historical ZIP filename encoding",
because a ZIP file records no charset information for its member names unless
the UTF-8 flag is set.
In English-speaking countries this is usually not a big deal. But when files are
zipped on a system whose native charset is not cp437 (notably in China or
Japan), the filenames are encoded in a charset such as gb18030, while ZipFile
decodes the bytes as cp437. The resulting names are garbled, and users have a
hard time finding out why.
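The failure, and a commonly used workaround, can be reproduced without any archive. This sketch assumes the archiver wrote the name in gb18030 and left the UTF-8 flag clear; "中文.txt" is sample data. Because cp437 maps each of the 256 byte values to a distinct character, the garbled string can be encoded back to the exact original bytes and decoded with the right charset, provided the caller knows (or guesses) that charset.

```python
# A member name as a Chinese-locale archiver might store it (no UTF-8 flag).
original = "中文.txt"
raw = original.encode("gb18030")

# What zipfile reports for such a member: the raw bytes decoded as cp437.
garbled = raw.decode("cp437")
print(garbled == original)  # False

# cp437 round-trips every byte value, so the original bytes are recoverable.
assert garbled.encode("cp437") == raw
recovered = garbled.encode("cp437").decode("gb18030")
print(recovered)  # 中文.txt
```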

This problem is new in py3k (Python 3); I found it on Python 3.2 and Python 3.4.
I suggest either returning the filename as a bytes object, or adding a decoding
parameter when opening a ZipFile.
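The second suggestion can be approximated on top of the existing API. The sketch below is a hypothetical helper, `member_names`, not part of zipfile: for members without the UTF-8 flag it re-encodes the cp437-decoded name to its original bytes and decodes them with a caller-chosen encoding. It reads `ZipInfo.flag_bits` and `ZipInfo.orig_filename` (the decoded name before zipfile normalizes path separators), and assumes the archive is opened without the `metadata_encoding` argument that Python 3.11 later added to ZipFile for this purpose.

```python
import io
import zipfile

UTF8_FLAG = 0x800  # general-purpose flag bit 11: name stored as UTF-8


def member_names(zf, encoding):
    """Return member names, re-decoding un-flagged names with `encoding`.

    Hypothetical helper emulating the proposed decoding parameter.
    Assumes zf was opened without metadata_encoding (so cp437 was used).
    """
    names = []
    for info in zf.infolist():
        name = info.orig_filename
        if not info.flag_bits & UTF8_FLAG:
            # Undo zipfile's cp437 decode, then apply the real charset.
            name = name.encode("cp437").decode(encoding)
        names.append(name)
    return names


# Build a small archive in memory: one ASCII placeholder name (flag clear)
# and one non-ASCII name, which zipfile stores as UTF-8 with the flag set.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("abcd.txt", b"hello")
    zf.writestr("日本.txt", b"world")

# Swap the placeholder for the same-length gb18030 bytes of "中文.txt",
# mimicking an archive created under a Chinese locale.
data = buf.getvalue().replace(b"abcd.txt", "中文.txt".encode("gb18030"))

with zipfile.ZipFile(io.BytesIO(data)) as zf:
    names = member_names(zf, "gb18030")
print(names)  # ['中文.txt', '日本.txt']
```

The byte patching is only there so the example needs no external file; with a real archive, `member_names` is called directly on the opened ZipFile. Picking the wrong `encoding` either raises UnicodeDecodeError or silently yields different garbage, which is why the choice has to come from the caller.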

----------
components: Library (Lib)
messages: 167760
nosy: monson
priority: normal
severity: normal
status: open
title: zipfile: wrong encoding charset of member filename
type: behavior
versions: Python 3.2

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue15602>
_______________________________________
_______________________________________________
Python-bugs-list mailing list