New submission from John Goerzen:
The zipfile.py standard library component handles non-UTF-8 filenames
questionably in several places. Because the ZIP file format predates Unicode
by a significant number of years, such filenames are fairly common with
older code.
Here is a very
New submission from John Goerzen:
This simple recipe fails:
>>> import dbm
>>> dbm.open(b"foo")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.7/dbm/__init__.py", line 78, in open
    result = whichdb(fil
John Goerzen added the comment:
As has been pointed out to me, the surrogateescape method could be used here;
however, it is a bit of an odd duckling itself, and the system's open() call
accepts bytes; couldn't this as well?
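For illustration, a minimal sketch of the surrogateescape round trip mentioned above. The byte string is a made-up example name, not one taken from this report; it only shows that undecodable bytes survive the conversion to str and back.

```python
# Hypothetical filename: "café" encoded as latin-1, which is not valid UTF-8.
raw_name = b"caf\xe9"

# surrogateescape maps each undecodable byte to a lone surrogate code point
# instead of raising UnicodeDecodeError.
as_text = raw_name.decode("utf-8", errors="surrogateescape")

# Encoding with the same error handler restores the original bytes exactly.
round_tripped = as_text.encode("utf-8", errors="surrogateescape")

print(ascii(as_text), round_tripped == raw_name)
```

The resulting str cannot be encoded as plain UTF-8 (for printing or logging, say), which is part of why the approach feels awkward compared with passing bytes straight through.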
John Goerzen added the comment:
I can tell you that the zip(1) on Unix systems has never done re-encoding to
cp437; on a system that uses latin-1 (or any other latin-* for that matter) the
filenames in the ZIP will be encoded in latin-1. Furthermore, this doesn't
explain the corru
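As a rough sketch of why the assumed encoding matters: the same stored bytes produce different names when read as latin-1 and as cp437. The filename here is a hypothetical example. The cp437 decoding step mirrors what Python's zipfile does for entries without the UTF-8 flag set; it is not a claim about what any particular zip(1) build writes.

```python
# Hypothetical name as it might be written on a latin-1 system.
name = "café.txt"
stored = name.encode("latin-1")

# Reading the stored bytes back with the encoding they were written in
# recovers the original name.
as_latin1 = stored.decode("latin-1")

# Reading the same bytes as cp437 yields a different, wrong-looking name.
as_cp437 = stored.decode("cp437")

# cp437 assigns a character to every byte value, so the original bytes
# can still be recovered by re-encoding the misread name.
recovered = as_cp437.encode("cp437")

print(as_latin1 == name, as_cp437 == name, recovered == stored)
```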
John Goerzen added the comment:
Hi Jon,
I've read your article in the gist, the ZIP spec, and the article you linked
to. As the article you linked to
(https://marcosc.com/2008/12/zip-files-and-encoding-i-hate-you/) states,
"Implementers just encode file names however they wan