[issue38861] zipfile: Corrupts filenames containing non-UTF8 characters

2019-11-19 Thread John Goerzen
New submission from John Goerzen: The zipfile.py standard library component handles non-UTF8 filenames questionably in several places. As the ZIP file format predates Unicode by a significant number of years, such filenames are fairly common in archives written by older code. Here is a very

[issue38864] dbm: Can't open database with bytes-encoded filename

2019-11-20 Thread John Goerzen
New submission from John Goerzen: This simple recipe fails:
    >>> import dbm
    >>> dbm.open(b"foo")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/lib/python3.7/dbm/__init__.py", line 78, in open
        result = whichdb(fil

[issue38864] dbm: Can't open database with bytes-encoded filename

2019-11-20 Thread John Goerzen
John Goerzen added the comment: As has been pointed out to me, the surrogateescape method could be used here; however, it is a bit of an odd duckling itself, and the system's open() call accepts bytes; couldn't this as well?
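A minimal sketch of the surrogateescape route mentioned in the comment above, for readers who want to see what the workaround might look like. It assumes the caller holds a bytes path and converts it with os.fsdecode (which uses the filesystem encoding, with the surrogateescape handler on POSIX) before calling dbm.open. The helper name open_dbm_from_bytes is hypothetical and not part of the dbm module.

```python
import dbm
import os
import tempfile


def open_dbm_from_bytes(path_bytes, flag="c"):
    """Hypothetical helper: decode a bytes path to str, then open it with dbm."""
    # os.fsdecode turns bytes into str using the filesystem encoding.
    return dbm.open(os.fsdecode(path_bytes), flag)


with tempfile.TemporaryDirectory() as tmp:
    # Build the database path entirely from bytes, as in the report.
    path_bytes = os.path.join(os.fsencode(tmp), b"foo")

    # Create the database and store one record.
    with open_dbm_from_bytes(path_bytes) as db:
        db[b"key"] = b"value"

    # Reopen read-only and fetch the record back.
    with open_dbm_from_bytes(path_bytes, "r") as db:
        stored = db[b"key"]
```

This only moves the decoding step into the caller; it does not answer the question of whether dbm.open should accept bytes directly.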

[issue38861] zipfile: Corrupts filenames containing non-UTF8 characters

2019-11-24 Thread John Goerzen
John Goerzen added the comment: I can tell you that the zip(1) on Unix systems has never done re-encoding to cp437; on a system that uses latin-1 (or any other latin-* for that matter) the filenames in the ZIP will be encoded in latin-1. Furthermore, this doesn't explain the corru
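To illustrate the encoding mismatch described above, here is a small sketch using only Python's built-in codecs. It assumes a filename whose bytes were written as latin-1 and then read by code that assumes cp437; the filename itself is a made-up example, not one taken from the report.

```python
# Bytes as a latin-1 system would store the (illustrative) name in the archive.
raw = "café.txt".encode("latin-1")

# What a reader sees if it assumes the bytes are cp437.
as_cp437 = raw.decode("cp437")

# cp437 assigns a character to every byte value, so the original bytes
# can be recovered and decoded again with the encoding actually used.
recovered = as_cp437.encode("cp437").decode("latin-1")

print(as_cp437)
print(recovered)
```

Because the cp437 round trip is lossless, the mis-decoded name can be repaired after the fact, provided the true encoding is known; the archive itself does not record it.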

[issue38861] zipfile: Corrupts filenames containing non-UTF8 characters

2019-11-25 Thread John Goerzen
John Goerzen added the comment: Hi Jon, I've read your article in the gist, the ZIP spec, and the article you linked to. As the article you linked to (https://marcosc.com/2008/12/zip-files-and-encoding-i-hate-you/) states, "Implementers just encode file names however they wan