Gregory P. Smith <g...@krypto.org> added the comment:

Having examined the Lib/zipfile.py code, I think the existing behavior makes
sense. When writing, Python's zipfile module produces modern zipfiles: if a
filename is not ASCII, it stores the name as utf-8 and sets the utf-8 flag.
This is what every mainstream zip implementation of the past 10-15 years
expects.
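A minimal sketch of the writing side, assuming only the stdlib zipfile module: it writes one ASCII and one non-ASCII name into an in-memory archive, then reads the archive back and reports whether general purpose bit 11 (0x800, the "language encoding flag" in APPNOTE.TXT) was set for each member. The helper name utf8_flags is hypothetical.

```python
import io
import zipfile

UTF8_FLAG = 0x800  # general purpose bit 11 ("language encoding flag") in APPNOTE.TXT


def utf8_flags(names):
    """Hypothetical helper: write `names` with zipfile, report bit 11 per member."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in names:
            zf.writestr(name, b"data")
    buf.seek(0)
    with zipfile.ZipFile(buf) as zf:
        # flag_bits is read back from the central directory of the written archive
        return {info.filename: bool(info.flag_bits & UTF8_FLAG) for info in zf.infolist()}


print(utf8_flags(["readme.txt", "résumé.txt"]))
# {'readme.txt': False, 'résumé.txt': True}
```

Pure-ASCII names are written without the flag, so they stay readable by any zip tool regardless of its utf-8 support.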

When decoding a zipfile, if the utf-8 flag is not set, we assume cp437 per
PKWARE's zip APPNOTE.TXT "spec". So our reading is correct as well, even for
very old files.
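To illustrate the reading side, here is a sketch that fabricates a DOS-style archive. zipfile never writes cp437 names itself, so the hypothetical helper cp437_named_zip writes an ASCII placeholder name of the same byte length and then patches in the cp437 bytes for "café.txt" with the utf-8 flag left clear. Reading it back shows zipfile falling back to cp437.

```python
import io
import zipfile


def cp437_named_zip() -> bytes:
    """Hypothetical helper: an archive whose member name is stored as cp437 bytes
    with the utf-8 flag clear, as an old DOS-era tool would have written it."""
    placeholder = b"cafX.txt"              # ASCII, so zipfile leaves bit 11 clear
    legacy = "café.txt".encode("cp437")    # b'caf\x82.txt', same length
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(placeholder.decode("ascii"), b"data")
    raw = buf.getvalue()
    # The name appears twice: in the local file header and in the central directory.
    assert raw.count(placeholder) == 2
    return raw.replace(placeholder, legacy)


with zipfile.ZipFile(io.BytesIO(cp437_named_zip())) as zf:
    print(zf.namelist())         # ['café.txt']
    print(zf.read("café.txt"))   # b'data'
```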

This is being strict in what we produce and lenient in what we accept.
Caveats? Yes:

If someone needs to produce zipfiles for ancient software that does not
support utf-8, and that also does not treat the unknown utf-8 flag as an
error, that software will misinterpret (corrupt) any non-ASCII names.
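A small sketch of that failure mode, assuming a hypothetical legacy reader that ignores bit 11 and always decodes names as cp437: zipfile stores "café.txt" as utf-8 bytes, and cp437 turns the two-byte "é" into two box-drawing symbols.

```python
import io
import zipfile

name = "café.txt"
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr(name, b"data")

stored = name.encode("utf-8")          # b'caf\xc3\xa9.txt'
assert stored in buf.getvalue()        # zipfile stored the utf-8 bytes (and set bit 11)
legacy_view = stored.decode("cp437")   # a reader that ignores bit 11 and assumes cp437
print(legacy_view)                     # caf├⌐.txt
```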

Similarly, even if the names are written as cp437 (as PR 19335 would do), an
old zip implementation that blindly uses the user's locale encoding instead of
cp437 will still see corrupt names in that scenario (aka mojibake).

These are not what I'd expect to be normal use cases. Do you have a common 
practical example of a need for this?

(The PR on issue28080 provides a way to _read_ legacy zip files that used a 
codec other than cp437 if you know what it was.)
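For comparison, a commonly used workaround (not the mechanism from the issue28080 PR) relies on cp437 mapping all 256 byte values: re-encoding a cp437-decoded name recovers the original bytes, which can then be decoded with the codec you know was used. The helpers make_legacy_zip and decode_names are hypothetical, and cp866 (the Russian DOS code page) is just an example codec.

```python
import io
import zipfile

UTF8_FLAG = 0x800  # general purpose bit 11


def make_legacy_zip(name: str, codec: str) -> bytes:
    """Hypothetical test helper: store `name` encoded with `codec`, utf-8 flag clear."""
    legacy = name.encode(codec)
    placeholder = b"X" * len(legacy)       # ASCII stand-in of equal byte length
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(placeholder.decode("ascii"), b"data")
    raw = buf.getvalue()
    assert raw.count(placeholder) == 2     # local header + central directory
    return raw.replace(placeholder, legacy)


def decode_names(zf: zipfile.ZipFile, codec: str) -> list:
    """Hypothetical helper: re-decode names that zipfile decoded as cp437."""
    names = []
    for info in zf.infolist():
        if info.flag_bits & UTF8_FLAG:
            names.append(info.orig_filename)           # already decoded as utf-8
        else:
            raw = info.orig_filename.encode("cp437")   # undo zipfile's cp437 decode
            names.append(raw.decode(codec))
    return names


data = make_legacy_zip("отчет.txt", "cp866")
with zipfile.ZipFile(io.BytesIO(data)) as zf:
    print(zf.namelist())               # mojibake: the cp866 bytes decoded as cp437
    print(decode_names(zf, "cp866"))   # ['отчет.txt']
```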

---

https://www.loc.gov/preservation/digital/formats/fdd/fdd000354.shtml may also 
be of interest regarding the zip format.

----------
nosy: +gregory.p.smith

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue40172>
_______________________________________