Serhiy Storchaka <storchaka+cpyt...@gmail.com> added the comment:

I experimented with this a lot. There is a problem with the append mode. We can 
read in the append mode, therefore we need an encoding. But when we close a 
ZipFile after appending, non-ASCII file names will be encoded in UTF-8 in the 
central directory. The next time we open the archive for reading with a 
different encoding, we will get an error because the filenames in the central 
directory and in the local headers differ. We would need to write non-ASCII 
file names back with the specified encoding to keep the data self-consistent.
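
A minimal sketch of the write-side behavior described above, using only the 
existing zipfile API: when a new archive is written, a non-ASCII member name is 
stored as UTF-8 and marked with general-purpose flag bit 0x800, while an ASCII 
name is not. The member names here are made up for illustration, and the sketch 
does not reproduce the append-mode mismatch itself.

```python
import io
import zipfile

# Write one ASCII and one non-ASCII member name into an in-memory archive.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("ascii.txt", b"a")
    zf.writestr("тест.txt", b"b")

# Bit 0x800 of flag_bits marks a file name stored as UTF-8.
with zipfile.ZipFile(buf) as zf:
    utf8_flags = {
        info.filename: bool(info.flag_bits & 0x800) for info in zf.infolist()
    }

print(utf8_flags)
```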

In the end I left this as it was initially. We can return to the problem with 
the append mode later.

The differences between PR 32007 and your patches:

* The parameter was renamed to metadata_encoding to avoid confusion with the 
existing encoding parameter of ZipFile.open(). In the future I am going to use 
it for comments as well. The attribute and the CLI option were renamed 
correspondingly.
* --metadata-encoding can also be used with the -t option.
* "surrogateescape" is no longer used. If the encoding is not suitable, you 
will get an error. Use the default and decode filenames manually in such cases. 
We can change this in the future.
* Updated documentation.
* Tests were significantly rewritten. Now they test the behavior with wrong 
metadata_encoding, mixed UTF-8 and legacy encodings, and reading after append.
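
As a rough illustration of the points above, here is a sketch of both ways to 
read a legacy-encoded name. The cp866 encoding, the member name, and the 
byte-patching trick used to fabricate a legacy archive are assumptions made for 
the example only; they are not taken from the PR or its tests. The 
metadata_encoding branch is guarded because the parameter exists only in 
Python 3.11 and later.

```python
import io
import sys
import zipfile

LEGACY = "cp866"            # assumed legacy encoding for this example
name = "тест.txt"           # hypothetical member name
placeholder = b"XXXX.txt"   # ASCII stand-in with the same byte length
legacy_bytes = name.encode(LEGACY)
assert len(legacy_bytes) == len(placeholder)

# zipfile always writes non-ASCII names as UTF-8, so fabricate a legacy
# archive by writing an ASCII name and patching its bytes in both the
# local header and the central directory.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr(placeholder.decode("ascii"), b"hello")
raw = buf.getvalue().replace(placeholder, legacy_bytes)

# Default behavior: names without the UTF-8 flag are decoded as cp437,
# so decoding manually means undoing cp437 and applying the real encoding.
with zipfile.ZipFile(io.BytesIO(raw)) as zf:
    mojibake = zf.namelist()[0]
recovered = mojibake.encode("cp437").decode(LEGACY)

# Python 3.11+: let ZipFile decode the names with metadata_encoding.
decoded = None
if sys.version_info >= (3, 11):
    with zipfile.ZipFile(io.BytesIO(raw), metadata_encoding=LEGACY) as zf:
        decoded = zf.namelist()[0]

print(recovered, decoded)
```

The manual route works on any Python version because cp437 maps every byte 
value, so the original bytes can always be recovered from the default decoding.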

I was going to make more changes, but left them for the future.

----------
versions: +Python 3.11 -Python 3.7

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue28080>
_______________________________________