[issue45981] Get raw file name in bytes from ZipFile

Daniel Hillier Sun, 05 Dec 2021 17:28:10 -0800


Daniel Hillier <[email protected]> added the comment:


Handling different character sets is not completely supported yet. There are a 
couple of open issues relating to this: https://bugs.python.org/issue40407 
(reading file names), https://bugs.python.org/issue41928 (support for reading 
and writing filenames using the unicode filename extra field) and 
https://bugs.python.org/issue40172 (issues with reading and then writing a 
filename from and back into a zip where the initial filename isn't encoded in 
cp437).

Most modern zip programs that deal with characters outside ASCII or cp437 
either set the UTF-8 flag (bit 11 of the general purpose bit flag) or write 
two names: an ASCII- or cp437-compatible filename in the original filename 
field of the zip header, and the actual filename, with all non-ASCII 
characters, in the Unicode filename extra field. I think adding support for 
the Unicode field to Python would probably cover the majority of files 
generated by modern zip programs.
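
As a rough illustration of what reading the Unicode filename extra field 
involves, here is a minimal sketch. It assumes the Info-ZIP "Unicode Path" 
layout (header ID 0x7075, a version byte, a CRC-32 of the header filename 
bytes, then the UTF-8 name). The helper name unicode_path_from_extra is 
hypothetical and is not part of zipfile.

```python
import struct
import zlib

UNICODE_PATH_ID = 0x7075  # Info-ZIP Unicode Path extra field header ID


def unicode_path_from_extra(extra, raw_name):
    """Return the UTF-8 name from the 0x7075 extra field, or None.

    extra    -- raw extra field bytes (e.g. ZipInfo.extra)
    raw_name -- filename bytes exactly as stored in the zip header
    (hypothetical helper, not a zipfile API)
    """
    pos = 0
    while pos + 4 <= len(extra):
        # Each extra record is: header ID (2 bytes), data size (2 bytes), data.
        header_id, size = struct.unpack_from("<HH", extra, pos)
        data = extra[pos + 4 : pos + 4 + size]
        pos += 4 + size
        if header_id != UNICODE_PATH_ID or len(data) < 5:
            continue
        version, name_crc = struct.unpack_from("<BL", data)
        # The CRC ties the field to the header filename; a mismatch means
        # the field is stale (e.g. the entry was renamed by an older tool).
        if version == 1 and name_crc == zlib.crc32(raw_name):
            return data[5:].decode("utf-8")
    return None
```

When the function returns None, the caller would fall back to the name 
decoded from the header field.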

For complete support, including archives from older zip programs that don't 
support the UTF-8 flag or the Unicode filename extra field, we may need to add 
a parameter to ZipFile's read and write functions that overrides the charset 
used for the filename stored directly in the zip file header.
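
Until such a parameter exists, one possible workaround is to undo the cp437 
decoding by hand. The sketch below assumes the archive was opened with 
zipfile's default behaviour, where names without the UTF-8 flag are decoded 
as cp437, and it relies on the orig_filename attribute that CPython sets on 
ZipInfo but that appears not to be part of the documented API. The function 
names raw_name_bytes and redecode_name are hypothetical.

```python
UTF8_FLAG = 0x800  # bit 11 of the general purpose bit flag


def raw_name_bytes(info):
    """Recover the filename bytes as stored in the zip header.

    info is a zipfile.ZipInfo; hypothetical helper, not a zipfile API.
    """
    if info.flag_bits & UTF8_FLAG:
        return info.orig_filename.encode("utf-8")
    # cp437 maps every byte value to a character, so this round trip
    # reproduces the stored bytes exactly.
    return info.orig_filename.encode("cp437")


def redecode_name(info, encoding):
    """Decode the stored name with a caller-chosen charset."""
    if info.flag_bits & UTF8_FLAG:
        # The archive declares UTF-8, so there is nothing to override.
        return info.orig_filename
    return raw_name_bytes(info).decode(encoding)
```

For an archive written by a tool that stored Shift JIS names, for example, 
redecode_name(info, "shift_jis") could be called on each entry returned by 
ZipFile.infolist().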

I've added my thoughts on how to approach this in 
https://bugs.python.org/issue40172 but haven't had time to implement these 
myself.

----------
nosy: +dhillier

_______________________________________
Python tracker <[email protected]>
<https://bugs.python.org/issue45981>
_______________________________________