Ivan Sorokin <ivan.sorokin.t...@gmail.com> added the comment: Grand unified algorithm to read filenames from zip files correctly:
1. Do zip entry have «Unicode Path Extra Field» (0x7075)? Use it for file name. 2. Is Unicode flag (0x800) set in «Flags» Field of zip entry? Assume «Filename» Field is in UTF-8. 3. Do «HostOS» Field of zip entry have values of 0 (FAT) or 11 (NTFS)? Assume «Filename» Field is in OEM charset corresponding to system locale. 4. Assume «Filename» Field is in UTF-8. p7zip with oemcp patch (https://github.com/unxed/oemcp/) uses exactly this method, and is able to process all zip files in my test set correctly (my test set contains several zips generated by different packers on windows, macos, linux, and by online services). The same algorithm should be used in any zip unpacker wishing to process non-latin filenames as gently as possible. ---------- _______________________________________ Python tracker <rep...@bugs.python.org> <https://bugs.python.org/issue41928> _______________________________________ _______________________________________________ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com