New submission from STINNER Victor <victor.stin...@haypocalc.com>:

When reading a tar archive, tarfile decodes fields using the "replace" error handler by default. As a result, we lose information whenever a field contains an undecodable byte.
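
To make the information loss concrete, here is a minimal sketch (not taken from the attached patch). It builds a small in-memory archive whose member name is not valid UTF-8, then reads it back with the "replace" handler and with the "surrogateescape" handler from PEP 383. The file name, the UTF-8 encoding and the helper name read_names are assumptions chosen for illustration.

```python
import io
import tarfile

# Hypothetical member name: Latin-1 bytes that are not valid UTF-8.
raw_name = b"caf\xe9.txt"

# Build a small ustar archive in memory. The name is smuggled in through
# surrogateescape so that the exact raw bytes end up in the header.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT,
                  encoding="utf-8", errors="surrogateescape") as tf:
    tf.addfile(tarfile.TarInfo(name=raw_name.decode("utf-8", "surrogateescape")))


def read_names(errors):
    """Return the member names decoded with the given error handler."""
    buf.seek(0)
    with tarfile.open(fileobj=buf, mode="r",
                      encoding="utf-8", errors=errors) as tf:
        return tf.getnames()


replaced = read_names("replace")[0]
escaped = read_names("surrogateescape")[0]

print(ascii(replaced))  # the bad byte became U+FFFD
print(ascii(escaped))   # the bad byte became the lone surrogate U+DCE9

# "replace" is lossy: the original bytes cannot be recovered.
print(replaced.encode("utf-8") == raw_name)
# "surrogateescape" round-trips back to the original bytes.
print(escaped.encode("utf-8", "surrogateescape") == raw_name)
```

With "replace", every undecodable byte collapses to the same replacement character, so two different names can become indistinguishable. With "surrogateescape", each bad byte maps to a distinct lone surrogate and can be encoded back unchanged.
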
Since PEP 383, undecodable filenames are stored using surrogates in Python 3. I think it is a good idea to use surrogates for tar as well, because undecodable data in a tar archive is a common problem (see the Unicode section of the tarfile documentation).

----------
components: Library (Lib), Unicode
files: tarfile_surrogates.patch
keywords: patch
messages: 103099
nosy: haypo, loewis
severity: normal
status: open
title: tarfile: use surrogates for undecode fields
versions: Python 3.1, Python 3.2

Added file: http://bugs.python.org/file16917/tarfile_surrogates.patch

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue8390>
_______________________________________