New submission from STINNER Victor <victor.stin...@haypocalc.com>:

When reading a tar archive, tarfile decodes fields using the "replace" error 
handler by default. As a result, we lose information whenever a field contains 
an undecodable byte.
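
To illustrate the loss, here is a minimal sketch at the codec level. The byte
string is a made-up example (a Latin-1 encoded name), not taken from a real
archive or from the attached patch:

```python
# Hypothetical file name as raw bytes: "café.txt" encoded in Latin-1,
# which is not valid UTF-8.
raw = b"caf\xe9.txt"

# The "replace" handler substitutes U+FFFD for the undecodable byte.
decoded = raw.decode("utf-8", "replace")

# Encoding the result again does not give back the original bytes:
# the information about byte 0xE9 is gone.
roundtrip = decoded.encode("utf-8")
print(repr(decoded), roundtrip == raw)
```

Once the replacement character is in the string, there is no way to tell
which byte was originally stored in the archive.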

Since PEP 383, undecodable filenames are represented using surrogates in 
Python 3 (the "surrogateescape" error handler). I think it's a good idea to use 
surrogates for tar as well, because undecodable data in a tar archive is a 
common problem (see the Unicode section of the tarfile documentation).
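
As a rough sketch of the difference (this is not the attached patch), the
following builds a small in-memory archive whose member name contains a
non-UTF-8 byte and reads it back with both error handlers. The member name
and the read_names() helper are made up for the example; the encoding and
errors arguments are the existing tarfile.open() parameters:

```python
import io
import tarfile

def read_names(data, errors):
    """Hypothetical helper: list member names using the given error handler."""
    with tarfile.open(fileobj=io.BytesIO(data), encoding="utf-8",
                      errors=errors) as tar:
        return tar.getnames()

# Write an archive whose member name holds the raw byte 0xE9.
# "\udce9" is the surrogate that PEP 383 uses for that byte.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT,
                  encoding="utf-8", errors="surrogateescape") as tar:
    tar.addfile(tarfile.TarInfo("caf\udce9.txt"))  # empty member
data = buf.getvalue()

lossy = read_names(data, "replace")              # byte becomes U+FFFD
exact = read_names(data, "surrogateescape")      # byte kept as a surrogate

# With surrogates, the original bytes of the name can be recovered.
original_bytes = exact[0].encode("utf-8", "surrogateescape")
print(lossy, exact, original_bytes)
```

With "surrogateescape" the name round-trips to the exact bytes stored in the
header, so the member can still be matched against the file system.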

----------
components: Library (Lib), Unicode
files: tarfile_surrogates.patch
keywords: patch
messages: 103099
nosy: haypo, loewis
severity: normal
status: open
title: tarfile: use surrogates for undecode fields
versions: Python 3.1, Python 3.2
Added file: http://bugs.python.org/file16917/tarfile_surrogates.patch

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue8390>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com