[issue8390] tarfile: use surrogates for undecode fields

2010-05-06 Thread STINNER Victor
Changes by STINNER Victor: status: open -> closed

[issue8390] tarfile: use surrogates for undecode fields

2010-05-05 Thread STINNER Victor
STINNER Victor added the comment: Thank you for your review. I committed the patch as r80824 (I fixed the documentation, :versionadded => :versionchanged), blocked as r80825 (3.2). -- > Unfortunately, POSIX says nothing about how to store bad filenames in a pax archive. tarfile raises an err

[issue8390] tarfile: use surrogates for undecode fields

2010-05-05 Thread Lars Gustäbel
Lars Gustäbel added the comment: I think it is a good suggestion to use "surrogateescape" as the default, because (I hope) it produces the fewest errors and is the best choice if tarfile is used in connection with Python's filesystem calls. - When reading tar headers, undecodable chars in fil
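As a rough illustration of the point about filesystem calls, the sketch below writes a small in-memory GNU-format archive whose member name contains a byte that is not valid UTF-8, then reads it back with two different error handlers. The member name and the helper `read_names` are made up for this example; `encoding` and `errors` are the existing keyword arguments of `tarfile.open`.

```python
import io
import tarfile

# Hypothetical member name: Latin-1 "é" (0xE9) is not valid UTF-8.
raw_name = b"caf\xe9.txt"
name = raw_name.decode("utf-8", "surrogateescape")

# Write a one-member GNU-format archive into memory.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT,
                  encoding="utf-8", errors="surrogateescape") as tar:
    tar.addfile(tarfile.TarInfo(name))  # empty regular file


def read_names(errors):
    """Return the member names decoded with the given error handler."""
    buf.seek(0)
    with tarfile.open(fileobj=buf, mode="r:", encoding="utf-8",
                      errors=errors) as tar:
        return tar.getnames()


names_replace = read_names("replace")          # bad byte becomes U+FFFD
names_escape = read_names("surrogateescape")   # bad byte becomes U+DCE9

# Only the surrogateescape name can be turned back into the original bytes.
print(names_escape[0].encode("utf-8", "surrogateescape") == raw_name)
```

With "surrogateescape" the decoded name can be encoded back to the bytes stored in the header, which is the same convention Python 3 uses for undecodable filenames returned by the os module.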

[issue8390] tarfile: use surrogates for undecode fields

2010-05-03 Thread STINNER Victor
STINNER Victor added the comment: My patch changes test_uname_unicode() of test_tarfile for the GNU and ustar formats (but not PAX). In GNU and ustar formats, the fields can be encoded in any encoding, and may contain invalid byte sequences.

[issue8390] tarfile: use surrogates for undecode fields

2010-05-03 Thread Martin v. Löwis
Martin v. Löwis added the comment: I think it is helpful to read the pax specification here: http://www.opengroup.org/onlinepubs/009695399/utilities/pax.html pax defines (IIUC) that all strings in a pax-compliant tar file are UTF-8 encoded. For the "invalid" option, they offer the alternative

[issue8390] tarfile: use surrogates for undecode fields

2010-05-03 Thread STINNER Victor
STINNER Victor added the comment: A better fix might be to store fields as bytes, but that would break compatibility, and Unicode is practical in Python 3.

[issue8390] tarfile: use surrogates for undecode fields

2010-05-03 Thread Lars Gustäbel
Lars Gustäbel added the comment: Yes, I will soon have ;-) Please give me a few days...

[issue8390] tarfile: use surrogates for undecode fields

2010-04-29 Thread STINNER Victor
STINNER Victor added the comment: lars: Do you have an opinion about this suggestion?

[issue8390] tarfile: use surrogates for undecode fields

2010-04-23 Thread STINNER Victor
Changes by STINNER Victor: nosy: +lars.gustaebel

[issue8390] tarfile: use surrogates for undecode fields

2010-04-13 Thread STINNER Victor
New submission from STINNER Victor: When reading a tar archive, tarfile decodes fields using the "replace" error handler by default. As a result, information is lost whenever a field contains an undecodable byte. Since PEP 383, undecodable filenames are stored using surrogates in Python 3. I t
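The difference between the two error handlers can be seen without tarfile at all. The following is a minimal sketch using plain `bytes.decode` and `str.encode`; the sample bytes are invented for illustration and are not taken from the report.

```python
# Invented header bytes: Latin-1 "é" (0xE9) is not valid UTF-8.
raw = b"caf\xe9.txt"

# "replace" substitutes U+FFFD, so the original byte is gone for good.
replaced = raw.decode("utf-8", "replace")

# "surrogateescape" (PEP 383) maps the bad byte to the lone surrogate U+DCE9.
escaped = raw.decode("utf-8", "surrogateescape")

# Encoding back shows which handler preserves the original bytes.
replaced_roundtrip = replaced.encode("utf-8", "replace")
escaped_roundtrip = escaped.encode("utf-8", "surrogateescape")

print(replaced_roundtrip == raw)  # False: the byte 0xE9 was lost
print(escaped_roundtrip == raw)   # True: the byte 0xE9 is restored
```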