[issue13639] UnicodeDecodeError when creating tar.gz with unicode name

2011-12-26, Roundup Robot
Roundup Robot added the comment: New changeset dc1045d08bd8 by Jason R. Coombs in branch '2.7': Issue #11638: Adding test to ensure .tar.gz files can be generated by sdist command with unicode metadata, based on David Barnett's patch. http://hg.python.org/cpython/rev/dc1045d08bd8

2011-12-25, Terry J. Reedy
Terry J. Reedy added the comment: I just took a look at the 3.2 tarfile code and see that it always (because self.name is always unicode) does the same encoding, with 'replace', referencing RFC1952. Although there are a few other differences, they appear inconsequential, so that the code othe
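
To make the behaviour described above concrete, here is a minimal Python 3 sketch of an FNAME encoder that always encodes with "iso-8859-1" and 'replace'. The helper name `gzip_fname_field` is hypothetical, and stripping a trailing ".gz" first is an assumption about what tarfile does before encoding, not something stated in the comment.

```python
def gzip_fname_field(name: str) -> bytes:
    """Hypothetical helper: build the zero-terminated FNAME bytes for a gzip header."""
    # Assumption: the ".gz" suffix is dropped so FNAME holds the original name.
    if name.endswith(".gz"):
        name = name[:-3]
    # RFC 1952 specifies ISO-8859-1 for FNAME; "replace" turns characters
    # outside that charset into b"?" instead of raising.
    return name.encode("iso-8859-1", "replace") + b"\x00"


print(gzip_fname_field("hello.tar.gz"))
print(gzip_fname_field("h\u00e9llo.tar.gz"))
print(gzip_fname_field("\u65e5\u672c.tar.gz"))
```

With 'replace', names outside Latin-1 degrade to question marks rather than raising, which is the trade-off this encoding makes.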

2011-12-25, Terry J. Reedy
Terry J. Reedy added the comment: As I understand the patched code, it only fixes the issue for unicode names that can be latin-1 encoded and that other unicode names will raise the same exception with 'latin-1' (or equivalent) substituted for 'ascii'. So it is easy for me to anticipate a new
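
Whether a non-Latin-1 name raises depends on the error handler passed to `encode()`. A small sketch using plain `str.encode` (nothing tarfile-specific; `try_encode` is an illustrative helper) contrasts 'strict', which raises `UnicodeEncodeError`, with 'replace', which substitutes '?':

```python
def try_encode(name: str, errors: str):
    """Illustrative helper: encode as Latin-1, returning the exception instead of raising."""
    try:
        return name.encode("iso-8859-1", errors)
    except UnicodeEncodeError as exc:
        return exc  # report the failure instead of propagating it


# One name inside Latin-1, one outside it.
for name in ["caf\u00e9.tar", "\u65e5\u672c.tar"]:
    print(repr(name), try_encode(name, "strict"), try_encode(name, "replace"))
```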

2011-12-25, Jason R. Coombs
Jason R. Coombs added the comment: I also feel (1) or (3) is best for this issue. If there is a _better_ implementation, it should be reserved for a separate improvement to Python 3.2+. I lean slightly toward (3) because it would support filenames with Unicode characters other than latin-1 (

2011-12-25, Lars Gustäbel
Lars Gustäbel added the comment: I think we should wrap this up as soon as possible, because it has already absorbed too much of our time. The issue we discuss here is a tiny glitch triggered by a corner-case. My original idea was to fix it in a minimal sort of way that is backwards-compatibl

2011-12-24, Terry J. Reedy
Terry J. Reedy added the comment: With that explanation, that it is one case out of six that fails, for whatever reason, I agree. That leaves the issue of whether the fix is the right one. I currently agree with Victor that we should do what the rest of Python does and what is most universal

2011-12-24, Lars Gustäbel
Lars Gustäbel added the comment: I thought about that myself, too. It is clearly not a new feature; it is really more of a fix. Unicode pathnames given to tarfile.open() are just passed through to the open() function, which is why this has always worked, except for this particula
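
As a quick check of the pass-through behaviour on a modern Python 3 (not the 2.7 code discussed here), the sketch below writes a small archive under a non-ASCII path in several write modes and reads it back. The helper `roundtrip`, the file names, and the mode selection are illustrative choices, not taken from the thread.

```python
import os
import tarfile
import tempfile


def roundtrip(mode: str, directory: str) -> list:
    """Illustrative helper: write one member under a non-ASCII path, then list it."""
    path = os.path.join(directory, "h\u00e9llo.tar")
    member = os.path.join(directory, "member.txt")
    with open(member, "w", encoding="utf-8") as f:
        f.write("data\n")
    # The str path is handed straight to the file-opening code.
    with tarfile.open(path, mode) as tar:
        tar.add(member, arcname="member.txt")
    # "r:*" detects the compression (or lack of it) when reading back.
    with tarfile.open(path, "r:*") as tar:
        return tar.getnames()


with tempfile.TemporaryDirectory() as d:
    for mode in ("w", "w:gz", "w|", "w|gz"):
        print(mode, roundtrip(mode, d))
```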

2011-12-23, Terry J. Reedy
Terry J. Reedy added the comment: 2.7 is closed to new features. This looks like it might be one. The 2.7 doc for tarfile.open says "Return a TarFile object for the pathname name." Does the meaning of 'pathname' in 2.7 generally include unicode as well as str objects? (It is not in the Glossa

2011-12-21, STINNER Victor
STINNER Victor added the comment: "The gzip format (defined in RFC 1952) allows storing the original filename (without the .gz suffix) in an additional field in the header (the FNAME field). Latin-1 (iso-8859-1) is required." Hmm, it looks like the author of the gzip program (on Linux Fedora
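
When no extra field is present, the RFC 1952 FNAME field quoted above sits right after the 10-byte fixed header as a zero-terminated Latin-1 string. A hand-rolled sketch (helper names are hypothetical; it assumes a single member and ignores FEXTRA, FCOMMENT and FHCRC) builds such a member and reads the name back, using the standard gzip module only to confirm the result decompresses:

```python
import gzip
import struct
import zlib

FLG_FEXTRA = 0x04  # RFC 1952 header flag: extra field present
FLG_FNAME = 0x08   # RFC 1952 header flag: original file name present


def gzip_with_fname(data: bytes, fname: str, mtime: int = 0) -> bytes:
    """Build one gzip member whose header carries an FNAME field."""
    # ID1, ID2, CM (8 = deflate), FLG, MTIME (32-bit LE), XFL, OS (255 = unknown)
    header = struct.pack("<BBBBIBB", 0x1F, 0x8B, 8, FLG_FNAME, mtime, 0, 255)
    # FNAME: zero-terminated ISO-8859-1; strict encoding so bad names raise here.
    header += fname.encode("iso-8859-1") + b"\x00"
    comp = zlib.compressobj(9, zlib.DEFLATED, -zlib.MAX_WBITS)  # raw deflate stream
    body = comp.compress(data) + comp.flush()
    # Trailer: CRC32 and ISIZE (input length mod 2**32), both little-endian.
    trailer = struct.pack("<II", zlib.crc32(data) & 0xFFFFFFFF, len(data) & 0xFFFFFFFF)
    return header + body + trailer


def read_fname(blob: bytes) -> str:
    """Return the FNAME of a gzip member (FEXTRA is not handled)."""
    if blob[:2] != b"\x1f\x8b":
        raise ValueError("not a gzip member")
    flags = blob[3]
    if flags & FLG_FEXTRA or not flags & FLG_FNAME:
        raise ValueError("FNAME missing or preceded by FEXTRA")
    end = blob.index(b"\x00", 10)  # FNAME starts after the 10-byte fixed header
    return blob[10:end].decode("iso-8859-1")


blob = gzip_with_fname(b"hello\n", "h\u00e9llo.txt")
print(read_fname(blob))
print(gzip.decompress(blob))  # the stdlib reader accepts the hand-built member
```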

2011-12-21, Lars Gustäbel
Lars Gustäbel added the comment: See http://bugs.python.org/issue11638#msg150029

2011-12-21, STINNER Victor
STINNER Victor added the comment:

    + self.name = self.name.encode("iso-8859-1", "replace")

Why did you choose ISO-8859-1? I think that the filesystem encoding should be used instead:

    -self.name = self.name.encode("iso-8859-1", "replace")
    +self.name = self.name.encode(ENC
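
The practical difference between the two proposals shows up in the bytes written to FNAME. A sketch, assuming UTF-8 as a stand-in for the filesystem encoding (sys.getfilesystemencoding() varies by platform, so the comparison hard-codes UTF-8 rather than relying on it):

```python
import sys

# Same name, encoded for FNAME under the two proposals in this exchange.
name = "caf\u00e9.tar"
as_latin1 = name.encode("iso-8859-1", "replace")  # what the patch writes
# UTF-8 stands in for "the filesystem encoding"; the real value is platform-dependent.
as_utf8 = name.encode("utf-8", "replace")

print("filesystem encoding here:", sys.getfilesystemencoding())
print(as_latin1, as_utf8)
# A reader decoding FNAME as Latin-1 (per RFC 1952) misreads the UTF-8 bytes.
print(as_utf8.decode("iso-8859-1"))
```

The last line shows the mismatch risk: bytes written in one encoding and read in the other come back as mojibake.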

2011-12-21, Roundup Robot
Roundup Robot added the comment: New changeset a60a3610a97b by Lars Gustäbel in branch '2.7': Issue #13639: Accept unicode filenames in tarfile.open(mode="w|gz"). http://hg.python.org/cpython/rev/a60a3610a97b -- nosy: +python-dev
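
As a sanity check on a current Python 3 (where names are always str), the sketch below writes a 'w|gz' archive to an in-memory buffer and pulls the raw FNAME bytes out of the gzip header. The expectation that FNAME holds the Latin-1/'replace' encoding of the name minus ".gz" is an assumption based on the `encode("iso-8859-1", "replace")` line quoted in this thread; `stream_fname` is a made-up helper.

```python
import io
import tarfile


def stream_fname(name: str) -> bytes:
    """Hypothetical helper: write a one-member 'w|gz' archive in memory, return raw FNAME."""
    buf = io.BytesIO()
    with tarfile.open(name, "w|gz", fileobj=buf) as tar:
        payload = b"data\n"
        info = tarfile.TarInfo("member.txt")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    blob = buf.getvalue()  # buf stays open because it was passed in as fileobj
    assert blob[3] & 0x08, "FNAME flag not set"
    end = blob.index(b"\x00", 10)  # FNAME starts after the 10-byte fixed header
    return blob[10:end]


print(stream_fname("h\u00e9llo.tar.gz"))
print(stream_fname("\u65e5\u672c.tar.gz"))
```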

2011-12-21, Jason R. Coombs
Jason R. Coombs added the comment: That looks like a good patch to me. Do you want to commit it, or would you rather I do?

2011-12-21, Lars Gustäbel
Lars Gustäbel added the comment: tarfile under Python 2.x is not particularly designed to support unicode filenames (the gzip module does not support them either), but that should not be too hard to fix. -- keywords: +patch Added file: http://bugs.python.org/file24066/tarfile-stream-
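
For comparison with the remark about the 2.x gzip module: on a current Python 3, `gzip.GzipFile` accepts a str filename and, as far as I can tell, writes its basename (minus ".gz") to FNAME as Latin-1. A sketch with an illustrative in-memory buffer and file name:

```python
import gzip
import io

# Compress into memory while telling GzipFile what the "original" file name is.
buf = io.BytesIO()
with gzip.GzipFile(filename="h\u00e9llo.txt.gz", mode="wb", fileobj=buf, mtime=0) as gz:
    gz.write(b"hello\n")
blob = buf.getvalue()  # buf is not closed because it was supplied as fileobj

print(bool(blob[3] & 0x08))           # FNAME flag bit (RFC 1952)
print(blob[10:blob.index(b"\x00", 10)])  # raw FNAME bytes after the fixed header
print(gzip.decompress(blob))
```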

2011-12-20, Antoine Pitrou
Changes by Antoine Pitrou: -- nosy: +lars.gustaebel

2011-12-19, Jason R. Coombs
New submission from Jason R. Coombs:

    python -c "import tarfile; tarfile.open(u'hello.tar.gz', 'w|gz')"

produces

    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "C:\Users\jaraco\projects\public\cpython\Lib\tarfile.py", line 1687, in open
        _Stream(name, filemode, comptype, f
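
The traceback is cut off, so the exact failing line is not shown. Assuming the error comes from appending the unicode name to the byte-string gzip header (which is what the `encode("iso-8859-1", "replace")` fix suggests), the failure can be emulated in Python 3 by reproducing Python 2's implicit ASCII decode. This would also explain why the ASCII-only u'hello.tar.gz' fails: the non-ASCII byte is in the header, not in the name. `py2_concat` is a hypothetical helper.

```python
# Python 2 allowed str + unicode by implicitly decoding the str (byte) side as
# ASCII. py2_concat is a hypothetical helper emulating that coercion in Python 3.
def py2_concat(header: bytes, name: str) -> str:
    return header.decode("ascii") + name


# Start of a gzip header: ID1, ID2, CM=8 (deflate), FLG=8 (FNAME).
header_start = b"\037\213\010\010"

try:
    py2_concat(header_start, "hello.tar")
except UnicodeDecodeError as exc:
    # Fails on ID2 (0x8b), even though the name itself is pure ASCII.
    print(exc)

# Encoding the name first keeps everything as bytes, so no coercion happens.
fixed = header_start + "hello.tar".encode("iso-8859-1", "replace") + b"\x00"
print(fixed)
```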