Marc-Andre Lemburg <m...@egenix.com> added the comment:

Marc-Andre Lemburg wrote:
> 
> Marc-Andre Lemburg <m...@egenix.com> added the comment:
> 
> STINNER Victor wrote:
>>
>> STINNER Victor <victor.stin...@haypocalc.com> added the comment:
>>
>> I created a TAR archive with the 7-zip archiver of files with diacritics in 
>> their names (e.g. "é" and "à"). Then I opened the archive with WinRAR: the 
>> file names were not displayed correctly :-/
>>
>> 7-zip encodes "à" (U+00e0) as 0x85 (1 byte), and "é" (U+00e9) as 0x82 (1 
>> byte). I don't know this encoding.
> 
> That's an old DOS code page used in Europe: CP850
> 
> http://en.wikipedia.org/wiki/Code_page_850

Looks like the cmd.exe on WinXP still uses it. At least on my German
WinXP it does for Python 2.3 and older. Starting with Python 2.4,
the behavior changed to use CP1252 instead:

D:\Python26>python
Python 2.6 (r26:66721, Oct  2 2008, 11:35:03) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> u'àé'
u'\xe0\xe9'

D:\Python25>python
Python 2.5.2 (r252:60911, Feb 21 2008, 13:11:45) [MSC v.1310 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> u'áé'
u'\xe1\xe9'

D:\Python24>python
Python 2.4 (#60, Nov 30 2004, 11:49:19) [MSC v.1310 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> u'àé'
u'\xe0\xe9'

D:\Python23>python
Python 2.3.4 (#53, May 25 2004, 21:17:02) [MSC v.1200 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> u'àé'
u'\x85\x82'
>>>
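
The byte values above can be checked without a Windows console. The
following is only a sketch using Python's standard codecs; the variable
names are made up for illustration, and the Latin-1 step is my assumption
about why Python 2.3 printed u'\x85\x82', not something confirmed here.

```python
# -*- coding: utf-8 -*-
# Sketch: compare how "à" (U+00E0) and "é" (U+00E9) are encoded
# in the two code pages discussed in this thread.
text = u"\xe0\xe9"  # à é

cp850_bytes = text.encode("cp850")    # old DOS/OEM code page
cp1252_bytes = text.encode("cp1252")  # Windows "ANSI" code page

print(repr(cp850_bytes))   # bytes 0x85 0x82, as 7-zip wrote them
print(repr(cp1252_bytes))  # bytes 0xe0 0xe9

# Assumption: if the CP850 bytes typed at the console are taken as-is
# (i.e. decoded as Latin-1), the result is U+0085 U+0082, which matches
# the Python 2.3 transcript.
misdecoded = cp850_bytes.decode("latin-1")
print(repr(misdecoded))

# Decoding with the right code page recovers the original text.
assert cp850_bytes.decode("cp850") == text
```

So 0x85/0x82 are what you get for "àé" in CP850, and 0xe0/0xe9 are the
CP1252 (and Latin-1) values, consistent with the interpreter sessions.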

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue8784>
_______________________________________
_______________________________________________
Python-bugs-list mailing list