Re: python encoding bug?

Vincent Wehren Sat, 31 Dec 2005 02:15:51 -0800

<[EMAIL PROTECTED]> wrote in message 
news:[EMAIL PROTECTED]
|
| I was playing with python encodings and noticed this:
|
| [EMAIL PROTECTED]:~$ python2.4
| Python 2.4 (#2, Dec  3 2004, 17:59:05)
| [GCC 3.3.5 (Debian 1:3.3.5-2)] on linux2
| Type "help", "copyright", "credits" or "license" for more information.
| >>> unicode('\x9d', 'iso8859_1')
| u'\x9d'
| >>>
|
| U+009D is NOT a valid unicode character (it is not even a valid iso8859_1
| character)


That statement is not entirely true: U+009D is an assigned code point, a C1
control character. If you check the current UnicodeData.txt
(at http://www.unicode.org/Public/UNIDATA/), you'll find:

009D;<control>;Cc;0;BN;;;;;N;OPERATING SYSTEM COMMAND;;;;
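
Python ships its own copy of this database in the unicodedata module, and it
reports the same properties as that line. A minimal sketch, assuming Python 3
(where str is already Unicode, unlike the 2.4 session quoted above):

```python
import unicodedata

ch = "\x9d"  # U+009D, the code point the latin-1 decode produced

# Same fields as the UnicodeData.txt line: general category,
# canonical combining class, bidirectional class
print(unicodedata.category(ch))       # Cc  (control character)
print(unicodedata.combining(ch))      # 0
print(unicodedata.bidirectional(ch))  # BN  (boundary neutral)
```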

Regards,

Vincent Wehren

|
| The same happens if I use 'latin-1' instead of 'iso8859_1'.
|
| This caught me by surprise, since I was using heuristics to guess
| string encodings, and 'iso8859_1' gave no errors even if the input
| encoding was different.
|
| Is this a known behaviour, or have I discovered a terrible unknown bug in
| the python encoding implementation that should be immediately reported
| and fixed? :-)
|
|
| happy new year,
|
| -- 
| -----------------------------------------------------------
|| Radovan Garabík http://kassiopeia.juls.savba.sk/~garabik/ |
|| __..--^^^--..__    garabik @ kassiopeia.juls.savba.sk     |
| -----------------------------------------------------------
| Antivirus alert: file .signature infected by signature virus.
| Hi! I'm a signature virus! Copy me into your signature file to help me spread!
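
As for the heuristics: this is expected. Python's latin-1 codec ('iso8859_1'
is just an alias for it) maps every byte value 0x00 to 0xFF to the code point
with the same number, so bytes 0x80 to 0x9F come out as C1 controls such as
U+009D and decoding never raises an error. That makes latin-1 useless as a
detector, though fine as a last-resort fallback. A rough sketch, again assuming
Python 3; guess_decode and its candidate list are hypothetical, not a library
API:

```python
import codecs

# 'latin-1' and 'iso8859_1' resolve to the same codec
assert codecs.lookup("latin-1").name == codecs.lookup("iso8859_1").name

# Every byte decodes to the code point with the same number: nothing is rejected
all_bytes = bytes(range(256))
decoded = all_bytes.decode("iso8859_1")
assert [ord(c) for c in decoded] == list(range(256))


def guess_decode(data, candidates=("ascii", "utf-8")):
    """Hypothetical helper: try strict codecs in order, fall back to latin-1.

    latin-1 accepts any byte string, so it must come last; any candidate
    listed after it would never be reached.
    """
    for encoding in candidates:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue  # not valid in this encoding, try the next one
    return data.decode("latin-1"), "latin-1"


print(guess_decode(b"plain text"))          # ('plain text', 'ascii')
print(guess_decode(b"garab\xc3\xadk")[1])  # utf-8
print(guess_decode(b"\x9d"))                # ('\x9d', 'latin-1')
```

Note that a lone 0x9d byte is rejected by both ascii and utf-8 and only
"succeeds" at the latin-1 step, which is exactly the case from the original
question.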

-- 
http://mail.python.org/mailman/listinfo/python-list
