I was playing with python encodings and noticed this:

[EMAIL PROTECTED]:~$ python2.4
Python 2.4 (#2, Dec  3 2004, 17:59:05)
[GCC 3.3.5 (Debian 1:3.3.5-2)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> unicode('\x9d', 'iso8859_1')
u'\x9d'
>>>

U+009D is NOT a valid unicode character (it is not even a iso8859_1
valid character)

The same happens if I use 'latin-1' instead of 'iso8859_1'.

This caught me by surprise, since I was doing some heuristics guessing 
string encodings, and 'iso8859_1' gave no errors even if the input
encoding was different.

Is this a known behaviour, or I discovered a terrible unknown bug in python 
encoding
implementation that should be immediately reported and fixed? :-)


happy new year,

-- 
 -----------------------------------------------------------
| Radovan GarabĂ­k http://kassiopeia.juls.savba.sk/~garabik/ |
| __..--^^^--..__    garabik @ kassiopeia.juls.savba.sk     |
 -----------------------------------------------------------
Antivirus alert: file .signature infected by signature virus.
Hi! I'm a signature virus! Copy me into your signature file to help me spread!
-- 
http://mail.python.org/mailman/listinfo/python-list

Reply via email to