I was playing with Python encodings and noticed this:

[EMAIL PROTECTED]:~$ python2.4
Python 2.4 (#2, Dec 3 2004, 17:59:05)
[GCC 3.3.5 (Debian 1:3.3.5-2)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> unicode('\x9d', 'iso8859_1')
u'\x9d'
>>>
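To make the effect on an encoding-guessing heuristic concrete, here is a minimal sketch, written in Python 3 syntax (an assumption; the session above is Python 2.4). The helper names decodes_cleanly and c1_controls are made up for illustration and are not part of any library. It checks that 'iso8859_1' accepts every possible byte value, so "decoded without error" says nothing about whether the guess was right, and it shows one possible extra check: flagging decoded characters in the range U+0080..U+009F.

```python
def decodes_cleanly(data, encoding):
    """Return True if the byte string decodes under `encoding` without raising."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


def c1_controls(text):
    """Return the characters of `text` that fall in U+0080..U+009F (hypothetical check)."""
    return [ch for ch in text if 0x80 <= ord(ch) <= 0x9F]


# Every byte value from 0x00 to 0xFF, decoded in one go.
all_bytes = bytes(range(256))
latin1_ok = decodes_cleanly(all_bytes, "iso8859_1")

# For comparison, a stricter codec rejects the same byte.
ascii_ok = decodes_cleanly(b"\x9d", "ascii")

# The byte from the session above, and the extra range check applied to it.
decoded = b"\x9d".decode("latin-1")
suspicious = c1_controls(decoded)

print(latin1_ok, ascii_ok, suspicious)
```

With a check like this, a guesser could treat a successful 'iso8859_1' decode as weak evidence only, and demote the guess when c1_controls returns anything.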
U+009D is NOT a valid Unicode character (it is not even a valid iso8859_1 character). The same happens if I use 'latin-1' instead of 'iso8859_1'.

This caught me by surprise, since I was using some heuristics to guess string encodings, and 'iso8859_1' gave no errors even when the input was in a different encoding.

Is this known behaviour, or have I discovered a terrible unknown bug in Python's encoding implementation that should be immediately reported and fixed? :-)

Happy new year,

-- 
 -----------------------------------------------------------
| Radovan Garabík http://kassiopeia.juls.savba.sk/~garabik/ |
| __..--^^^--..__    garabik @ kassiopeia.juls.savba.sk     |
 -----------------------------------------------------------