<[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED]
|
| I was playing with python encodings and noticed this:
|
| [EMAIL PROTECTED]:~$ python2.4
| Python 2.4 (#2, Dec 3 2004, 17:59:05)
| [GCC 3.3.5 (Debian 1:3.3.5-2)] on linux2
| Type "help", "copyright", "credits" or "license" for more information.
| >>> unicode('\x9d', 'iso8859_1')
| u'\x9d'
| >>>
|
| U+009D is NOT a valid unicode character (it is not even a iso8859_1
| valid character)
That statement is not entirely true. If you check the current
UnicodeData.txt (on http://www.unicode.org/Public/UNIDATA/) you'll find:

009D;<control>;Cc;0;BN;;;;;N;OPERATING SYSTEM COMMAND;;;;

Regards,
Vincent Wehren

|
| The same happens if I use 'latin-1' instead of 'iso8859_1'.
|
| This caught me by surprise, since I was doing some heuristics guessing
| string encodings, and 'iso8859_1' gave no errors even if the input
| encoding was different.
|
| Is this a known behaviour, or I discovered a terrible unknown bug in python encoding
| implementation that should be immediately reported and fixed? :-)
|
|
| happy new year,
|
| --
|  -----------------------------------------------------------
| | Radovan Garabík http://kassiopeia.juls.savba.sk/~garabik/ |
| | __..--^^^--..__    garabik @ kassiopeia.juls.savba.sk     |
|  -----------------------------------------------------------
| Antivirus alert: file .signature infected by signature virus.
| Hi! I'm a signature virus! Copy me into your signature file to help me spread!
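The UnicodeData.txt entry quoted in the reply can be checked from Python itself. What follows is only a minimal sketch, assuming Python 3 (the session quoted above is Python 2.4, where the equivalent call is unicode('\x9d', 'iso8859_1')). The helper name decodes_cleanly is made up for this illustration and is not part of any library. It shows that U+009D is reported with general category Cc (a control character), and that a successful 'iso8859_1' decode says very little about the real encoding of the input, because Python's codec accepts every byte value.

```python
import unicodedata

# Decode the single byte 0x9D as ISO 8859-1 (Python 3 syntax).
ch = b"\x9d".decode("iso8859_1")

# General category as recorded in the Unicode Character Database.
category = unicodedata.category(ch)
print(repr(ch), category)


def decodes_cleanly(data, encoding):
    """Hypothetical helper: True if data decodes without an error."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


# Every byte value from 0x00 to 0xFF decodes under 'iso8859_1',
# so this codec cannot be used to rule an encoding in or out.
latin1_ok = decodes_cleanly(bytes(range(256)), "iso8859_1")

# A lone 0x9D byte is not valid UTF-8, so that codec does reject it.
utf8_ok = decodes_cleanly(b"\x9d", "utf-8")

print(latin1_ok, utf8_ok)
```

For encoding-guessing heuristics this suggests trying stricter codecs such as 'utf-8' first and treating 'iso8859_1' as a fallback that always succeeds.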
-- http://mail.python.org/mailman/listinfo/python-list