josh logan <[EMAIL PROTECTED]> wrote:
>
>I am using Python 3.0b2.
>I have an XML file that has the unicode character '\u012b' in it,
>which, when parsed, causes a UnicodeEncodeError:
>
>'charmap' codec can't encode character '\u012b' in position 26:
>character maps to <undefined>
>
>This happens even when I assign this character to a reference in the
>interpreter:
>
>Python 3.0b2 (r30b2:65106, Jul 18 2008, 18:44:17) [MSC v.1500 32 bit
>(Intel)] on win32
>Type "help", "copyright", "credits" or "license" for more information.
>>>> s = '\u012b'
>>>> s
>Traceback (most recent call last):
>  File "<stdin>", line 1, in <module>
>  File "C:\Python30\lib\io.py", line 1428, in write
>    b = encoder.encode(s)
>  File "C:\Python30\lib\encodings\cp437.py", line 19, in encode
>    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
>UnicodeEncodeError: 'charmap' codec can't encode character '\u012b' in
>position 1: character maps to <undefined>
>
>Is this a known issue, or am I doing something wrong?
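
Both. U+012B is the Latin lower-case i with macron (i with a bar instead of a dot). That character does not exist in the 8-bit character set CP437, which is the encoding your console output is using. As the traceback shows, the assignment itself succeeds; the error is raised when the interpreter tries to encode the string for writing to the console.

If you choose an 8-bit character set that includes i-with-macron, such as ISO-8859-10, then it will work. UTF-8 would be a good choice too: it is not an 8-bit character set but a variable-width encoding, and it can represent every Unicode character.
-- 
Tim Roberts, [EMAIL PROTECTED]
Providenza & Boekelheide, Inc.
--
http://mail.python.org/mailman/listinfo/python-list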
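
To make the point about encodings concrete, here is a minimal sketch that checks which codecs can represent U+012B without touching the console at all. The helper name can_encode is made up for this illustration, and the in-memory stream is only a stand-in for a CP437 console; the backslashreplace error handler is one possible workaround, not something suggested in the original thread.

```python
import io

ch = "\u012b"  # LATIN SMALL LETTER I WITH MACRON

def can_encode(text, encoding):
    """Hypothetical helper: True if every character of text exists in encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True

# CP437 lacks the character; ISO-8859-10 and UTF-8 both have it.
for name in ("cp437", "iso8859_10", "utf-8"):
    print(name, can_encode(ch, name))

# Stand-in for a CP437 console: an in-memory byte stream wrapped as text.
# With errors="backslashreplace", an unencodable character is written as
# an escape sequence instead of raising UnicodeEncodeError.
buf = io.BytesIO()
out = io.TextIOWrapper(buf, encoding="cp437", errors="backslashreplace")
out.write(ch)
out.flush()
escaped = buf.getvalue()
print(escaped)
```

The same idea applies to the real console: the string is fine, and only the choice of output encoding (or error handler) decides whether writing it succeeds.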