Hello Helmut,

> Hi,
> i'm new here in this list.
>
> i'm developing a little program using an xml document. So far it's easy
> going, but when parsing an xml document which contains the EURO symbol
> ('€') then I get an error:
>
> UnicodeEncodeError: 'charmap' codec can't encode character u'\xa4' in
> position 11834: character maps to <undefined>

First of all, unicode handling is a little bit difficult when you encounter it for the first time, but in the end it really makes a lot of sense :-)

Please read a python unicode tutorial such as

http://www.amk.ca/python/howto/unicode

If you open up a python interactive prompt, you can do the following:

    >>> print u'\u20ac'
    €
    >>> u'\u20ac'.encode('utf-8')
    '\xe2\x82\xac'
    >>> u'\u20ac'.encode('iso-8859-15')
    '\xa4'
    >>> u'\u20ac'.encode('iso-8859-1')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    UnicodeEncodeError: 'latin-1' codec can't encode character u'\u20ac' in position 0:

\u20ac is the unicode code point for the Euro sign, so u'\u20ac' is the unicode euro sign in python. The different encode calls translate the unicode string into bytes in an actual encoding. iso-8859-1 has no euro sign, hence the error in the last call.

The u'\xa4' in your error message suggests that what you are seeing in your xml document is the iso-8859-15 encoded euro sign (the single byte 0xA4).

As Diez already noted, you must make sure that either

1. the whole xml document is encoded in iso-8859-15 (latin-9) and the encoding header reflects that, or
2. the utf-8 encoded euro sign is in your xml document (utf-8 is what an xml parser assumes when no encoding is declared).

Hope that makes sense

Stephan

-- 
http://mail.python.org/mailman/listinfo/python-list
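
P.S. In case it helps, here is a minimal sketch of the two options from above. It is written for Python 3 (an assumption on my side; the prompt session above is Python 2), where every str is already unicode and encode() returns bytes. The sample documents and the helper parse_price are made up for illustration and are not taken from your program; I use xml.etree.ElementTree from the standard library, which may not be the parser you are using.

```python
import xml.etree.ElementTree as ET

EURO = "\u20ac"  # unicode code point of the Euro sign

# The same character becomes different bytes depending on the encoding.
utf8_bytes = EURO.encode("utf-8")          # three bytes
latin9_bytes = EURO.encode("iso-8859-15")  # the single byte 0xA4

# iso-8859-1 has no Euro sign, so this encode call fails.
try:
    EURO.encode("iso-8859-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False

# Reading the byte 0xA4 with the wrong codec yields u'\xa4' (the generic
# currency sign), which is the character named in the error message.
misread = latin9_bytes.decode("iso-8859-1")


def parse_price(xml_bytes):
    """Hypothetical helper: parse a tiny document, return the root's text."""
    return ET.fromstring(xml_bytes).text


# Option 1: iso-8859-15 bytes AND a header that says so.
doc_latin9 = (
    b'<?xml version="1.0" encoding="iso-8859-15"?>'
    b"<price>10 \xa4</price>"
)

# Option 2: the utf-8 encoded Euro sign in a utf-8 document.
doc_utf8 = (
    b'<?xml version="1.0" encoding="utf-8"?>'
    b"<price>10 \xe2\x82\xac</price>"
)

# The problem case: iso-8859-15 bytes, but the header claims iso-8859-1.
doc_mislabeled = (
    b'<?xml version="1.0" encoding="iso-8859-1"?>'
    b"<price>10 \xa4</price>"
)

if __name__ == "__main__":
    for name, doc in [
        ("latin9", doc_latin9),
        ("utf8", doc_utf8),
        ("mislabeled", doc_mislabeled),
    ]:
        # ascii() keeps the output printable on any console encoding
        print(name, ascii(parse_price(doc)))
```

The first two documents should both come back with a real '\u20ac' in the text, while the mislabeled one gives '\xa4' instead. Note that the documents are passed to the parser as bytes, so that the parser itself applies the encoding named in the header.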