Hi,
 
I'm parsing an XML file using ElementTree, but it fails on certain 
non-ASCII characters (for example: "ê"). I'm using Python 2.4. Here's the 
relevant code fragment:
 
# CODE:
for element in doc.getiterator():
  try:
    m = re.match(search_text, str(element.text))
  except UnicodeEncodeError:
    raise # I want to get rid of this exception.

# TRACEBACK:
    m = re.match(search_text, str(element.text))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xea' in position 4: ordinal not in range(128)
 
How can I get rid of this UnicodeEncodeError? I tried:
s = str(element.text)
s.encode("utf-8")
(and then feeding it into the regex)
 
The XML file is in UTF-8. Somehow I need to tell the program not to use ASCII 
but UTF-8, right?
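
For illustration, here is a minimal sketch of one possible approach, assuming
the str() call itself is the trigger: in Python 2, calling str() on a unicode
object encodes it with the default ASCII codec, which is what raises the
error. The sketch skips str() and matches the element text directly against a
unicode pattern. The sample XML, the pattern, and the find_matches helper are
made up for the example. On Python 2.4, ElementTree is the separate
elementtree package and the iterator method is getiterator(), so the code
falls back to those where needed.

```python
import re

try:
    from xml.etree import ElementTree  # standard library from Python 2.5 on
except ImportError:
    from elementtree import ElementTree  # separate package on Python 2.4

# Hypothetical sample document and pattern, for illustration only.
xml_bytes = u"<root><item>cr\u00eape</item><item>plain</item></root>".encode("utf-8")
search_text = u"cr\u00ea"

root = ElementTree.fromstring(xml_bytes)


def find_matches(root, pattern):
    """Return the text of every element whose text matches the pattern."""
    hits = []
    # Newer ElementTree releases offer iter(); older ones only getiterator().
    walk = getattr(root, "iter", None) or root.getiterator
    for element in walk():
        text = element.text
        if text is None:
            # Elements without text would otherwise turn into the string "None".
            continue
        # No str() call here: the text is matched as it is, so nothing
        # gets encoded to ASCII along the way.
        if re.match(pattern, text):
            hits.append(text)
    return hits


matches = find_matches(root, search_text)
```

With the sample above, matches ends up holding only the text of the first
item. The earlier attempt fails for the same reason as the original code:
str(element.text) raises before .encode("utf-8") is ever reached.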
 
Thanks in advance!

Cheers!!
Albert-Jan

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In the face of ambiguity, refuse the temptation to guess.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


      