Re: [Tutor] UnicodeEncodeError

Kent Johnson Wed, 25 Nov 2009 08:58:14 -0800

On Wed, Nov 25, 2009 at 8:44 AM, Albert-Jan Roskam <[email protected]> wrote:


> Hi,
>
> I'm parsing an xml file using elementtree, but it seems to get stuck on
> certain non-ascii characters (for example: "ê"). I'm using Python 2.4.
> Here's the relevant code fragment:
>
> # CODE:
> for element in doc.getiterator():
>   try:
>     m = re.match(search_text, str(element.text))
>   except UnicodeEncodeError:
>     raise # I want to get rid of this exception.
> # PRINTBACK:
>     m = re.match(search_text, str(element.text))
> UnicodeEncodeError: 'ascii' codec can't encode character u'\xea' in
> position 4: ordinal not in range(128)
>

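You can't convert element.text to a str because it contains non-ASCII
characters: calling str() on a unicode object implicitly encodes it with the
default 'ascii' codec, and that is what raises the error. Why are you
converting it? re.match() will accept a unicode string as its argument.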
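
For illustration, here is a minimal sketch of the loop with the str() call
dropped. The sample XML and the search_text pattern are made up for the
example, and it uses xml.etree.ElementTree with iter() (the newer spelling of
getiterator()), which assumes a more recent Python than 2.4:

```python
# -*- coding: utf-8 -*-
import re
import xml.etree.ElementTree as ET

# Hypothetical sample document, stored as UTF-8 bytes like the original file.
xml_bytes = u"<root><item>cr\xeape</item><item>plain</item></root>".encode("utf-8")
root = ET.fromstring(xml_bytes)

# Hypothetical pattern; a unicode pattern works fine against unicode text.
search_text = u"cr.pe"

matches = []
for element in root.iter():
    text = element.text
    if text is None:
        # Elements without text content have text == None; skip them
        # instead of turning None into the string "None".
        continue
    # No str() conversion: the parsed text goes straight to re.match().
    if re.match(search_text, text):
        matches.append(text)
```

Skipping None explicitly also replaces the other job str() was doing in the
original loop, which was turning a missing text into something matchable.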

>
> How can I get rid of this unicode encode error? I tried:
> s = str(element.text)
> s.encode("utf-8")
> (and then feeding it into the regex)
>

This fails because it is the str() call that raises the error, so the
encode() on the next line is never reached. To get a UTF-8 byte string, use
  s = element.text.encode('utf-8')
but I don't think this is the correct solution.
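
One reason encoding first is probably not what you want: once the text is
UTF-8 bytes, the regex works byte by byte rather than character by character.
A small sketch, with a made-up sample string and pattern (the b"" literals
assume Python 2.6 or later):

```python
# -*- coding: utf-8 -*-
import re

text = u"cr\xeape"              # hypothetical sample: 5 characters, one of them non-ASCII
encoded = text.encode("utf-8")  # the same text as a UTF-8 byte string

char_count = len(text)          # counts characters
byte_count = len(encoded)       # counts bytes; the accented letter takes two

# On the unicode string, "." matches the single accented character.
unicode_match = re.match(u"cr.pe", text)

# On the encoded bytes, "." consumes only the first of the two bytes,
# so the same pattern no longer matches.
bytes_match = re.match(b"cr.pe", encoded)
```

So the encoded version avoids the exception but can silently change what the
pattern matches.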


> The xml file is in UTF-8. Somehow I need to tell the program not to use
> ascii but utf-8, right?
>

No, just pass Unicode to re.match().

Kent

_______________________________________________
Tutor maillist  -  [email protected]
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor
