Gilles Ganault wrote:
> Hello
> 
>       I'm trying to read pages from Amazon JP, whose web pages are
> supposed to be encoded in ShiftJIS, and decode contents into Unicode
> to keep Python happy:
> 
> www.amazon.co.jp
> <meta http-equiv="content-type" content="text/html; charset=Shift_JIS" />
> 
> But this doesn't work:
> 
> ======
> m = try.search(the_page)
> if m:
>       # UnicodeEncodeError: 'charmap' codec can't encode characters in
>       # position 49-55: character maps to <undefined>
>       title = m.group(1).decode('shift_jis').strip()
> ======

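There's something fishy going on: you're calling the decode method but
getting a UnicodeEncodeError. That usually means you're calling decode
on something that already *is* unicode: Python first has to encode it
back to a byte string before it can decode it, and that implicit encode
is what fails. What does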

   print type(m.group(1))

output?
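
If it turns out to be unicode already, one way around it is to decode
only when the value is still a byte string. What follows is just a rough
sketch of that idea, not a fix for your actual script: the helper name
to_text, the sample page_bytes and the <title> pattern are all made up
for illustration, and I'm assuming the page really is Shift_JIS.

```python
import re


def to_text(value, encoding="shift_jis"):
    """Return text, decoding only if value is still a byte string."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    # Already decoded: calling .decode() again is what causes trouble.
    return value


# Hypothetical page fragment, encoded the way the server would send it.
page_bytes = u"<title>\u65e5\u672c\u8a9e</title>".encode("shift_jis")

# Search the raw bytes, so the match group is a byte string as well.
m = re.search(br"<title>(.*?)</title>", page_bytes)
if m:
    raw = m.group(1)
    print(type(raw))           # shows whether a decode is still needed
    title = to_text(raw).strip()
    print(repr(title))
```

The point is to decode exactly once, as close to the download as
possible, and to work with unicode everywhere after that.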

Servus,
   Walter

--
http://mail.python.org/mailman/listinfo/python-list
