Gilles Ganault wrote:
> Hello
>
> I'm trying to read pages from Amazon JP, whose web pages are
> supposed to be encoded in ShiftJIS, and decode contents into Unicode
> to keep Python happy:
>
> www.amazon.co.jp
> <meta http-equiv="content-type" content="text/html; charset=Shift_JIS"
> />
>
> But this doesn't work:
>
> ======
> m = try.search(the_page)
> if m:
>     #UnicodeEncodeError: 'charmap' codec can't encode characters in
>     #position 49-55: character maps to <undefined>
>     title = m.group(1).decode('shift_jis').strip()
> ======

There's something fishy going on: you're calling the decode method but
getting a UnicodeEncodeError. That means you're calling decode on
something that already *is* unicode. Python first has to encode it back
into a byte string before it can decode it, and that implicit encode
step is what fails.

What does print type(m.group(1)) output?

Servus,
   Walter
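
P.S. A minimal sketch of the type check suggested above. It is written for Python 3, which is an assumption: the original post uses Python 2, where the two types are str and unicode rather than bytes and str. The helper name to_text and the sample title are hypothetical, chosen only for illustration.

```python
def to_text(value, encoding="shift_jis"):
    """Return text, decoding only if value is still a byte string."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value  # already text, so there is nothing left to decode


# Hypothetical sample data standing in for m.group(1).
already_text = "日本語のタイトル"
raw = already_text.encode("shift_jis")

# Checking the type first shows which of the two cases you are in.
print(type(raw).__name__, type(already_text).__name__)

# Both inputs end up as the same text.
assert to_text(raw) == to_text(already_text)

# Encoding text with a codec that lacks these characters raises
# UnicodeEncodeError, the same kind of error reported in the question.
try:
    already_text.encode("ascii")
except UnicodeEncodeError as exc:
    print(exc)
```

If the type check shows the value is already text, the fix is to drop the decode call rather than to try another codec.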