"Marc 'BlackJack' Rintsch" <[EMAIL PROTECTED]> wrote in message
news:[EMAIL PROTECTED]
> On Thu, 25 Oct 2007 17:15:36 -0400, Tim Arnold wrote:
>
>> Hi, I'm getting the by-now-familiar error:
>> return codecs.charmap_decode(input,errors,decoding_map)
>> UnicodeEncodeError: 'ascii' codec can't encode character u'\xa9' in
>> position 4615: ordinal not in range(128)
>>
>> the html file I'm working with is in utf-8, I open it with codecs, try to
>> feed it to TidyHTMLTreeBuilder, but no luck. Here's my code:
>> from elementtree import ElementTree as ET
>> from elementtidy import TidyHTMLTreeBuilder
>>
>> fd = codecs.open(htmfile,encoding='utf-8')
>> tidyTree = TidyHTMLTreeBuilder.TidyHTMLTreeBuilder(encoding='utf-8')
>> tidyTree.feed(fd.read())
>> self.tree = tidyTree.close()
>> fd.close()
>>
>> what am I doing wrong? Thanks in advance.
>
> You feed decoded data to `TidyHTMLTreeBuilder`. As the `encoding`
> argument suggests this class wants bytes not unicode. Decoding twice
> doesn't work.
>
> Ciao,
> Marc 'BlackJack' Rintsch
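Marc's point is that the data should be decoded exactly once: `codecs.open(..., encoding='utf-8')` already hands back unicode, and a builder that takes an `encoding` argument presumably decodes again. In Python 2, calling `.decode()` on a unicode object first encodes it with the default ASCII codec, which is why the traceback complains that the 'ascii' codec can't *encode* u'\xa9'. The sketch below is a Python 3 illustration of the "bytes in, decode once" contract. `BytesFeedParser` and `parse_file` are hypothetical stand-ins, not elementtidy's API; the point is only that the file is opened in binary mode and the parser does the single decode.

```python
class BytesFeedParser:
    """Hypothetical stand-in for a parser that, like TidyHTMLTreeBuilder as
    described in the thread, takes an `encoding` and expects raw bytes."""

    def __init__(self, encoding="utf-8"):
        self.encoding = encoding
        self._chunks = []

    def feed(self, data):
        if not isinstance(data, bytes):
            # Reject already-decoded text rather than decoding it a second time.
            raise TypeError("feed() expects bytes, got %s" % type(data).__name__)
        self._chunks.append(data)

    def close(self):
        # Decode exactly once, using the encoding the caller declared.
        return b"".join(self._chunks).decode(self.encoding)


def parse_file(path, encoding="utf-8"):
    parser = BytesFeedParser(encoding=encoding)
    # Binary mode (no codecs.open), so the parser receives undecoded bytes.
    with open(path, "rb") as fd:
        parser.feed(fd.read())
    return parser.close()
```

With the real elementtidy builder, the analogous change would presumably be to replace `codecs.open(htmfile, encoding='utf-8')` with a plain binary `open(htmfile, 'rb')` while keeping `encoding='utf-8'` on the builder, so that only one side does the decoding.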
Well, now that you say it, it seems so obvious... Someday I will get the hang of this encode/decode stuff. When I read about it, I'm fine: it makes sense, etc., maybe even a little boring. And then I write stuff like the above!

Thanks to you and Diez for straightening me out.
--Tim

--
http://mail.python.org/mailman/listinfo/python-list