Hi,

I'm getting the by-now-familiar error:

    return codecs.charmap_decode(input,errors,decoding_map)
    UnicodeEncodeError: 'ascii' codec can't encode character u'\xa9' in position 4615: ordinal not in range(128)

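As far as I can tell, this message appears whenever a unicode string containing a non-ASCII character is encoded with the ascii codec. Here is a minimal sketch that reproduces the same kind of error outside my program. The sample string and the helper name `ascii_encode_error` are made up for illustration and are not part of my real code or of elementtidy:

```python
def ascii_encode_error(text):
    """Return the UnicodeEncodeError raised by an ascii encode, or None."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError as exc:
        return exc
    return None


# Made-up sample text; u'\xa9' is the copyright sign from the error message.
sample = u"Copyright \xa9 2008"

# Encoding to ascii fails at the first character above 127.
err = ascii_encode_error(sample)
print(err)

# Encoding the same text explicitly as UTF-8 works and yields a byte string.
utf8_bytes = sample.encode("utf-8")
```

So presumably something is implicitly encoding my already-decoded text as ascii, but I can't see where.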
The HTML file I'm working with is in UTF-8. I open it with codecs and try to feed it to TidyHTMLTreeBuilder, but no luck. Here's my code:

    import codecs
    from elementtree import ElementTree as ET
    from elementtidy import TidyHTMLTreeBuilder

    fd = codecs.open(htmfile,encoding='utf-8')
    tidyTree = TidyHTMLTreeBuilder.TidyHTMLTreeBuilder(encoding='utf-8')
    tidyTree.feed(fd.read())
    self.tree = tidyTree.close()
    fd.close()

What am I doing wrong? Thanks in advance.

On a related note, I have another question: where/how can I get the cElementTree.py module? Sorry for something so basic. I tried installing cElementTree, and while I could compile it with setup.py build, I didn't end up with a cElementTree.py file anywhere. The directory structure on my system (HP-UX, with no root access) doesn't work well with setup.py install.

thanks,
--Tim Arnold