As in my previous post, TAG with its xml serializer fails when both tag Name and Contents are unicode strings with non ascii chars.
What about this trivial solution in html.py? def xml(self): """ generates the xml for this component. """ (fa, co) = self._xml() if not self.tag: return co if self.tag[-1:] == '/': # <tag [attributes] /> #should encode this as well return '<%s%s />' % (self.tag[:-1], fa) # else: <tag [attributes]> inner components xml </tag> #explicitly encoding self.tag return '<%s%s>%s</%s>' % ((self.tag).encode('utf-8'), fa, co, (self.tag).encode('utf-8')) In my test code this is ok but I do not know if it breaks something else. Another thing I noticed is that even HTML attributes like u'some_non_ascii_chars' breaks _validate()..but this is a story much more complex than the TAG problem.