As in my previous post, TAG with its xml serializer fails when both
tag Name and Contents are unicode strings with non ascii chars.

What about this trivial solution in html.py?

def xml(self):
        """
        generates the xml for this component.
        """

        (fa, co) = self._xml()

        if not self.tag:
            return co

        if self.tag[-1:] == '/':
            # <tag [attributes] />
            #should encode this as well
            return '<%s%s />' % (self.tag[:-1], fa)

        # else: <tag [attributes]>  inner components xml </tag>
        #explicitly encoding self.tag
        return '<%s%s>%s</%s>' % ((self.tag).encode('utf-8'), fa, co,
(self.tag).encode('utf-8'))

In my test code this is ok but I do not know if it breaks something
else.

Another thing I noticed is that even HTML attributes like
u'some_non_ascii_chars'  breaks _validate()..but this is a story much
more complex than the TAG problem.

Reply via email to