On Jan 17, 2011, at 7:39 AM, carlo wrote:
> 
> As in my previous post, TAG with its xml serializer fails when both
> tag Name and Contents are unicode strings with non ascii chars.
> 
> What about this trivial solution in html.py?

Makes sense to me, but I have to admit that I get a little confused by the mix 
of Python's handling of encodings and the logic of TAG. 

It does seem reasonable that we'd default to utf8.

> 
> def xml(self):
>        """
>        generates the xml for this component.
>        """
> 
>        (fa, co) = self._xml()
> 
>        if not self.tag:
>            return co
> 
>        if self.tag[-1:] == '/':
>            # <tag [attributes] />
>            #should encode this as well
>            return '<%s%s />' % (self.tag[:-1], fa)
> 
>        # else: <tag [attributes]>  inner components xml </tag>
>        #explicitly encoding self.tag
>        return '<%s%s>%s</%s>' % ((self.tag).encode('utf-8'), fa, co,
> (self.tag).encode('utf-8'))
> 
> In my test code this is ok but I do not know if it breaks something
> else.
> 
> Another thing I noticed is that even HTML attributes like
> u'some_non_ascii_chars'  breaks _validate()..but this is a story much
> more complex than the TAG problem.


Reply via email to