On Mar 14, 1:53 am, John Nagle <[EMAIL PROTECTED]> wrote:
> John Machin wrote:
> > On Mar 14, 5:38 am, John Nagle <[EMAIL PROTECTED]> wrote:
> >> Just noticed, again, that getattr/setattr are ASCII-only, and don't
> >> support Unicode.
>
> >> SGMLlib blows up because of this when faced with a Unicode end tag:
>
> >>   File "/usr/local/lib/python2.5/sgmllib.py", line 353, in finish_endtag
> >>     method = getattr(self, 'end_' + tag)
> >> UnicodeEncodeError: 'ascii' codec can't encode character u'\xae'
> >> in position 46: ordinal not in range(128)
>
> >> Should attributes be restricted to ASCII, or is this a bug?
>
> >> John Nagle
>
> > Identifiers are restricted -- see section 2.3 (Identifiers and
> > keywords) of the Reference Manual. The restriction is in effect that
> > they match r'[A-Za-z_][A-Za-z0-9_]*\Z'. Hence if you can't use
> > obj.nonASCIIname in your code, it makes sense for the equivalent usage
> > in setattr and getattr not to be available.
>
> > However other than forcing unicode to str, setattr and getattr seem
> > not to care what you use:
>
> OK. It's really a bug in SGMLlib, then. SGMLlib lets you provide a
> subclass with a function with a name such as "end_img", to be called
> at the end of an "img" tag. The mechanism which implements this blows
> up on any tag name that won't convert to "str", even when there are
> no "end_" functions that could be relevant.
>
> It's easy to fix in SGMLlib. It's just necessary to change
>
>     except AttributeError:
> to
>     except AttributeError, UnicodeEncodeError:
>
> in four places. I suppose I'll have to submit a patch.
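
One caveat on the proposed change: in Python 2, "except AttributeError, UnicodeEncodeError:" catches only AttributeError and binds the exception instance to the name UnicodeEncodeError. To catch both, the two classes need to go in a parenthesized tuple. Here is a minimal sketch of the idea. The class and its methods are hypothetical stand-ins, not sgmllib's actual code; only the getattr(self, 'end_' + tag) line comes from the traceback quoted above.

```python
class TagDispatcher(object):
    """Hypothetical stand-in for an end_<tag> style method lookup."""

    def __init__(self):
        self.closed = []

    def end_img(self):
        # Handler for a known tag, found by name via getattr.
        self.closed.append('img')

    def unknown_endtag(self, tag):
        # Fallback for tags that have no end_<tag> handler.
        self.closed.append('unknown:' + tag)

    def finish_endtag(self, tag):
        try:
            method = getattr(self, 'end_' + tag)
        except (AttributeError, UnicodeEncodeError):
            # The parenthesized tuple catches either exception type.
            # On Python 2 a non-ASCII unicode tag raises UnicodeEncodeError
            # here; a merely unknown tag raises AttributeError.
            self.unknown_endtag(tag)
            return
        method()


d = TagDispatcher()
d.finish_endtag('img')
d.finish_endtag(u'caf\xae')   # non-ASCII tag name falls through to the fallback
```

With the tuple form, the non-ASCII tag is routed to the fallback handler instead of propagating an exception out of the lookup.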

FWIW, the stated goal of sgmllib is to parse the subset of SGML that HTML uses. There are no non-ASCII elements in HTML, so I'm not certain this would be considered a bug in sgmllib.

Carl Banks