Martin v. Löwis <mar...@v.loewis.de> added the comment:

>   1) the current approach of having a dict with name -> intvalue doesn't work 
> anymore, and a name -> valuelist should be used instead;
>   2) the reverse dict for this would have to use tuples as keys, but I'm not 
> sure how useful would that be (producing entities is not a common case, 
> especially "unusual" ones like these).
>   3) The name -> char dict might still be useful, and can easily become a 
> name -> str dict in order to deal with the multichar entities;
> 
> Since 1) is not backward-compatible the HTML5 entities should probably go in 
> a separate dict.

+1 for a separate dict; -1 for a value list. The right value type is
'str'; name2codepoint ought to be deprecated (it's a leftover from
when the str type wasn't Unicode in 2.x).
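
To illustrate the name -> str shape, here is a minimal sketch. The
SAMPLE_HTML5 table and the name_to_str helper are hypothetical names
made up for this example, not an existing or proposed API; only
name2codepoint comes from the standard library.

```python
from html.entities import name2codepoint

# Hypothetical separate table: entity name -> replacement string.
# A str value holds one or several code points without needing a list.
SAMPLE_HTML5 = {
    "amp": "&",
    "eacute": "\u00e9",
    "NotEqualTilde": "\u2242\u0338",  # two code points in one str
}


def name_to_str(name2cp):
    """Convert a name -> code point mapping into a name -> str mapping."""
    return {name: chr(cp) for name, cp in name2cp.items()}


# The legacy single-code-point table expressed in the same shape.
legacy = name_to_str(name2codepoint)
```

With str values, single-character and multi-character entities share
one value type, so callers don't need to special-case either.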

As for the reverse mapping: I'd add a dictionary that is the reverse of
entitydefs (i.e. with str keys). That some keys then have two characters
is no real issue: applications that want to use this dictionary can
either ignore them, or follow the approach of always checking for
Unicode combining characters - I'd expect that all "second" characters
are indeed combining.
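
A rough sketch of such a reverse dictionary, together with a way to
test the expectation about combining characters rather than assume it.
The sample table and both helper names are assumptions made for this
example.

```python
import unicodedata

# Hypothetical sample of a name -> str entity table.
SAMPLE = {
    "amp": "&",
    "eacute": "\u00e9",
    "NotEqualTilde": "\u2242\u0338",
}


def invert(entities):
    """Build the str -> name mapping.

    If several names share one replacement string, the name seen
    last wins.
    """
    return {text: name for name, text in entities.items()}


def second_chars_are_combining(entities):
    """True if every character after the first is a combining character."""
    return all(
        unicodedata.combining(ch) != 0
        for text in entities.values()
        for ch in text[1:]
    )
```

Running second_chars_are_combining over the real table would show
whether an application can rely on the combining-character approach.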

OTOH, it's easy enough to create an inverted dictionary yourself
when you need it, and not every three-line function needs to be
in the standard library. It might actually be more useful to compile
the values into a regular expression which you can then use to
find out whether characters can be escaped using entity references.
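
The regular-expression idea could look roughly like this. It is a
sketch only: the CHAR_TO_NAME sample and the two function names are
hypothetical, and a real table would be far larger.

```python
import re

# Hypothetical str -> name table (the inverted entity dictionary).
CHAR_TO_NAME = {
    "&": "amp",
    "\u00e9": "eacute",
    "\u2242": "EqualTilde",
    "\u2242\u0338": "NotEqualTilde",
}


def build_escape_pattern(char_to_name):
    """Compile one regex matching every string that has an entity name."""
    # Longest alternatives first, so a two-character sequence is
    # preferred over its one-character prefix.
    alternatives = sorted(char_to_name, key=len, reverse=True)
    return re.compile("|".join(re.escape(s) for s in alternatives))


def escape(text, char_to_name, pattern):
    """Replace each matched string with its entity reference."""
    return pattern.sub(lambda m: "&%s;" % char_to_name[m.group(0)], text)


PATTERN = build_escape_pattern(CHAR_TO_NAME)
```

PATTERN.search(text) answers whether anything in text can be escaped,
and escape(text, CHAR_TO_NAME, PATTERN) performs the replacement.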

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue11113>
_______________________________________