Martin v. Löwis <mar...@v.loewis.de> added the comment:

> 1) the current approach of having a dict with name -> intvalue doesn't work
> anymore, and a name -> valuelist should be used instead;
> 2) the reverse dict for this would have to use tuples as keys, but I'm not
> sure how useful would that be (producing entities is not a common case,
> especially "unusual" ones like these).
> 3) The name -> char dict might still be useful, and can easily become a
> name -> str dict in order to deal with the multichar entities;
>
> Since 1) is not backward-compatible the HTML5 entities should probably go in
> a separate dict.

+1 for a separate dict; -1 for a value list. The right value type is 'str';
name2codepoint ought to be deprecated (it's a left-over from when the str
type wasn't unicode in 2.x).

As for the reverse mapping: I'd add a dictionary that is reverse to
entitydefs (i.e. with str keys). That some keys then have two characters is
no real issue: applications that want to use this dictionary can either
ignore them, or follow the approach of always checking Unicode combining
characters - I'd expect that all "second" characters are indeed combining.

OTOH, it's easy enough to create an inverted dictionary yourself when you
need it, and not every three-line function needs to be in the standard
library. It might actually be more useful to compile the values into a
regular expression which you can then use to find out whether characters
can be escaped using entity references.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue11113>
_______________________________________
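
To illustrate the last two points, here is a minimal sketch of inverting a name -> str mapping and compiling its values into a regular expression. SAMPLE_ENTITIES is a small hand-written table used only for illustration, not the stdlib data, and the helper names (invert, build_pattern, escape_with_entities) are hypothetical. Sorting the alternatives longest-first is one possible way to let two-character values match as a unit.

```python
import re

# Hypothetical sample of a name -> str mapping (assumption: not the stdlib table).
SAMPLE_ENTITIES = {
    "amp": "&",
    "lt": "<",
    "gt": ">",
    "eacute": "\u00e9",
    "NotEqualTilde": "\u2242\u0338",  # value made of two code points
}


def invert(entities):
    """Return a str -> name dict; the first name seen wins on duplicate values."""
    inverse = {}
    for name, text in entities.items():
        inverse.setdefault(text, name)
    return inverse


def build_pattern(inverse):
    """Compile all str keys into one alternation, longest keys first."""
    keys = sorted(inverse, key=len, reverse=True)
    return re.compile("|".join(re.escape(key) for key in keys))


def escape_with_entities(text, entities=SAMPLE_ENTITIES):
    """Replace every substring that has an entity name with &name;."""
    inverse = invert(entities)
    pattern = build_pattern(inverse)
    return pattern.sub(lambda m: "&" + inverse[m.group(0)] + ";", text)
```

The same compiled pattern can also answer the narrower question of whether a string contains anything escapable, via pattern.search(text), without performing the substitution.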