Nick Coghlan added the comment:

I'd wondered about that with respect to rehandle_surrogatepass.

The current implementation looks like it processes *all* surrogates (even valid 
surrogate pairs), so "handle_surrogates" might be a suitable name.

If the intent is for it to be "handle_lone_surrogates", I'm not sure the 
current implementation achieves that, as a valid surrogate pair will match 
re.compile('[\ud800-\uefff]+').
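
To illustrate the point, here is a minimal sketch, not the patch's code. ALL_SURROGATES copies the pattern quoted above verbatim. LONE_SURROGATES is a hypothetical alternative that uses lookarounds to skip well-formed high+low pairs. Both names are assumptions made up for this example.

```python
import re

# Pattern quoted in the comment above, copied verbatim.
ALL_SURROGATES = re.compile('[\ud800-\uefff]+')

# Hypothetical alternative: match only surrogates that are NOT part of a
# well-formed (high, low) pair.
LONE_SURROGATES = re.compile(
    '[\ud800-\udbff](?![\udc00-\udfff])'     # high not followed by a low
    '|(?<![\ud800-\udbff])[\udc00-\udfff]'   # low not preceded by a high
)

pair = '\ud83d\ude00'   # surrogate pair encoding U+1F600
lone = 'a\udc80b'       # lone low surrogate, as produced by surrogateescape

pair_hit_all = ALL_SURROGATES.search(pair) is not None
pair_hit_lone = LONE_SURROGATES.search(pair) is not None
lone_hits = LONE_SURROGATES.findall(lone)
```

With these inputs the quoted pattern matches the valid pair, while the lookaround version matches only the lone surrogate in the second string.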

The rest looks OK to me, including the decompose_astrals() and 
compose_surrogate_pairs() functions. Regardless of any practical utility, the 
latter two seem useful for *educational* purposes when it comes to Unicode, by 
making it clear how to switch between the single code point and dual code point 
representations of the astrals.
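
For readers unfamiliar with the two representations, the conversion can be sketched as follows. The function names come from the comment above, but the bodies are an assumption written for illustration using the standard UTF-16 surrogate arithmetic. They are not the implementation from the patch.

```python
import re

_ASTRAL = re.compile('[\U00010000-\U0010ffff]')
_PAIR = re.compile('[\ud800-\udbff][\udc00-\udfff]')

def decompose_astrals(text):
    """Replace each astral code point with a (high, low) surrogate pair."""
    def split(match):
        offset = ord(match.group()) - 0x10000       # 20-bit value
        high = 0xD800 + (offset >> 10)              # top 10 bits
        low = 0xDC00 + (offset & 0x3FF)             # bottom 10 bits
        return chr(high) + chr(low)
    return _ASTRAL.sub(split, text)

def compose_surrogate_pairs(text):
    """Replace each well-formed surrogate pair with one astral code point."""
    def join(match):
        high, low = match.group()
        offset = ((ord(high) - 0xD800) << 10) + (ord(low) - 0xDC00)
        return chr(0x10000 + offset)
    return _PAIR.sub(join, text)

decomposed = decompose_astrals('a\U0001F600')
recomposed = compose_surrogate_pairs(decomposed)
```

Lone surrogates fall outside both patterns, so either function leaves them untouched.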

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue18814>
_______________________________________