On Fri, Jul 14, 2017, at 08:33, Chris Angelico wrote:
> What do you mean about regular expressions? You can use REs with
> normalized strings. And if you have any valid definition of "real
> character", you can use it equally on an NFC-normalized or
> NFD-normalized string than any other. They're just strings, you know.

I don't understand how normalization is supposed to help with this. There
are plenty of valid base-plus-combining sequences that have no
corresponding single precomposed code point, so NFC still leaves them as
several code points (to say nothing of the situation with e.g. Indic
languages).

In principle, a viable solution for regex would probably be to add
character classes for base and combining characters; then
"[[:base:]][[:combining:]]*" could be used as a building block where
necessary.
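
To make both points concrete, here is a rough sketch in Python using only the stdlib unicodedata module. The helper names (is_combining, clusters) are made up for illustration, and treating "combining" as "General_Category starts with M" is my assumption, not an established definition of the proposed [[:combining:]] class. I check the category rather than the canonical combining class because some marks, such as Indic spacing vowel signs, have combining class 0.

```python
import unicodedata


def is_combining(ch):
    # Assumption: "combining" means General_Category Mn, Mc or Me.
    return unicodedata.category(ch).startswith("M")


def clusters(text):
    """Split text into groups of one base character plus trailing marks."""
    out = []
    for ch in text:
        if out and is_combining(ch):
            out[-1] += ch      # attach the mark to the preceding base
        else:
            out.append(ch)     # start a new group
    return out


# "q" + COMBINING DOT ABOVE has no precomposed form, so NFC cannot
# collapse it into a single code point.
nfc = unicodedata.normalize("NFC", "q\u0307")
print(len(nfc))

# Emulates "[[:base:]][[:combining:]]*" as a grouping step.
print(clusters("e\u0301q\u0307x"))
```

This is only an approximation of "one character as the user sees it"; it does not handle things like Hangul jamo sequences or joiners, which full grapheme cluster segmentation covers.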