Chris Angelico <ros...@gmail.com>: > On Wed, Jul 19, 2017 at 3:01 AM, Marko Rauhamaa <ma...@pacujo.net> wrote: >> Yes. Also, not every letter can be normalized to a single codepoint so >> NFC is not a way out. For example, >> >> re.match("^[q̈]$", "q̈") >> >> returns None regardless of normalization. > > In what language or context would you actually want to do this?
I could have picked more realistic examples: Classic Greek or Hebrew, for example. However, someone might actually use even "q̈" in a real setting. First of all, it *is* a legal character. Secondly, people sometimes combine characters in an ad-hoc fashion. Thirdly, remember the case of Esperanto, which blessed the world with the letters ĉ ĝ ĥ ĵ ŝ ŭ Esperanto's venerable history finally awarded those characters a code-point status in Unicode. However, around the year 2000, it was still commonplace to use all sorts of tricks to type them on the Internet: ch gh hh jj sh u ^c ^g ^h ^j ^s ^u cx gx hx jx sx ux For all we know, someone somewhere might be cooking up a language that depends on "q̈". Marko -- https://mail.python.org/mailman/listinfo/python-list