Gioele Barabucci left as an exercise for the reader:
> You may have misunderstood that phrase. I was not referring to the fact that
> there are no standardized normalization forms for Unicode (I explicitly
> mention Annex 15 in [1]), but to the fact that there is no standard that
> specifies which of the possible normalization forms should be used for
> account names (and other fields in passwd).
> POSIX explicitly limits itself of a subset of ASCII, so it is not going to
> mandate any normalization form. Are there other standards (or initiatives)
> in this area that you know of?
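
To make the question concrete, here is a minimal sketch of what choosing a form means in practice. It uses only Python's standard `unicodedata` module; the `canonical_name` helper and the sample account names are hypothetical, chosen for illustration, and nothing here is meant as a proposal for which form to adopt.

```python
import unicodedata

# Hypothetical account name with "é" as one precomposed code point (U+00E9).
composed = "jos\u00e9"
# The same visible name spelled as "e" plus a combining acute accent (U+0301).
decomposed = "jose\u0301"


def canonical_name(name: str, form: str = "NFC") -> str:
    """Hypothetical helper: map an account name to one agreed-upon form."""
    return unicodedata.normalize(form, name)


# Without normalization the two spellings compare as different strings.
print(composed == decomposed)

# Under any single Annex 15 form the two spellings compare equal, but the
# stored representation (and its length) depends on the form chosen.
for form in ("NFC", "NFD", "NFKC", "NFKD"):
    a = canonical_name(composed, form)
    b = canonical_name(decomposed, form)
    print(form, a == b, len(a))

# The compatibility forms also fold characters such as the "fi" ligature
# (U+FB01) into plain letters; the canonical forms leave it alone.
print(canonical_name("\ufb01", "NFKC") == "fi")
print(canonical_name("\ufb01", "NFC") == "\ufb01")
```

Any one form makes the two spellings compare equal; the trouble starts when different tools pick different forms, or none at all.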

I'm glad we're both on the same page regarding Annex 15, and indeed, POSIX does seem to explicitly exclude any work in this area.

Assuming we're willing to go beyond POSIX (and again, this seems like something where we'd want to loop in other distributions, and probably kernel developers), I'm honestly not sure which of the Annex 15 normalization forms we'd want to use -- I'd like to hear from experts (or at least people with extensive experience outside of US-ASCII) as to which form is best. I have no dog in that hunt, save that everyone agrees on one. It's for this reason that I think any work in this area needs to be encapsulated in a common library.

-- 
nick black -=- https://nick-black.com
to make an apple pie from scratch, you need first invent a universe.