Hi nick (and Marc),

At 2024-12-01T18:43:28-0500, nick black wrote:
> Gioele Barabucci left as an exercise for the reader:
> > You may have misunderstood that phrase. I was not referring to the
> > fact that there are no standardized normalization forms for Unicode
> > (I explicitly mention Annex 15 in [1]), but to the fact that there
> > is no standard that specifies which of the possible normalization
> > forms should be used for account names (and other fields in passwd).
> > POSIX explicitly limits itself to a subset of ASCII, so it is not
> > going to mandate any normalization form. Are there other standards
> > (or initiatives) in this area that you know of?
>
> I'm glad we're both on the same page for Annex 15, and indeed, POSIX
> does seem to explicitly exclude any work in this area. Assuming we're
> willing to go beyond POSIX (and again, this seems something where we'd
> want to loop in other distributions, and probably kernel developers),
> I'm honestly not sure which of the Annex 15 canonicalizations we'd
> want to use -- I'd like to hear from experts (or at least people with
> extensive experience outside of US-ASCII) as to which method is best.
> I have no dog in that hunt, save that everyone agrees on a method.
>
> It's for this reason that I think any work in this area needs to be
> encapsulated in a common library.
It sounds like you want something isomorphic, if not identical, to
Punycode.

https://en.wikipedia.org/wiki/Punycode

...for which libraries exist, as I understand it.

These things are ugly, which I suppose is why they haven't caught on
despite being around for decades, but I would guess that this problem
space is such that there are no non-ugly solutions apart from "just
stick to ASCII", which some people find ugly in a different way.

Apologies if I missed someone bringing up and rejecting Punycode in the
previous ~41 messages in this thread. I rescanned, using my fallible
human eyeballs. It would be helpful to me if lists.debian.org supported
a search feature.

Regards,
Branden
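To illustrate why the choice of Annex 15 normalization form still matters even with a Punycode-style encoding, here is a minimal Python sketch. The helper `ascii_account_name` and the sample name "josé" are hypothetical; the sketch only assumes Python's standard `unicodedata.normalize` and its built-in "punycode" codec (RFC 3492, without the IDNA "xn--" prefix). It shows that two visually identical names map to one ASCII string only if everyone normalizes to the same form first.

```python
import unicodedata


def ascii_account_name(name: str, form: str = "NFC") -> str:
    """Hypothetical helper: normalize `name` to a UAX #15 form
    ("NFC", "NFD", "NFKC" or "NFKD"), then Punycode-encode it."""
    normalized = unicodedata.normalize(form, name)
    # Python's "punycode" codec yields bytes containing only ASCII.
    return normalized.encode("punycode").decode("ascii")


# "josé" typed with a precomposed U+00E9, versus "e" + U+0301 COMBINING ACUTE ACCENT.
precomposed = "jos\u00e9"
decomposed = "jose\u0301"

# The two look identical but are different code point sequences.
print(precomposed == decomposed)  # False

# Normalizing both to the same form first makes them encode identically...
print(ascii_account_name(precomposed, "NFC") == ascii_account_name(decomposed, "NFC"))  # True

# ...but different forms give different ASCII names for the same person.
print(ascii_account_name(precomposed, "NFC"), ascii_account_name(precomposed, "NFD"))
```

The encoding step alone does not settle the question raised above; a shared library would still have to fix one normalization form.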