On 22/11/24 20:42, Étienne Mollier wrote:
> I tried to consider what it would take to have an émollier or an
> Émollier login, and there is one little blocker : I may have to
> login from environments or keyboards lacking the necessary i18n
> and l10n capabilities to transcribe the 'e' acute, let alone the
> uppercase 'e' acute.

Dear Étienne,

your case highlights another problem not mentioned in the original list posted by Marc: comparison (and normalization).

Some characters can be encoded in more than one way. For instance, the "é" in "émollier" could be stored as the precomposed "e with acute" U+00E9 (encoded in UTF-8 as 0xc3 0xa9) or as "e" combined with an acute accent, U+0065 followed by U+0301 (UTF-8: 0x65 0xcc 0x81). If a keyboard input system produces the former byte sequence, but the username is stored in the login infrastructure using the latter, then a naive byte-by-byte comparison will not find the user "émollier" in the system. Unicode Standard Annex #15 defines four normalization forms (NFC, NFD, NFKC and NFKD) as a way to work around this problem. But using these normalization forms correctly still requires coordination and standardization among all programs accessing the data.
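To make the mismatch concrete, here is a minimal sketch in Python using the standard library's unicodedata module. The helper same_username and the choice of NFC are illustrative assumptions only; nothing in POSIX or the login stack is implied to work this way.

```python
import unicodedata

# Two spellings of the same visible username "émollier".
precomposed = "\u00e9mollier"  # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301mollier"  # U+0065 followed by U+0301 COMBINING ACUTE ACCENT

# The UTF-8 byte sequences differ, so a byte-level comparison fails.
print(precomposed.encode("utf-8")[:2])  # b'\xc3\xa9'
print(decomposed.encode("utf-8")[:3])   # b'e\xcc\x81'
print(precomposed == decomposed)        # False

def same_username(a: str, b: str, form: str = "NFC") -> bool:
    """Hypothetical helper: compare two names after normalizing both to one form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)

print(same_username(precomposed, decomposed))  # True
```

The comparison only works because both sides are normalized to the same form; if one program stores NFC and another looks names up without normalizing, the mismatch reappears.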

Does POSIX (or other de-facto standards) prescribe a normalization form for Unicode-/UTF-8-encoded usernames?

Regards,

--
Gioele Barabucci
