On 22/11/24 20:42, Étienne Mollier wrote:
> I tried to consider what it would take to have an émollier or an
> Émollier login, and there is one little blocker : I may have to
> login from environments or keyboards lacking the necessary i18n
> and l10n capabilities to transcribe the 'e' acute, let alone the
> uppercase 'e' acute.

Dear Étienne,
your case highlights another problem not mentioned in the original list
posted by Marc: comparison (and normalization).
Some characters can be encoded in more than one way. For instance, "é"
in "émollier" could be stored as "e with acute", U+00E9 (encoded in
UTF-8 as 0xc3 0xa9), or as "e followed by a combining acute accent",
U+0065 plus U+0301 (UTF-8: 0x65 0xcc 0x81). If a keyboard input system
provides the
former sequence of bytes, but the username is stored in the login
infrastructure using the latter sequence of bytes, then a naive
comparison will not find the user "émollier" in the system. Unicode
defines, in Standard Annex #15, a few normalization forms (NFC, NFD,
NFKC, NFKD) as a way to work around this problem. But a correct use of
these normalization forms still
requires coordination and standardization among all programs accessing
the data.
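
To make the mismatch concrete, here is a minimal Python sketch. It uses
only the standard unicodedata module; the helper name same_username and
the choice of NFC as the common form are my own assumptions for
illustration, not something any login stack is known to do.

```python
import unicodedata

# Two spellings of the same visible username.
precomposed = "\u00e9mollier"   # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301mollier"   # U+0065 followed by U+0301 COMBINING ACUTE ACCENT


def same_username(a: str, b: str, form: str = "NFC") -> bool:
    """Hypothetical helper: compare two names after normalizing both
    to the same Unicode normalization form (NFC assumed here)."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


# The UTF-8 byte sequences differ, so a naive comparison fails.
print(precomposed.encode("utf-8")[:2])   # bytes for the precomposed "é"
print(decomposed.encode("utf-8")[:3])    # bytes for "e" + combining accent
print(precomposed == decomposed)         # naive comparison
print(same_username(precomposed, decomposed))  # normalized comparison
```

This only helps if every program that reads or writes the username
agrees on the same form, which is exactly the coordination problem
mentioned above.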
Does POSIX (or any other de facto standard) prescribe a normalization
form for Unicode-/UTF-8-encoded usernames?
Regards,
--
Gioele Barabucci