Re: Musings about Usernames in adduser and Debian

Gioele Barabucci Fri, 22 Nov 2024 13:01:51 -0800

On 22/11/24 20:42, Étienne Mollier wrote:

I tried to consider what it would take to have an émollier or an
Émollier login, and there is one little blocker : I may have to
login from environments or keyboards lacking the necessary i18n
and l10n capabilities to transcribe the 'e' acute, let alone the
uppercase 'e' acute.


Dear Étienne,

your case highlights another problem not mentioned in the original list posted by Marc: comparison (and normalization).

Some characters can be encoded in more than one way. For instance, "é" in "émollier" could be stored as "e with acute" U+00E9 (and encoded in UTF-8 as 0xc3 0xa9) or as "e, combined with an acute accent" U+0065 plus U+0301 (UTF-8: 0x65 0xcc 0x81). If a keyboard input system provides the former sequence of bytes, but the username is stored in the login infrastructure using the latter sequence of bytes, then a naive comparison will not find the user "émollier" in the system. Unicode defines in Annex 15 a few normalization forms as a way to work around this problem. But a correct use of these normalization forms still requires coordination and standardization among all programs accessing the data.
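
To make the mismatch concrete, here is a minimal Python sketch using the standard library's unicodedata module. The helper name same_username and the choice of NFC as the default form are assumptions made for illustration only; nothing here reflects what adduser or any login component actually does.

```python
import unicodedata

# Two spellings of the same visible username.
precomposed = "\u00e9mollier"   # "é" as the single code point U+00E9
decomposed = "e\u0301mollier"   # "e" (U+0065) followed by combining acute U+0301

# The UTF-8 byte sequences differ, so a naive comparison fails.
print(precomposed.encode("utf-8").hex(" "))
print(decomposed.encode("utf-8").hex(" "))
print(precomposed == decomposed)


def same_username(a: str, b: str, form: str = "NFC") -> bool:
    """Hypothetical helper: compare two names after normalizing both.

    The normalization form is an assumption; the comparison is only
    reliable if every program storing or reading names uses the same form.
    """
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


print(same_username(precomposed, decomposed))
```

The comparison only succeeds because both sides are normalized to the same form before being compared, which is exactly the coordination problem described above.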

Does POSIX (or other de-facto standards) prescribe a normalization form for Unicode-/UTF-8-encoded usernames?


Regards,

--
Gioele Barabucci
