Re: Musings about Usernames in adduser and Debian

Marc Haber Wed, 27 Nov 2024 08:36:19 -0800

On Fri, Nov 22, 2024 at 10:01:24PM +0100, Gioele Barabucci wrote:
> your case highlights another problem not mentioned in the original list
> posted by Marc: comparison (and normalization).
> 
> Some characters can be encoded in more than one way. For instance, "é" in
> "émollier" could be stored as "e with acute" U+00E9 (and encoded in UTF-8 as
> 0xc3 0xa9) or as "e, combined with an acute accent" U+0065 plus U+0301
> (UTF-8: 0x65 0xcc 0x81).


That would be two distinct user names. Unless we have a widely available
Unicode library that can do this kind of normalization, it is unlikely
that our system utilities can take care of that. I'd like to put that
responsibility on the person who, or the system that, actually creates
those user names.

> If a keyboard input system provides the former
> sequence of bytes, but the username is stored in the login infrastructure
> using the latter sequence of bytes, then a naive comparison will not find
> the user "émollier" in the system.
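
The mismatch described in the quoted text can be reproduced in a few
lines. This is only an illustrative sketch using Python's standard
unicodedata module; it is not what adduser does, and the helper name
same_username is made up for this example. Choosing NFC as the
normalization form is likewise an assumption, not a Debian policy.

```python
import unicodedata

# The same visible name, stored in two different ways.
precomposed = "\u00e9mollier"   # U+00E9 (e with acute)
decomposed = "e\u0301mollier"   # U+0065 followed by U+0301 (combining acute)

def same_username(a: str, b: str) -> bool:
    """Hypothetical helper: compare two names after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

# The leading bytes differ, as described in the quoted text.
print(precomposed.encode("utf-8")[:2].hex(" "))  # c3 a9
print(decomposed.encode("utf-8")[:3].hex(" "))   # 65 cc 81

# A naive comparison treats them as two distinct user names.
print(precomposed == decomposed)                 # False

# Comparing the normalized forms treats them as the same name.
print(same_username(precomposed, decomposed))    # True
```

Without such a normalization step on both the creating and the looking-up
side, the two spellings stay distinct.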

Currently adduser just takes the characters that come from the command
line and encodes them into the byte stream that goes to useradd and
library calls.

Greetings
Marc

-- 
-----------------------------------------------------------------------------
Marc Haber         | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany    |  lose things."    Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421
