On Fri, Nov 22, 2024 at 10:01:24PM +0100, Gioele Barabucci wrote:
> your case highlights another problem not mentioned in the original list
> posted by Marc: comparison (and normalization).
>
> Some characters can be encoded in more than one way. For instance, "é" in
> "émollier" could be stored as "e with acute" U+00E9 (and encoded in UTF-8 as
> 0xc3 0xa9) or as "e, combined with an acute accent" U+0065 plus U+0301
> (UTF-8: 0x65 0xcc 0x81).
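
To make the two encodings concrete, here is a minimal Python sketch using the
standard unicodedata module. It is only an illustration of the comparison
problem, not a description of what adduser or useradd do; the helper name
same_name is made up for this example, and NFC is just one possible choice of
normalization form.

```python
import unicodedata

# Two spellings of the same visible name (example data from the text above).
precomposed = "\u00e9mollier"   # U+00E9, "e with acute"
decomposed = "e\u0301mollier"   # U+0065 followed by U+0301, combining acute

# The UTF-8 byte sequences differ, so a byte-wise comparison fails.
assert precomposed.encode("utf-8")[:2] == b"\xc3\xa9"
assert decomposed.encode("utf-8")[:3] == b"\x65\xcc\x81"
naive_equal = precomposed == decomposed


def same_name(a: str, b: str) -> bool:
    """Hypothetical helper: compare two names after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


normalized_equal = same_name(precomposed, decomposed)
print(naive_equal, normalized_equal)  # prints: False True
```

Without a normalization step of this kind somewhere in the chain, the two
byte sequences are simply different strings.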

That would be two distinct user names. Unless we have a widely available
Unicode library that can do this kind of normalization, it is unlikely that
our system utilities can take care of that. I'd like to put that
responsibility on the person who, or the system that, actually creates those
user names.

> If a keyboard input system provides the former
> sequence of bytes, but the username is stored in the login infrastructure
> using the latter sequence of bytes, then a naive comparison will not find
> the user "émollier" in the system.

Currently, adduser just takes the characters that come from the command line
and encodes them into the byte stream that goes to useradd and library calls.

Greetings
Marc

-- 
-----------------------------------------------------------------------------
Marc Haber         | "I don't trust Computers. They | Mail address in header
Leimen, Germany    |  lose things."    Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421