On Fri, Nov 22, 2024 at 10:01:24PM +0100, Gioele Barabucci wrote:
> On 22/11/24 20:42, Étienne Mollier wrote:
> > I tried to consider what it would take to have an émollier or an
> > Émollier login, and there is one little blocker: I may have to
> > login from environments or keyboards lacking the necessary i18n
> > and l10n capabilities to transcribe the 'e' acute, let alone the
> > uppercase 'e' acute.
>
> Dear Étienne,
>
> your case highlights another problem not mentioned in the original list
> posted by Marc: comparison (and normalization).
>
> Some characters can be encoded in more than one way. For instance, "é" in
> "émollier" could be stored as "e with acute" U+00E9 (and encoded in UTF-8 as
> 0xc3 0xa9) or as "e, combined with an acute accent" U+0065 plus U+0301
> (UTF-8: 0x65 0xcc 0x81). If a keyboard input system provides the former
> sequence of bytes, but the username is stored in the login infrastructure
> using the latter sequence of bytes, then a naive comparison will not find
> the user "émollier" in the system. Unicode defines in Annex 15 a few
> normalization forms as a way to work around this problem. But a correct use
> of these normalization forms still requires coordination and standardization
> among all programs accessing the data.
>
> Does POSIX (or other de-facto standards) prescribe a normalization form for
> Unicode-/UTF-8-encoded usernames?

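To make the comparison problem described in the quoted text concrete, here is a minimal Python sketch. The helper name same_username and the choice of NFC are assumptions made for illustration only; nothing in this thread says which normalization form, if any, a login stack uses.

```python
import unicodedata

# The two encodings of "émollier" described in the quoted text.
precomposed = "\u00e9mollier"   # U+00E9, UTF-8: 0xc3 0xa9
decomposed = "e\u0301mollier"   # U+0065 U+0301, UTF-8: 0x65 0xcc 0x81


def same_username(a, b, form="NFC"):
    """Hypothetical helper: compare two names after normalizing both.

    NFC is an arbitrary choice here; what matters is that every program
    touching the data agrees on the same form.
    """
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


# A naive comparison sees two different strings.
print(precomposed == decomposed)

# After normalization the two spellings compare equal.
print(same_username(precomposed, decomposed))
```

This only helps if the stored name and the typed name pass through the same normalization step, which is exactly the coordination problem raised above.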
POSIX says "if you want your applications to be portable, do not use any
funny characters in usernames":

https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1_chap03.html#tag_03_409

  3.409 User Name

  A string that is used to identify a user; see also 3.407 User Database.
  To be portable across systems conforming to POSIX.1-2024, the value is
  composed of characters from the portable filename character set. The
  <hyphen-minus> character should not be used as the first character of a
  portable user name.

For people unfamiliar with POSIX terms, the portable filename character
set is defined as:

https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1_chap03.html#tag_03_265

  The set of characters from which portable filenames are constructed.

  A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
  a b c d e f g h i j k l m n o p q r s t u v w x y z
  0 1 2 3 4 5 6 7 8 9 . _ -

  The last three characters are the <period>, <underscore>, and
  <hyphen-minus> characters, respectively.

G'luck,
Peter

-- 
Peter Pentchev  r...@ringlet.net r...@debian.org pe...@morpheusly.com
PGP key:        https://www.ringlet.net/roam/roam.key.asc
Key fingerprint 2EE7 A7A5 17FC 124C F115  C354 651E EFB0 2527 DF13
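
The portability rule quoted from POSIX above can be sketched as a small check. This is only an illustration: the function name is_portable_username is hypothetical, and rejecting the empty string is an assumption of this sketch rather than something the quoted definition states.

```python
import string

# The portable filename character set as quoted above:
# A-Z, a-z, 0-9, <period>, <underscore>, <hyphen-minus>.
PORTABLE_CHARS = frozenset(string.ascii_letters + string.digits + "._-")


def is_portable_username(name):
    """Hypothetical check for the "portable user name" wording in 3.409."""
    if not name:
        # Assumption of this sketch: an empty name is not usable.
        return False
    if name.startswith("-"):
        # <hyphen-minus> should not be the first character.
        return False
    return all(ch in PORTABLE_CHARS for ch in name)


print(is_portable_username("emollier"))
print(is_portable_username("\u00e9mollier"))
```

Restricting names to this set sidesteps the normalization question entirely, since none of these characters has more than one encoding.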