On Fri, Nov 22, 2024 at 10:01:24PM +0100, Gioele Barabucci wrote:
> On 22/11/24 20:42, Étienne Mollier wrote:
> > I tried to consider what it would take to have an émollier or an
> > Émollier login, and there is one little blocker: I may have to
> > log in from environments or keyboards lacking the necessary i18n
> > and l10n capabilities to transcribe the 'e' acute, let alone the
> > uppercase 'e' acute.
> 
> Dear Étienne,
> 
> your case highlights another problem not mentioned in the original list
> posted by Marc: comparison (and normalization).
> 
> Some characters can be encoded in more than one way. For instance, "é" in
> "émollier" could be stored as "e with acute" U+00E9 (and encoded in UTF-8 as
> 0xc3 0xa9) or as "e, combined with an acute accent" U+0065 plus U+0301
> (UTF-8: 0x65 0xcc 0x81). If a keyboard input system provides the former
> sequence of bytes, but the username is stored in the login infrastructure
> using the latter sequence of bytes, then a naive comparison will not find
> the user "émollier" in the system. Unicode defines in Annex 15 a few
> normalization forms as a way to work around this problem. But a correct use
> of these normalization forms still requires coordination and standardization
> among all programs accessing the data.
> 
> Does POSIX (or other de-facto standards) prescribe a normalization form for
> Unicode-/UTF-8-encoded usernames?
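
To make the mismatch described above concrete, here is a minimal Python
sketch. It assumes NFC as the form both sides agree on; nothing in this
thread says that any login infrastructure actually does that, and the
helper name same_username is made up for illustration.

```python
import unicodedata

# The two spellings of the same visible name discussed above.
precomposed = "\u00e9mollier"   # U+00E9, UTF-8: 0xc3 0xa9
decomposed = "e\u0301mollier"   # U+0065 U+0301, UTF-8: 0x65 0xcc 0x81


def same_username(a: str, b: str) -> bool:
    """Hypothetical helper: compare two names after NFC normalization.

    NFC is an assumption here; any single form works as long as every
    program touching the data uses the same one.
    """
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


if __name__ == "__main__":
    print(precomposed.encode("utf-8")[:2].hex(" "))   # bytes of the first spelling
    print(decomposed.encode("utf-8")[:3].hex(" "))    # bytes of the second spelling
    print(precomposed == decomposed)                  # naive comparison
    print(same_username(precomposed, decomposed))     # normalized comparison
```

The naive comparison fails because the two strings differ at the code
point level, while the normalized one succeeds.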

POSIX says "if you want your applications to be portable, do not use any
funny characters in usernames":

  https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1_chap03.html#tag_03_409

  3.409 User Name

  A string that is used to identify a user; see also 3.407 User Database.
  To be portable across systems conforming to POSIX.1-2024, the value is
  composed of characters from the portable filename character set.
  The <hyphen-minus> character should not be used as the first character
  of a portable user name.

For people unfamiliar with POSIX terms, the portable filename character
set is defined as:

  https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1_chap03.html#tag_03_265

  The set of characters from which portable filenames are constructed.

  A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
  a b c d e f g h i j k l m n o p q r s t u v w x y z
  0 1 2 3 4 5 6 7 8 9 . _ -

  The last three characters are the <period>, <underscore>, and
  <hyphen-minus> characters, respectively.
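
For illustration only, the two definitions quoted above can be turned
into a small Python check. The function name is_portable_user_name is
made up, and the sketch treats the "should not" about a leading
<hyphen-minus> as a hard rule, which is stricter than the text. It also
rejects the empty string, which is my own assumption.

```python
import re

# Portable filename character set: A-Z a-z 0-9 . _ -
# The first character is additionally not allowed to be a hyphen-minus
# (POSIX only says "should not"; being strict here is an assumption).
PORTABLE_USER_NAME = re.compile(r"[A-Za-z0-9._][A-Za-z0-9._-]*")


def is_portable_user_name(name: str) -> bool:
    """Hypothetical helper: True if name uses only portable characters."""
    return PORTABLE_USER_NAME.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("emollier", "\u00e9mollier", "-emollier", "e.mollier-2"):
        print(repr(candidate), is_portable_user_name(candidate))
```

With this check, "emollier" passes and "émollier" does not, in either
of its Unicode spellings.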

G'luck,
Peter

-- 
Peter Pentchev  r...@ringlet.net r...@debian.org pe...@morpheusly.com
PGP key:        https://www.ringlet.net/roam/roam.key.asc
Key fingerprint 2EE7 A7A5 17FC 124C F115  C354 651E EFB0 2527 DF13
