[writing this with my adduser hat on. I am also in touch with the maintainers of src:shadow and base-passwd]
Hi,

recently, I have "taken over" the wiki page about UserAccounts and have added some history and general thoughts about how Debian handles user names and name restrictions.

https://wiki.debian.org/UserAccounts

I fear that I have opened an especially nasty can of worms by beginning to do sanity checks in adduser and being pointed towards user name encoding in the process.

Can you help me bring some sense into this mess? I would like to hear your comments. Feel free to apply corrections directly to the wiki page. I am especially interested in having clear terminology regarding Unicode code points, UTF-8, character strings and byte strings. It is vitally important to be consistent here to avoid making the mess even worse.

For adduser's next release, I would like to discuss the following things:

(1) Should Debian allow UTF-8 user names in the first place, or should we restrict names for regular users to a set close to US-ASCII as well? (I think yes, we should)

(2) If the answer to (1) is "allow UTF-8", should we also do that for system users? (I think no, we should not)

(2a) Which UTF-8 subset / code point classes should we allow and which should we reject? (I don't have an opinion about that)

(3) I think that 32 characters/bytes (it's the same if we don't allow UTF-8) is a good limit for a system user name. But should we increase that for regular user names? (I think yes)

(4) If we decide to relax some of our current requirements, where are the borders between a "normal" user name, one that requires --allow-bad-names, and finally one that requires --allow-all-names? Wouldn't it be offensive to speakers of some languages to require --allow-bad-names before their special characters are allowed in a user name? (no opinion here that would not break backwards compatibility)

(5) Is it right to say "the user name in /etc/passwd is UTF-8 encoded", or would it be better to say "the user name in /etc/passwd can be UTF-8 encoded"?
(6) Does it still make sense to give non-UTF-8 locales special handling (which one?), or can adduser safely assume that any non-ASCII locale is UTF-8? Or must I check the locale and reject UTF-8 user names on non-UTF-8 locales? (I hope that we can safely assume UTF-8)

(7) Do the general restrictions for both kinds of user names make sense? Going forward with this would mean rejecting user names that we used to accept. (I think we should come close to systemd's ideas)

(8) I think that our current way of restricting system account names is fine. Any objections/additions here?

(9) Should some of this language be in Policy instead of some random wiki page? Policy is quite short about user names (chapter 9.2). (I think yes)

(10) What should adduser do regarding subuids? Since I was ignorant of that concept until a few hours ago, all accounts created by adduser do have subuids, regardless of whether they are system accounts, while useradd does not give system accounts subuids.

Greetings
Marc

P.S.: The teams and individuals working on src:shadow, base-passwd and adduser would appreciate your help in coding and packaging. You can get in touch with all involved parties via pkg-shadow-de...@lists.alioth.debian.org

-- 
-----------------------------------------------------------------------------
Marc Haber         | "I don't trust Computers. They | Mail address in header
Leimen, Germany    |  lose things."    Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421
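
As a small illustration of the terminology question and of point (3) above, here is a minimal Python sketch of how the number of code points and the number of UTF-8 bytes in a user name can differ. The helper name_lengths and the sample names are hypothetical and chosen only for illustration. Nothing here reflects what adduser actually does.

```python
import unicodedata


def name_lengths(name):
    """Return (number of code points, number of UTF-8 bytes) for a name."""
    return len(name), len(name.encode("utf-8"))


# Hypothetical sample names; escapes are used so the code points are unambiguous.
samples = [
    "marc",          # plain ASCII
    "j\u00f6rg",     # o with diaeresis as one precomposed code point (NFC)
    "jo\u0308rg",    # the same visible name, o followed by a combining diaeresis (NFD)
]

for name in samples:
    code_points, utf8_bytes = name_lengths(name)
    print(f"{name!r}: {code_points} code points, {utf8_bytes} UTF-8 bytes")

# Normalizing to NFC maps the decomposed spelling onto the precomposed one.
assert unicodedata.normalize("NFC", "jo\u0308rg") == "j\u00f6rg"
```

For pure ASCII names the two counts are equal, which is why a 32-character and a 32-byte limit coincide as long as UTF-8 is not allowed. The two spellings of the same visible name also show why "character string", "code point sequence" and "byte string" need to be kept apart when writing down a length limit.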