On Sun, Dec 01, 2024 at 06:55:09PM -0500, nick black wrote:
> Marc Haber left as an exercise for the reader:
> > > * any upstream tool could say "bad idea" and refuse patches,
> > >   requiring their long term management,
> >
> > Depending of how important this tool is, we could get away without
> > patching and probably not even documenting this failure.
>
> This kind of attitude seems self-defeating. Despite being
> *strongly* in favor of this effort, I would oppose it if were
> strictly a Debian thing. We can inspire the move, but going it
> alone seems a recipe for present and future pain (think SSHing
> from/to Debian and a non-Debian machine).

I bet that other distributions will also allow me to useradd a UTF-8
name today. I don't think that we have patched useradd to allow this.

> > > * the Linux framebuffer console is pretty limited in what
> > >   glyphs it has available, and the number of glyphs it can
> > >   support,
> >
> > Probably, yes. But people working on the Linux framebuffer console are
> > unlikely to actually use UTF-8 user names, so the only really bad
>
> With all due respect, this seems totally unsupported by anything
> other than vibes =].

So you think that we should be stricter than we are today?

> > > * broken localization (or failure to call setlocale()) could be
> > >   a bigger problem, especially for root/system accounts.
> >
> > I don't think we should allow UTF-8 charactes in the string "root" or in
> > system account names. And if a local admin decides to do so, Debian
> > packages should still restrict themselves to using US-ASCII in their
> > system accounts.
>
> Why? This would require multiple code paths for what seems to me a
> very questionable objective. You point out later in your
> response that there already exist diverging codepaths, but isn't
> unifying such things always a goal?

I think that the distinction between system users and regular users is a
good thing and that we should continue treating them differently.
Strictly speaking, only adduser (and useradd, for UIDs only) has
different code paths; the treatment in other software is identical.
Even if we unify things (either by allowing strange characters in
system user names, or by restricting regular user names to the western
character set), adduser will need to keep the distinction because we
assign UIDs from different ranges.

> > Do you have a suggestion for a perl regexp that allows this? My current
> > development directory has "qr/[\p{Graph}*\.\${}><%'@]+/".
>
> I do not. This is not a regex problem in my mind and experience;
> you need full access to complicated libraries.

Adduser will have to stick to regexes for dependency reasons.

> Any such effort
> should go through Annex 15 canonicalization before being
> inspected at all.

I have always assumed that canonicalization would be used for sorting
and equality, while in the databases it is important to keep the
difference between the unit Angstrom and the capital letter A with ring
above. If we canonicalize everything, why do we have different
codepoints for different semantics? Yes, I need to read your book.

> At that point, you're well past regular
> languages so far as I can tell. I do not see this goal as
> possible with small surgeries on the adduser code base, but
> rather something that requires work across the chain.

So, "not for Trixie". And what would we do in Trixie? I think we need
something that a single person can implement in their spare time before
Christmas. That is a rather limited amount of time.

> > > It cannot. "C" is not UTF-8. Assumption of UTF-8 requires a
> > > properly set LANG and programs calling setlocale(). This, as
> > > alluded to above, has the potential for a big mess.
> > Our default is C.UTF-8 and has been like that for a while.
>
> Yes, but that can be changed.

By the local admin? Yes. That's why we (Linux distributions) should
stick to US-ASCII user names for the accounts that are created by our
packages. If a local admin creates UTF-8 user names but gives them a
non-UTF-8 locale, then it's their fault, and if a user with a UTF-8 user
name selects a non-UTF-8 locale, it's deliberate sabotage. I don't think
we should care about that, and it's already possible today.

> With all due respect, I admire your gung ho candoit spirit, but
> adduser alone is not IMHO the place. This is a major change
> requiring support from libraries, applications, and UI to do
> right, and thus wide buyin. I love the idea, but it's not going
> to happen with a few Perl regexes. Please don't read this as
> commentary on you or your code.

So your recommendation is to disallow things that we have allowed until
recently, and maybe remove configurability to REALLY disallow it?

Greetings
Marc

-- 
-----------------------------------------------------------------------------
Marc Haber         | "I don't trust Computers. They | Mail address in header
Leimen, Germany    |  lose things."    Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421
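
Illustrative note on the canonicalization exchange above (Annex 15 versus
keeping the Angstrom sign distinct from A with ring above): the following
is a minimal Python sketch, not adduser code and not anyone's proposed
implementation. It only shows what NFC normalization does to the code
points in question. The helper names canonical() and same_user() are
hypothetical and exist purely for illustration.

```python
import unicodedata

ANGSTROM_SIGN = "\u212b"      # U+212B ANGSTROM SIGN
A_WITH_RING = "\u00c5"        # U+00C5 LATIN CAPITAL LETTER A WITH RING ABOVE
A_PLUS_COMBINING = "A\u030a"  # U+0041 followed by U+030A COMBINING RING ABOVE


def canonical(name: str) -> str:
    """Return the NFC form of a name (hypothetical helper for illustration)."""
    return unicodedata.normalize("NFC", name)


def same_user(a: str, b: str) -> bool:
    """Compare two names for canonical equivalence; the inputs are not modified."""
    return canonical(a) == canonical(b)


def codepoints(s: str) -> list[str]:
    """Render a string as a list of U+XXXX labels for display."""
    return [f"U+{ord(c):04X}" for c in s]


if __name__ == "__main__":
    # Show each raw spelling next to its NFC form.
    for raw in (ANGSTROM_SIGN, A_WITH_RING, A_PLUS_COMBINING):
        print(codepoints(raw), "->", codepoints(canonical(raw)))
```

All three spellings normalize to U+00C5, so a database that stored only
the NFC form would lose the Angstrom/A-with-ring distinction, while
storing the raw string and normalizing only for comparison would keep it.
Which of the two is appropriate for user names is exactly the open
question in this thread.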