Hi Alex,

Letting off some steam here after going an exhausting ten rounds with "asciification" in groff, a process that has consumed the month to date.
At 2025-09-19T00:24:23+0200, Alejandro Colomar wrote:
> On Thu, Sep 18, 2025 at 04:32:34PM -0500, G. Branden Robinson wrote:
> > > When 202601 is out, you'll get streq() and memeq(), and I'll send
> > > patches for them. :)
> >
> > Looking forward to that--those should have been in libc in the
> > 1980s!
>
> Heh! There are still people in the C Committee who don't like them.
> Some false purists think that only system calls and other magic
> functions should be in the standard,

That seems to me an odd stance for the C committee itself to take, given that the standard doesn't provide abstractions for operating system services except in an extremely minimal sense, something similar to what MS-DOS 1.0 offered: a file system, but one with no hierarchy, just a big flat file store with no directories and no file types other than "regular" (though you could open those files in "text" or "binary" "modes").

> and that convenience functions should go in external libraries (that
> would exclude every string.h function from libc except for memset(3)
> because of its magic aliasing properties; insane, IMO).

I think a stronger argument for standardizing memset(3) and memcpy(3) is that in the early days, the C language itself provided _no_ facility for copying anything that wasn't a primitive type. If you wanted to copy a struct, you had to do it field by field. I think it was ANSI C that made structure copying (by assignment of rvalues to lvalues, both of the same struct type) part of the language proper.

A while back Doug and I had an exchange where we mused that prior to this, everything you could do with a statement (apart from a function call, of course) mapped to a small, bounded set of machine instructions in pretty much any ISA. That's a nice property for "racing the beam"-style programming and other hard real-time problems, but not as much use for general applications.
That conversation reminded me of the Intel 8080, which had no hardware multiplier and no block-memory move/copy instructions. Everything you could do on that machine you could reliably cycle-count prior to assembly. But the Z80, which still had no multiplier but _did_ have instructions that could walk up to the entire 64KB address space, fuzzed that line a little bit. (It was still deterministic if the range given to instructions like LDIR was what we today call a "constexpr".) With multiplication (and of course division), you don't necessarily know in advance how many cycles you're going to need, and many years later (or maybe right away at NSA, FSK, and MSS) people figured out how to use such data-dependent variation in the speed of instruction retirement to exfiltrate secrets.

Anyway, the Z80 started to eat Intel's lunch. That made them very angry, so they hurried the 8086 to market to punish the entire world, at which they've succeeded brilliantly for decades. How dare the free market not lavish one company exclusively with rewards?

> Some others just think libc functions should have some complexity;
> adding simple wrappers seemingly doesn't make them feel proud of
> inventing useless crap; it doesn't make them look smart. The
> committee is really something out of a comedy. And there are others.
> I could tell stories...

Why not both? Why not offer useful primitives _and_ APIs that hide complexity in favor of making commonly undertaken operations straightforward to perform?

I am reminded of the tired old argument between those who advocate an argumentless cat(1) and those who don't. I think that's really an argument between people who want command-line tools that go straight to system calls and exercise the kernel with few confounding factors, and those who, ya know, actually want to use cat(1) to _do_ something, like stitch files together or look at their contents. And we _should_ let people have thin wrappers around kernel services if they want them.
That helps everybody understand what those services are, advertises what they do and don't provide, and eases evaluation of the kernel's interface design and performance. Both systems programmers and application developers are real people. If you want your language used by both, you must serve the needs of both.

I think a similar question is at the root of our mild disagreement over the respective merits of memset(3) and bzero(3). I think the former is a proper thing to have; it's a nigh-essential service for a language runtime to offer. But you're right that most people developing applications want memory cleared to zeros several nines of the time.

> Ironically, they added memccpy(3) in C23, and it has 0 users in the
> real world. That one was probably introduced because it made the
> committee look smart, because they arrived first at discovering a
> function they thought useful (hint: it's not). Too bad that
> memccpy(3) is as dangerous as strncpy(3).

It doesn't seem stupid to me; it's a _generalization_ of strncpy(). Who says all memory buffers look like C strings? groff's own under-documented distinction between these (groff's "strings" are really arbitrary memory buffers that can contain interior nulls, and its "symbol" type is a pretty close match to a C string) has led me to appreciate the virtues of making strong and clear contrasts here.

> Will they ever realize it has no users and that they promoted a
> function that is unsafe and is now starting to be used by innocent
> programmers?

Better, I think, would be to come up with a label or name for these "primitives", and segregate their header files and, insofar as is practical, their symbol names in the function name space, which resembles the MS-DOS 1.0 file store.

> Probably not; that's a problem for the next generation of committee
> members; they'll retire before the fallout.

Like physics, I guess it progresses one funeral at a time!
> > wonder by how many orders of magnitude string (in)equality
> > comparisons exceed string collation order comparisons.
>
> I have numbers in my laptop. I developed a patch for glibc adding
> these APIs and then replacing every possible use within glibc itself.
> When I use my laptop tomorrow, I can check the remaining strcmp()
> calls compared to streq(). I remember having looked at the ratio, but
> don't remember the numbers. I think it was in the hundreds of
> equality calls per each sorting call.

If you'd asked me to bet, I'd have wagered at least 2 orders of magnitude, yeah. You probably could have bluffed me into 3. ;-)

> Well; even during the initial period, the unfamiliarity isn't worse
> than inventing your own name. After all, you need to invent a name.
> :)

Yes. It's just that groff is over that hill now.

> > I don't disagree with the migration; it just seems like an
> > "eventually" thing to me.
>
> If you have some window of time where you'd apply it, I can have the
> patch ready for that window. I guess once you decide to apply it it's
> a matter of running git-am(1), and forgetting about it. It should be
> a moment when your local queue of patches is small, to reduce your
> rebasing work. But being a trivial (yet large) patch, it's not
> something I see very problematic.

Right, and if another committer wants to shepherd the change through, I won't put any stop energy on it, except...

> The major blocker is bumping gnulib; just let me know when you'll do
> that.

...for that, which is a kick I'd prefer to execute in a release management capacity. But I reckon that right after a kick to the 2025-07 gnulib tag, or right after the 1.24 release, would both be good times.

> > Cool! Ritchie's rolling over in his grave to see C approaching full
> > language support for container iterators like this. :P
>
> Actually, I think this is something that was originally devised by
> K&R, and I'm just filling the gaps.
> I can't see another reason they
> allowed using array notation in parameters, if they didn't want them
> to behave like that.

I read recently that a classic old bit of weirdness/cleverness that has been widely but perversely celebrated in C, namely the synonymy of `a[5]` and `5[a]`, is slated for the chopping block. I think I saw something about it in a recent GCC commit, something about the rules for array decay changing. The grognards are going to void their bowels about that one.

The synonymy doesn't _mean_ anything; all it is is a reflection of the symmetry (or maybe commutativity is a better word) of assembly language expressions in ISAs that support indexed addressing modes (which is every machine I've personally encountered). There's no deep meaning to the synonymy, it's confusing to learners, and it offers yet another vector for the construction of obfuscated code.

I have little time for people who boast about the virtues of programming in assembly ("portable" or otherwise) while seeming to actually do precious little of it.

Regards,
Branden