Hi Alex,

Letting off some steam here after going an exhausting ten rounds with
"asciification" in groff, a process that has consumed the month to date.

At 2025-09-19T00:24:23+0200, Alejandro Colomar wrote:
> On Thu, Sep 18, 2025 at 04:32:34PM -0500, G. Branden Robinson wrote:
> > > When 202601 is out, you'll get streq() and memeq(), and I'll send
> > > patches for them.  :)
> > 
> > Looking forward to that--those should have been in libc in the
> > 1980s!
> 
> Heh!  There's still people in the C Committee that doesn't like them.
> Some false purists thinks that only system calls and other magic
> functions should be in the standard,

That seems to me an odd stance for the C committee itself to take, given
that the standard doesn't provide abstractions for operating system
services except in an extremely minimal sense; something similar to what
MS-DOS 1.0 offered (a file system, but one with no hierarchy, just a big
flat file store with no directories and no file types except "regular",
except you could open those in "text" or "binary" "modes").

> and that convenience functions should go in external libraries (that
> would exclude every string.h function from libc except for memset(3)
> because of its magic aliasing properties; insane, IMO).

I think a stronger argument for standardizing memset(3) and memcpy(3) is
that in early days, the C language itself provided _no_ facility for
copying anything that wasn't a primitive type.  If you wanted to copy a
struct, you had to do it field by field.  I think it was ANSI C that
made structure copying (by assignment of rvalues to lvalues, both of the
same struct type) part of the language proper.  A while back Doug and I
had an exchange where we mused that prior to this, everything you could
do with a statement (apart from a function call, of course), mapped to a
bounded and small set of machine instructions in pretty much any ISA.
That's a nice property for "racing the beam"-style programming and other
hard real-time problems, but not as much use for general applications.
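
To make the contrast concrete, here is a minimal sketch in C.  The
struct and both function names are hypothetical, invented purely for
illustration.  The first function copies field by field, so every
statement outside the loop maps to a small, fixed number of
instructions; the second uses structure assignment, which a compiler
is free to expand into a loop or a call to memcpy(3).

```c
#include <stddef.h>

/* Hypothetical record type, for illustration only. */
struct point {
    int x;
    int y;
    char tag[8];
};

/* Field-by-field copy: what one writes when the language offers no
 * structure assignment. */
void
point_copy_fields(struct point *dst, const struct point *src)
{
    size_t i;

    dst->x = src->x;
    dst->y = src->y;
    for (i = 0; i < sizeof(dst->tag); i++)
        dst->tag[i] = src->tag[i];
}

/* Structure assignment: a single statement whose cost grows with
 * sizeof(struct point) rather than staying small and fixed. */
void
point_copy_assign(struct point *dst, const struct point *src)
{
    *dst = *src;
}
```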

That conversation reminded me of the Intel 8080, which had no hardware
multiplier and no block-memory move/copy instructions.  Everything you
could do on that machine you could reliably cycle-count prior to
assembly.  But the Z80, which still had no multiplier but _did_ have
instructions that could walk up to the entire 64KB address space, fuzzed
that line a little bit.  (It was still deterministic if the range given
to instructions like LDIR was what we today call a "constexpr".)

With multiplication (and of course division), you don't know how many
cycles you're going to need, and many years later (or maybe right away
at NSA, FSK, and MSS) people figured out how to use such indeterminacy
in speed of instruction retirement to exfiltrate secrets.

Anyway, the Z80 started to eat Intel's lunch.  That made them very
angry, so they hurried the 8086 to market to punish the entire world,
at which they've succeeded brilliantly for decades.  How dare the free
market not lavish one company exclusively with rewards?

> Some others just think libc functions should have some complexity;
> adding simple wrappers seemingly doesn't make them feel proud of
> inventing useless crap; it doesn't make them look smart.  The
> committee is really something out of a comedy.  And there are others.
> I could tell stories...

Why not both?  Why not offer useful primitives _and_ APIs that hide
complexity in favor of making commonly undertaken operations
straightforward to perform?

I am reminded of the tired old argument between those who advocate an
argumentless cat(1) and those who don't.  I think that's really an
argument between people who want command-line tools that go straight to
system calls and exercise the kernel with few confounding factors, and
those who, ya know, actually want to use cat(1) to _do_ something, like
stitch files together or look at their contents.

And we _should_ let people have thin wrappers around kernel services if
they want them.  That helps everybody understand what those services
are, advertises what they do and don't provide, and eases evaluation of
the kernel's interface design and performance.

Both systems programmers and application developers are real people.  If
you want your language used by both, you must serve the needs of both.

I think a similar question is at the root of our mild disagreement over
the respective merits of memset(3) and bzero(3).  I think the former is
a proper thing to have; it's a nigh-essential service for a language
runtime to offer.  But you're right that most people developing
applications want memory cleared to zeros several nines of the time.
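
A minimal sketch of that relationship, assuming nothing beyond
standard memset(3): the general primitive stays available, and the
common case gets a thin wrapper with no fill byte to get wrong or to
transpose with the length.  The name sketch_bzero() is hypothetical,
chosen so as not to collide with the real bzero(3).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for bzero(3): clear n bytes starting at buf. */
void
sketch_bzero(void *buf, size_t n)
{
    memset(buf, 0, n);
}
```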

> Ironically, they added memccpy(3) in C23, and it has 0 users in the
> real world.  That one was probably introduced because it made the
> committee look smart, because they arrived first at discovering a
> function they thought useful (hint: it's not).  Too bad that
> memccpy(3) is as dangerous as strncpy(3).

It doesn't seem stupid to me; it's a _generalization_ of strncpy().  Who
says all memory buffers look like C strings?  groff's own under-
documented distinction between these--groff's "strings" are really
arbitrary memory buffers that can contain interior nulls, and its
"symbol" type a pretty close match to a C string--has led me to
appreciate the virtues of making strong and clear contrasts here.
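
As a rough illustration of that generality, the sketch below wraps
memccpy(3) to copy a record up to and including a delimiter byte,
which need not be a null.  The wrapper name copy_through() and its
return convention are assumptions made for this example, not
anything from groff or libc.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700  /* ask for memccpy(3) on pre-C23 systems */
#endif
#include <stddef.h>
#include <string.h>

/* Copy bytes from src to dst up to and including the first byte
 * equal to delim, examining at most n bytes.  Return the number of
 * bytes copied through the delimiter, or 0 if delim did not occur
 * within n bytes (in which case n bytes were copied and dst holds an
 * unterminated prefix, the same trap strncpy(3) sets). */
size_t
copy_through(void *dst, const void *src, int delim, size_t n)
{
    char *end = memccpy(dst, src, delim, n);

    return end ? (size_t)(end - (char *)dst) : 0;
}
```

With delim set to '\0' this behaves much like a length-reporting
string copy; with any other byte it works on buffers that contain
interior nulls.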

> Will they ever realize it has no users and that they promoted a
> function that is unsafe and now starts being used by innocent
> programmers?

Better, I think, would be to come up with a label or name for these
"primitives", and segregate their header files and, insofar as is
practical, their symbol names in the function name space, which
resembles the MS-DOS 1.0 file store.

> Probably not; that's a problem for the next generation of committee
> members; they'll retire before the fallout.

Like physics, I guess it progresses one funeral at a time!

> > wonder by how many orders of magnitude string (in)equality
> > comparisons exceed string collation order comparisons.
> 
> I have numbers in my laptop.  I developed a patch for glibc adding
> these APIs and then replacing every possible use within glibc itself.
> When I use my laptop tomorrow, I can check the remaining strcmp()
> calls compared to streq().  I remember having looked at the ratio, but
> don't remember the numbers.  I think it was in the hundreds of
> equality calls per each sorting call.

If you'd asked me to bet, I'd have wagered at least 2 orders of
magnitude, yeah.  You probably could have bluffed me into 3.  ;-)
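
For readers who haven't followed the thread, the kind of wrapper
under discussion might look like the sketch below.  The sketch_
prefixes and the exact signatures are assumptions made here to avoid
clashing with whatever libc or gnulib ends up shipping; they are not
the proposed standard text.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if the two C strings have identical contents. */
bool
sketch_streq(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}

/* True if the first n bytes of the two buffers are identical;
 * interior nulls are compared like any other byte. */
bool
sketch_memeq(const void *a, const void *b, size_t n)
{
    return memcmp(a, b, n) == 0;
}
```

The point of such wrappers is at the call site: a test written as
`if (sketch_streq(s, "foo"))` says "equal" directly, where
`if (!strcmp(s, "foo"))` makes the reader invert a collation result.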

> Well; even during the initial period, the unfamiliarity isn't worse
> than inventing your own name.  After all, you need to invent a name.
> :)

Yes.  It's just that groff is over that hill now.

> > I don't disagree with the migration; it just seems like an
> > "eventually" thing to me.
> 
> If you have some window of time where you'd apply it, I can have the
> patch ready for that window.  I guess once you decide to apply it it's
> a matter of running git-am(1), and forgetting about it.  It should be
> a moment when your local queue of patches is small, to reduce your
> rebasing work.  But being a trivial (yet large) patch, it's not
> something I see very problematic.

Right, and if another committer wants to shepherd the change through, I
won't put any stop energy on it, except...

> The major blocker is bumping gnulib; just let me know when you'll do
> that.

...for that, which is a kick I'd prefer to execute in a release
management capacity.  But I reckon right after a kick to the 2025-07
gnulib tag, or right after the 1.24 release are both good times.

> > Cool!  Ritchie's rolling over in his grave to see C approaching full
> > language support for container iterators like this.  :P
> 
> Actually, I think this is something that was originally devised by
> K&R, and I'm just filling the gaps.  I can't see another reason they
> allowed using array notation in parameters, if they didn't want them
> to behave like that.

I read recently that a classic old bit of weirdness/cleverness that has
been widely but perversely celebrated in C, namely the synonymy of
`a[5]` and `5[a]`, is slated for the chopping block.  I think I saw
something about it in a recent GCC commit--something about the rules for
array decay changing.

The grognards are going to void their bowels about that one.  The
synonymy doesn't _mean_ anything--all it is is a reflection of the
symmetry (or maybe commutativity is a better word) of assembly language
expressions in ISAs that support indexed addressing modes (which is
every machine I've personally encountered).

There's no deep meaning to the synonymy, it's confusing to learners, and
it offers yet another vector for the construction of obfuscated code.
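
For anyone who hasn't met it, the synonymy falls out of the way C
through C23 defines subscripting: E1[E2] is identical to
(*((E1)+(E2))), and pointer addition commutes.  A minimal sketch
follows; the function name is hypothetical.

```c
#include <stddef.h>

/* Return 1 if the three spellings of "element i of a" agree, both
 * in value and in the address they designate. */
int
subscript_spellings_agree(const int *a, size_t i)
{
    return a[i] == i[a]
        && a[i] == *(a + i)
        && &a[i] == &i[a];
}
```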

I have little time for people who boast about the virtues of programming
in assembly ("portable" or otherwise) while seeming to actually do
precious little of it.

Regards,
Branden
