At 2025-04-21T14:50:40+0200, Santiago Vila wrote:
> Also, while the idea of Josh might sound good in theory (adding
> dependencies will not harm anybody, we just want to see the
> dependencies explicit),

While I support that proposal and initiative...

> it might create some undeserved pressure on maintainers to stop using
> awk.

I agree with that, too.  Our industry struggles to resist recurring
trends to rewrite everything in the language du jour.  This decade it's
Go and/or Rust; both languages have things to recommend them (and both
their communities have demonstrated worrisome governance problems).

> In some cases I'm sure that it would be easy to rewrite the code, but
> in some others the alternate construction may be a lot less readable,
> and overall worse.

Yes, and we also have to ask what we have to gain by doing so, apart
from fashionability and bragging rights on a CV.  Nothing stops anyone
from reimplementing anything in any language and slapping it up on
GitHub or a GitLab site to prove their skills--but my impression is that
a lot of people only feel such an undertaking is worth the effort if
they can cram it down a lot of other people's throats.  Doing so shows
that one is "impactful", and therefore appealing to hiring managers at
startups and other places that natter on about being "disruptive" and
about Schumpeter's "creative destruction" as the engine of capitalism.

AWK is a nice language--small, pleasant, and consistent for problems
where C is too much trouble but a C-ish syntax is comfortably familiar
to your target audience, the shell is too quirky, and where you don't
need a bulky standard library.

> Note also that the base system and the container images are expected
> to grow over time, because everything grows over time, but machines
> hosting those container images also grow over time, so one would
> naturally wonder why awk has become a problem now when it was never
> a problem due to its extremely small size.

Yes.  I have little interest in the drive to shrink container images for
its own sake.

> My modest proposal here after trixie, if there is a consensus that
> it's a good step, would be to replace mawk by original-awk in the
> base system and see what can we learn from that.

I just learned that you're the maintainer of original-awk (a.k.a. BWK
AWK)...

We can observe right now that the space savings are meager.  Using older
data on amd64, I see:

Package: original-awk
Version: 2018-08-27-1
Maintainer: Santiago Vila <sanv...@debian.org>
Installed-Size: 180 kB

Package: mawk
Version: 1.3.4.20200120-2
Maintainer: Boyuan Yang <by...@debian.org>
Installed-Size: 248 kB

...for a savings of 68 kB from your proposal.  Hmm, how much does
perl-base grow from one Debian release to the next?

Package: perl-base
Version: 5.36.0-7+deb12u2
Installed-Size: 7639 kB

Package: perl-base
Version: 5.40.1-3
Installed-Size: 7811 kB

Difference: 172 kB

So whatever we'd have gained by hypothetically trading mawk for
original-awk in trixie, or even eliminating AWK entirely from the
Essential set, we'd have traded away simply by having Perl around.
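
The arithmetic above can be checked with a short sketch.  The helper
name `installed_size` is hypothetical, and the figures are simply the
Installed-Size values quoted in this message, not fresh measurements.

```shell
# Hypothetical helper: print the numeric Installed-Size (in kB) found in
# a package stanza read from standard input.
installed_size() {
    awk '$1 == "Installed-Size:" { print $2 }'
}

# Values copied from the stanzas quoted in this message.
mawk_size=$(printf 'Package: mawk\nInstalled-Size: 248 kB\n' | installed_size)
bwk_size=$(printf 'Package: original-awk\nInstalled-Size: 180 kB\n' | installed_size)
perl_old=$(printf 'Package: perl-base\nInstalled-Size: 7639 kB\n' | installed_size)
perl_new=$(printf 'Package: perl-base\nInstalled-Size: 7811 kB\n' | installed_size)

awk_savings=$((mawk_size - bwk_size))
perl_growth=$((perl_new - perl_old))

echo "mawk -> original-awk savings: ${awk_savings} kB"
echo "perl-base growth:             ${perl_growth} kB"
```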

> I would see that little change as something similar to what we did
> with /bin/sh being replaced by dash to ensure compatibility and
> standards compliance

This argument requires a footnote.  Dash has its own problems with POSIX
conformance[1] and we insist on a couple of extensions to POSIX behavior
for our own sanity (the one I can remember is the `local` keyword).
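
As a minimal illustration of the `local` extension mentioned above, the
sketch below assumes a shell that accepts `local`, as dash and bash do.
The function and variable names are made up for the example.

```shell
#!/bin/sh
counter=outer

bump() {
    # `local` is the extension discussed above: it limits this
    # assignment to the body of the function.
    local counter
    counter=inner
    echo "inside: $counter"
}

bump
echo "outside: $counter"
```

Under dash or bash the outer `counter` keeps its value after `bump`
returns; a shell without `local` would report an error instead.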

> (back then, we discovered some bashisms, and we either rewrote them to
> be sh-compliant or used #!/bin/bash instead, and everybody was happy
> with those little incremental changes).

It was a good thing to do, but the standards-compliance benefit was, I
think, more a matter of inchoate bragging rights (see above) than
concrete benefit.  The benefit, I think, came in saying what we meant:
either expressing dependencies explicitly, or eliminating unnecessary
ones.  Also, it was really important for people using "upstart" as their
init system because, as I recall, the time differential when dynamically
loading Bash versus dash was thought to be an easy win for performance.
Bragging rights and impactfulness again.

> I don't think we have many mawk-isms in the distribution, but this
> would be an opportunity to check if all AWKs are really
> interchangeable.

...and make you the maintainer of (even more?) Essential packages.  ;-)

original-awk's man page admits to one area of POSIX nonconformance:

BUGS
...
     POSIX‐standard interval expressions in regular expressions are not
     supported.

...which I think weakens the case for your proposal helping us to have
AWK scripts that don't exercise extensions to POSIX.  (But maybe the
newer original-awk that supports CSV data--a non-POSIX extension--fixes
that.)
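
To make the interval-expression gap concrete, here is a small sketch.
The function names are hypothetical.  The first form relies on an
interval expression and so may match nothing on an AWK lacking that
support; the second spells out the repetition and should behave the
same on any AWK.

```shell
# Select lines made of exactly three digits, using an interval
# expression.  Results depend on whether the installed awk supports
# `{3}` in regular expressions.
three_digits_interval() {
    awk '/^[0-9]{3}$/'
}

# Same intent, written without an interval expression.
three_digits_portable() {
    awk '/^[0-9][0-9][0-9]$/'
}

printf '123\n12\n1234\n' | three_digits_interval
printf '123\n12\n1234\n' | three_digits_portable
```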

I wonder if it'd be less effort to _review_ what AWK scripts we have
in maintainer scripts for satisfiability by any POSIX-conforming AWK.
How many can there be?  </Jeremy Clarkson>
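
A first pass at such a review could be as crude as listing which
maintainer scripts mention an AWK at all.  This is only a sketch: the
function name `list_awk_users` is hypothetical, the pattern is a rough
heuristic for awk, gawk, or mawk appearing as a word, and it sees only
whatever scripts sit in the directory it is given.

```shell
# Print the maintainer scripts in directory $1 that mention awk, gawk,
# or mawk as a separate word.
list_awk_users() {
    dir=$1
    for f in "$dir"/*.preinst "$dir"/*.postinst "$dir"/*.prerm "$dir"/*.postrm; do
        # Skip glob patterns that matched nothing.
        [ -f "$f" ] || continue
        if grep -Eq '(^|[^[:alnum:]_])[gm]?awk([^[:alnum:]_]|$)' "$f"; then
            echo "$f"
        fi
    done
}
```

On an installed Debian system, `list_awk_users /var/lib/dpkg/info`
would cover the scripts of installed packages only; reading each hit
for non-POSIX constructs remains a manual job.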

Regards,
Branden

[1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=862907
    https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=870317
    https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=961737
    https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1076035
    https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1087810
    https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1101388
