At 2025-04-21T14:50:40+0200, Santiago Vila wrote:
> Also, while the idea of Josh might sound good in theory (adding
> dependencies will not harm anybody, we just want to see the
> dependencies explicit),

While I support that proposal and initiative...

> it might create some undeserved pressure on maintainers to stop using
> awk.

I agree with that, too. Our industry struggles to resist recurring
trends to rewrite everything in the language du jour. This decade it's
Go and/or Rust; both languages have things to recommend them (and both
their communities have demonstrated worrisome governance problems).

> In some cases I'm sure that it would be easy to rewrite the code, but
> in some others the alternate construction may be a lot less readable,
> and overall worse.

Yes, and we also have to ask what we have to gain by doing so, apart
from fashionability and bragging rights on a CV. Nothing stops anyone
from reimplementing anything in any language and slapping it up on
GitHub or a GitLab site to prove their skills--but my impression is that
a lot of people only feel such an undertaking is worth the effort if
they can cram it down a lot of other people's throats. Doing so shows
that one is "impactful", and therefore appealing to hiring managers at
startups and other places that natter on about being "disruptive" and
about Schumpeter's "creative destruction" as the engine of capitalism.

AWK is a nice language--small, pleasant, and consistent for problems
where C is too much trouble but a C-ish syntax is comfortably familiar
to your target audience, the shell is too quirky, and where you don't
need a bulky standard library.

> Note also that the base system and the container images are expected
> to grow over time, because everything grows over time, but machines
> hosting those container images also grow over time, so one would
> naturally wonder why awk has become a problem now when it was never
> a problem due to its extremely small size.

Yes. I have little interest in the drive to shrink container images for
its/their own sake.

> My modest proposal here after trixie, if there is a consensus that
> it's a good step, would be to replace mawk by original-awk in the
> base system and see what can we learn from that.

I just learned that you're the maintainer of original-awk (a.k.a. BWK
AWK)...

We can observe right now that the space savings is meager. Using older
data on amd64, I see:

Package: original-awk
Version: 2018-08-27-1
Maintainer: Santiago Vila <sanv...@debian.org>
Installed-Size: 180 kB

Package: mawk
Version: 1.3.4.20200120-2
Maintainer: Boyuan Yang <by...@debian.org>
Installed-Size: 248 kB

...for a savings of 68 kB from your proposal.

Hmm, how much does perl-base grow from one Debian release to the next?

Package: perl-base
Version: 5.36.0-7+deb12u2
Installed-Size: 7639 kB

Package: perl-base
Version: 5.40.1-3
Installed-Size: 7811 kB

Difference: 172 kB

So whatever we'd have gained by hypothetically trading mawk for
original-awk in trixie, or even eliminating AWK entirely from the
Essential set, we'd have traded away simply by having Perl around.

> I would see that little change as something similar to what we did
> with /bin/sh being replaced by dash to ensure compatibility and
> standards compliance

This argument requires a footnote. Dash has its own problems with POSIX
conformance[1] and we insist on a couple of extensions to POSIX behavior
for our own sanity (the one I can remember is the `local` keyword).

> (back then, we discovered some bashisms, and we either rewrote them to
> be sh-compliant or used #!/bin/bash instead, and everybody was happy
> with those little incremental changes).

It was a good thing to do, but the standards-compliance benefit was, I
think, more a matter of inchoate bragging rights (see above) than
concrete benefit. The benefit, I think, came in saying what we meant:
either expressing dependencies explicitly, or eliminating unnecessary
ones.
Also, it was really important for people using "upstart" as their init
system because, as I recall, the time differential when dynamically
loading Bash versus dash was thought to be an easy win for performance.
Bragging rights and impactfulness again.

> I don't think we have many mawk-isms in the distribution, but this
> would be an opportunity to check if all AWKs are really
> interchangeable.

...and make you the maintainer of (even more?) Essential packages. ;-)

original-awk's man page admits to one area of POSIX-nonconformance:

BUGS
...
       POSIX‐standard interval expressions in regular expressions are
       not supported.

...which I think weakens the case for your proposal helping us to have
AWK scripts that don't exercise extensions to POSIX. (But maybe the
newer original-awk that supports CSV data--a non-POSIX extension--fixes
that.)

I wonder if it'd be less effort to _review_ what AWK scripts we have in
maintainer scripts for satisfiability by any POSIX-conforming AWK.

How many can there be? </Jeremy Clarkson>

Regards,
Branden

[1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=862907
    https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=870317
    https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=961737
    https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1076035
    https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1087810
    https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1101388
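
For anyone who wants to re-check the size arithmetic in this message,
here is a minimal sketch in Python. The Installed-Size figures are the
ones quoted above; the variable names are illustrative only and are not
taken from any Debian tool.

```python
# Installed-Size values in kB, copied from the package records quoted
# in this message (amd64, older data).
original_awk_kb = 180   # original-awk 2018-08-27-1
mawk_kb = 248           # mawk 1.3.4.20200120-2
perl_base_old_kb = 7639  # perl-base 5.36.0-7+deb12u2
perl_base_new_kb = 7811  # perl-base 5.40.1-3

# Space saved by swapping mawk for original-awk.
awk_savings_kb = mawk_kb - original_awk_kb

# Growth of perl-base between the two versions quoted.
perl_growth_kb = perl_base_new_kb - perl_base_old_kb

print(f"awk swap saves {awk_savings_kb} kB")
print(f"perl-base grew by {perl_growth_kb} kB")
print(f"net change: {perl_growth_kb - awk_savings_kb:+d} kB")
```

The net figure is only a rough comparison of these four records; it says
nothing about other packages in the Essential set.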