Re: "printf %d ''" should diagnose the empty string

Martin D Kealey Fri, 07 Feb 2025 19:22:50 -0800

Hi Chet

We seem to have very similar opinions about strong backwards compatibility
in theory, and yet somehow we keep butting heads on how that pans out in
practice.

I'm concerned that the last ten years have seen a number of Linux
distributions *stop* including Bash by default, and it has ceased to be the
language of choice for writing new scripts; for the most part that's now
either Python, Node.js, or POSIX sh (dash).

I now wonder whether Bash has much of a future. Each breaking change pushes
Bash ever closer to becoming an irrelevant anachronism. Conversely, changes
that make it less error prone or easier to use somewhat pull it back from
the brink of oblivion.

Which brings me to my latest concern.

While warning about potential bugs helps make Bash easier to use, forcing
spurious warnings on working code is a breaking change. (More below about
tolerance for new warning messages.) Even if there's no clear way to decide
which outweighs the other, a third option would *clearly* outweigh them
both: create an optional warning that can be enabled to help track down
bugs, but left disabled in existing (presumably correctly working)
production code.

My key point is that an empty string *meaning* zero can arise quite
ordinarily from *not* treating zero as a special case. Requiring new
special cases just to avoid empty strings seems counterproductive to making
Bash easier to use.

The proposed change in 5.3-alpha won't transform any broken programs into
working ones, but it will break (at least a few) working programs. For
example:

   - printf '%u\n' "${x##*(0)}"   # make sure x is not interpreted as octal
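To make the mechanism concrete, here is a minimal sketch of how that expansion produces an empty string. It assumes extglob is enabled (the *(0) pattern needs it), and strip_leading_zeros is only an illustrative helper name, not something from the script in question.

```bash
#!/usr/bin/env bash
shopt -s extglob   # *(0) is an extended glob: zero or more '0' characters

# strip_leading_zeros is a hypothetical helper used only for illustration.
strip_leading_zeros() {
  local x=$1
  # Remove the longest leading run of zeros.
  printf '%s' "${x##*(0)}"
}

a=$(strip_leading_zeros 0012)   # leading zeros removed, digits remain
b=$(strip_leading_zeros 000)    # nothing remains: zero becomes the empty string
printf 'a=[%s] b=[%s]\n' "$a" "$b"   # → a=[12] b=[]
```

The second case is the one at issue: the value is set, it is numerically zero, and it is empty.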

It is irrelevant whether it is possible to rewrite this to avoid the
proposed warning; what matters is that correctly working scripts will have
to be audited (and potentially modified) if this change goes ahead. None of
us know how many scripts will require changes, but I'm categorically
certain that the answer won't be "none", and I'm reasonably sure that the
number that will require auditing will be larger by a substantial ratio.

> And these scripts can't stand a warning message?

That's a good question, and I wish people would think past "it's *only* a
warning".

I've witnessed cases where added warnings have cost time and money.

Unexpected warnings can and have made the difference between a cron job
running silently, vs flooding the sysadmin mailbox; or caused the logging
file-system to fill up; or triggered "severe/unknown" monitoring alerts.
For example: a single added warning in a test infrastructure project broke
continuous integration and blocked development in *numerous* other projects
all at once.

Responses to those have included diverting resources from other work,
delaying or rolling back urgent changes, and paying on-call fees (and
ruining sleep).

These are not mere hypothetical threats; I've seen all of them happen in
practice. I will grant that they are a very small proportion of scripts,
but they are high up on the criticality scale.

Even if the warning really is innocuous, the users who run these scripts
will likely have no idea why they're suddenly seeing a new warning, as they
likely have zero experience in writing or modifying shell scripts. Yet more
load on tech support to fix a non-bug.

Even a simple fix can be fraught: I've worked for companies whose policies
outright prohibit changing existing shell scripts (or creating new ones),
and require all changes to be done by decommissioning the script and
entirely replacing it with something written in another language. (And
frankly, the more broken toy shell scripts I see in the wild, the stronger
my sympathy grows for that position, even as Bash remains my personal
scripting language of choice.)

I also think it is unreasonable to make printf inconsistent with other
parts of Bash that *do* (silently) allow empty to mean zero, including:

   - ${var:$empty:$length} or ${array[@]:$empty:$length};
   - $((empty));
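A short sketch of those two cases, using made-up variable names and values; both run silently and treat the empty string as zero:

```bash
#!/usr/bin/env bash
# All names and values here are illustrative.
empty=''
var='abcdef'
array=(a b c d)
length=2

substr=${var:$empty:$length}             # empty offset expression evaluates to 0
slice=("${array[@]:$empty:$length}")     # same rule for array slices
sum=$((empty))                           # a null variable evaluates to 0

printf '%s\n' "$substr" "${slice[*]}" "$sum"
# → ab
# → a b
# → 0
```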

Conversely, while we're looking at printf, why not produce warnings for:

   - negative arg for %u or %.*… precision;
   - spurious or repeated flags like %--9d (or %-*… with a negative width
   arg);
   - the number of args to printf not being a whole multiple of the number
   of required args;

   - numeric overflow in $((expression)).

For that matter, why *only* a warning? If the point is to detect bugs in
scripts, surely Bash should crash out, like errexit or nounset?

If the eventual intent is to prohibit "empty zero" everywhere, please do it
all at once, not piecemeal. (Having to audit critical scripts every time we
accept a new release of Bash is a major reason we stick with old releases
for so long.)

If that's not the intent, please publish the rationale that separates the
allowed and prohibited cases.

Of course, that's not the only reason why I think that adding this warning
would be the wrong approach:

Right up front the manual says:

   - *Bash is intended to be a conformant implementation of the Shell and
   Utilities portion of the IEEE POSIX specification (IEEE Standard 1003.1).*

This proposed change would appear to contradict this intent, since it makes
the default mode less conformant, without providing any concomitant new
functionality. POSIX says
<https://pubs.opengroup.org/onlinepubs/9799919799/utilities/printf.html>:

   - 11. If an *argument* operand to be consumed by a conversion
   specification does not exist:

      - *[…]*
      - If it is an unnumbered argument conversion *[…]* any other *[not b,
      c, or s]* extra conversion specifiers *shall be evaluated as if a
      zero argument were supplied*.

from which it follows that no diagnostic should be printed. While POSIX
doesn't directly specify what happens when converting an empty-string to a
number, it does seem very strange indeed to treat "missing" more harshly
than "empty". Reinforcing this, the examples section says:

   - The *printf* utility is required to notify the user when conversion
   errors are detected while producing numeric output; thus, the following
   results would be expected on an implementation with 32-bit two's-complement
   integers when %d is specified as the *format* operand:

which is then followed by a table of examples that does *not* include the
empty string. It then says:

   - The diagnostic message format is not specified, but these examples
   convey the type of information that should be reported. Note that the value
   shown on standard output is what would be expected as the return value from
   the *strtol*() function as defined in the System Interfaces volume of
   POSIX.1-2024.

It doesn't suggest a diagnostic along the lines of “empty arg treated as
0”, and if the correspondence with strtol() is to be taken at face value,
this would likewise imply that empty string is a valid representation for
zero, since strtol() reports that it has converted the entire string in
that case.
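The missing-argument rule quoted above is easy to observe in Bash. The following is a small sketch (variable names are illustrative); it covers only the "missing" case, not the "empty" one under discussion:

```bash
#!/usr/bin/env bash
# A numeric conversion with no argument at all is evaluated as zero,
# and nothing is written to stderr (stderr is captured here to show that).
missing=$(printf '%d\n' 2>&1)
missing_status=$?

# The format is reused until the arguments run out; the final %d has
# no argument and is likewise evaluated as zero.
reused=$(printf '%s=%d\n' a 1 b 2>&1)

printf '%s (exit %d)\n' "$missing" "$missing_status"   # → 0 (exit 0)
printf '%s\n' "$reused"
# → a=1
# → b=0
```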

Bash seems to lack any document providing a clear rationale for which
deviations from POSIX should be considered "bugs" vs "enhancements", which
means that users cannot anticipate which "enhancements" might some time in
the future be reclassified as "bugs" and taken away. Users not on this
mailing list may not find out until their distro updates Bash years down
the track and all of a sudden their previously working scripts start
failing. At minimum there should always be a simple "off" switch for
unwelcome changes, but preferably changes should be opt-in.

I'm also confused: why did you yield to my argument to allow «$((0x$empty))»
but decide to go ahead with adding a warning about «printf %d "$empty"»,
when both cases for retaining the existing behaviour rely on essentially
the same arguments?

> but I very much doubt that the practice of writing 0 as '' in an argument
> to printf is much used.

Please stop with the strawmen. Nobody literally writes printf %d '' - that
would be silly, and a byte longer than necessary.

But we can write «printf 'foo=%d\n' "$foo"» where foo is the result of a
previous string manipulation that could yield an empty (not unset) result.

That said, I can see a case for a “lint” mode that complains about ALL uses
of empty-as-zero, but its use should be optional. I suggest gating the
warning on « shopt -s warnemptyzero ».

(Enabling a warning based on « set +o posix » seems backwards since in
other respects POSIX mode is more picky rather than less.)

Failing all that, if you're dead-set on having this, please (a) send the
warning to BASH_XTRACEFD, so that it can be suppressed without redirecting
stderr; and (b) insert into the manual an explicit description of at least
one expansion that will reliably result in an empty string being
interpreted as numeric zero, so that we can trust that one won't be broken
in future. (And point at it when people report “bugs”.)
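For readers unfamiliar with BASH_XTRACEFD, here is a minimal sketch of how it already separates "set -x" trace output from stderr. The choice of file descriptor 9 and the temporary file are arbitrary assumptions; routing printf warnings there is the request made above, not something Bash is known to do today.

```bash
#!/usr/bin/env bash
# Send xtrace output to its own descriptor instead of stderr.
trace_log=$(mktemp)
exec 9>"$trace_log"     # fd 9 is an arbitrary choice for this sketch
BASH_XTRACEFD=9

set -x
: "this command is traced"
set +x

unset BASH_XTRACEFD     # Bash closes the descriptor; traces revert to stderr
cat "$trace_log"        # the trace lines ended up here, not on stderr
```

A script run this way can keep stderr reserved for real errors while still collecting diagnostics elsewhere.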

-Martin

PS: If you wonder why I'm so ornery about this, the line

   - printf '%u\n' "${x##*(0)}"   # make sure x is not interpreted as octal

was itself a replacement for a previous line

   - printf '%u\n' "$((10#$x))"   # make sure x is not interpreted as octal

which was broken by a previous change to Bash; I chose ${x##*(0)} over
$((10#0$x)) in part because it seemed *less* likely to go wrong in the
future. (Actually both of those versions are collapsed versions of more
complex code; the anomaly was even harder to spot in the actual code.)
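As background for the comments on those lines, a small sketch of the octal pitfall they guard against, using an illustrative value for x:

```bash
#!/usr/bin/env bash
x=010   # illustrative input with a leading zero

# printf follows C conventions: a leading 0 means octal.
as_octal=$(printf '%u' "$x")

# The 10# prefix forces base 10 inside arithmetic expansion.
as_decimal=$(printf '%u' "$((10#$x))")

printf 'octal reading: %s, decimal reading: %s\n' "$as_octal" "$as_decimal"
# → octal reading: 8, decimal reading: 10
```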

On Sat, 23 Nov 2024 at 02:50, Chet Ramey <chet.ra...@case.edu> wrote:

> You're literally the only one making this argument.

The fact that I'm alone *on this list* is completely unsurprising (who
else here computes logarithms using shell builtins?), but that doesn't mean
I'm the only person on earth who considers empty-string as a valid, logical
way to write zero. This mailing list tends to "re-train" newcomers, molding
their understanding to fit “what Bash already does” and “what Chet thinks”.
Seeing other people be told “you're the only one here saying that” does not
exactly encourage a diversity of expression.

> It is, indeed, your opinion, and you seem to be soloing it.

Yes it's my opinion that empty is the *most* logical way to write zero,
because it removes cases where zero needs to be treated as an exception;
for example it means that « length(10ⁿ) ≠ length(10ⁿ-1) » is still true
even when n is zero.
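That inequality can be checked with a short sketch, assuming extglob and the same zero-stripping expansion used earlier in this message; digit_lengths is a hypothetical helper name:

```bash
#!/usr/bin/env bash
shopt -s extglob

# digit_lengths N prints the digit counts of 10^N and 10^N - 1 once
# leading zeros are stripped, so that zero is written as the empty string.
digit_lengths() {
  local n=$1 upper lower
  upper=$((10 ** n))
  lower=$((upper - 1))
  upper=${upper##*(0)}
  lower=${lower##*(0)}
  echo "${#upper} ${#lower}"
}

for n in 0 1 2 3; do
  echo "n=$n: $(digit_lengths "$n")"
done
# → n=0: 1 0
# → n=1: 2 1
# → n=2: 3 2
# → n=3: 4 3
```

With zero written as "0" instead, the n=0 row would read "1 1" and become the lone exception.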

Whilst many obviously don't agree, am I really being *illogical*? If so,
where is the flaw in my rationale? ("Not customary" isn't a flaw in my
logic; rather it's merely the logical consequence of zero-length words
being impossible in natural languages.)

> I don't doubt that there are scripts out there that inadvertently do this,
> by passing quoted unset variables as printf arguments

If that yields the correct behaviour, why break it?
If that yields incorrect behaviour, nounset or ShellCheck would find it
without this change.

But again this is a strawman. I'm only talking about set-but-empty.

The biggest harms from any breaking change like this are the cost of
reviewing code, and the loss of confidence in the stability of production
systems; even *finding* potentially affected code takes significant effort
in large systems, and verifying that such scripts are not affected can take
an expensive code walk-through and/or extensive testing. THAT is why I
think the harm from this change outweighs its benefit.
