Hi Chet,

We seem to have very similar opinions about strong backwards compatibility in theory, and yet somehow we keep butting heads on how that pans out in practice.
I'm concerned that the last ten years have seen a number of Linux distributions *stop* including Bash by default, and that it has ceased to be the language of choice for writing new scripts; for the most part that's now either Python, Node.js, or POSIX sh (dash). I now wonder whether Bash has much of a future. Each breaking change pushes Bash ever closer to becoming an irrelevant anachronism. Conversely, changes that make it less error prone or easier to use somewhat pull it back from the brink of oblivion.

Which brings me to my latest concern. While warning about potential bugs helps make Bash easier to use, forcing spurious warnings on working code is a breaking change. (More below about tolerance for new warning messages.) Even if there's no clear way to decide which outweighs the other, the third option would *clearly* outweigh them both: create an optional warning that can be enabled to help track down bugs, but left disabled in existing (presumably correctly working) production code.

My key point is that an empty string *meaning* zero can arise quite ordinarily from *not* treating zero as a special case. Requiring new special cases just to avoid empty strings seems counterproductive to making Bash easier to use. The proposed change in 5.3-alpha won't transform any broken programs into working ones, but it will break (at least a few) working programs. For example:

- printf '%u\n' "${x##*(0)}" # make sure x is not interpreted as octal

It is irrelevant whether it is possible to rewrite this to avoid the proposed warning; what matters is that correctly working scripts will have to be audited (and potentially modified) if this change goes ahead. None of us know how many scripts will require changes, but I'm categorically certain that the answer won't be "none", and I'm reasonably sure that the number that will require auditing will be larger by a substantial ratio.

> And these scripts can't stand a warning message?
That's a good question, and I wish people would think past "it's *only* a warning". I've witnessed cases where added warnings have cost time and money. Unexpected warnings can and have made the difference between a cron job running silently, vs flooding the sysadmin mailbox; or caused the logging file-system to fill up; or triggered "severe/unknown" monitoring alerts. For example: a single added warning in a test infrastructure project broke continuous integration and blocked development in *numerous* other projects all at once. Responses to those have included diverting resources from other work, delaying or rolling back urgent changes, and paying on-call fees (and ruining sleep).

These are not mere hypothetical threats; I've seen all of them happen in practice. I will grant that they are a very small proportion of scripts, but they are high up on the criticality scale. Even if the warning really is innocuous, the users who run them likely will have no idea why they're suddenly seeing a new warning as they likely have zero experience in writing or modifying shell scripts. Yet more load on tech support to fix a non-bug.

Even a simple fix can be fraught: I've worked for companies whose policies outright prohibit changing existing shell scripts (or creating new ones), and require all changes to be done by decommissioning the script and entirely replacing it with something written in another language. (And frankly, the more broken toy shell scripts I see in the wild, the stronger my sympathy grows for that position, even as Bash remains my personal scripting language of choice.)
I also think it is unreasonable to make printf inconsistent with other parts of Bash that *do* (silently) allow empty to mean zero, including:

- ${var:$empty:$length} or ${array[@]:$empty:$length};
- $((empty));

Conversely, while we're looking at printf, why not produce warnings for:

- negative arg for %u or %.*… precision;
- spurious or repeated flags like %--9d (or %-*… with a negative width arg);
- the number of args to printf not being a whole multiple of the number of required args;
- numeric overflow in $((expression)).

For that matter, why *only* a warning? If the point is to detect bugs in scripts, surely Bash should crash out, like errexit or nounset?

If the eventual intent is to prohibit "empty zero" everywhere, please do it all at once, not piecemeal. (Having to audit critical scripts every time we accept a new release of Bash is a major reason we stick with old releases for so long.) If that's not the intent, please publish the rationale that separates the allowed and prohibited cases.

Of course, that's not the only reason why I think that adding this warning would be the wrong approach. Right up front the manual says:

- *Bash is intended to be a conformant implementation of the Shell and Utilities portion of the IEEE POSIX specification (IEEE Standard 1003.1).*

This proposed change would appear to contradict this intent, since it makes the default mode less conformant, without providing any concomitant new functionality.

POSIX says <https://pubs.opengroup.org/onlinepubs/9799919799/utilities/printf.html>:

- 11. If an *argument* operand to be consumed by a conversion specification does not exist:
  - *[…]*
  - If it is an unnumbered argument conversion *[…]* any other *[not b, c, or s]* extra conversion specifiers *shall be evaluated as if a zero argument were supplied*.

from which it follows that no diagnostic should be printed.
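To make the consistency point concrete, here is a minimal sketch of the contexts mentioned above in which an empty (or missing) value is silently treated as zero. The variable names `empty`, `word`, `arith`, `sub`, and `missing` are made up for illustration; the behaviour shown is what I'd expect from existing Bash releases, not a statement about 5.3.

```shell
#!/usr/bin/env bash
# Sketch only: all variable names here are hypothetical.
empty=''
word='abcdef'

# Arithmetic expansion: a set-but-empty variable evaluates to 0.
arith=$((empty + 5))

# Substring expansion: the offset is an arithmetic expression,
# so an empty offset is taken as 0.
sub=${word:$empty:3}

# POSIX item 11: a numeric conversion with no argument left to consume
# is evaluated as if a zero argument were supplied.
missing=$(printf '%d')

printf 'arith=%s sub=%s missing=%s\n' "$arith" "$sub" "$missing"
# prints: arith=5 sub=abc missing=0
```

None of these three cases produces a diagnostic, which is the inconsistency being argued here.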
While POSIX doesn't directly specify what happens when converting an empty-string to a number, it does seem very strange indeed to treat "missing" more harshly than "empty".

Reinforcing this, the examples section says:

- The *printf* utility is required to notify the user when conversion errors are detected while producing numeric output; thus, the following results would be expected on an implementation with 32-bit two's-complement integers when %d is specified as the *format* operand:

which is then followed by a table of examples that does *not* include the empty string. It then says:

- The diagnostic message format is not specified, but these examples convey the type of information that should be reported. Note that the value shown on standard output is what would be expected as the return value from the *strtol*() function as defined in the System Interfaces volume of POSIX.1-2024.

It doesn't suggest a diagnostic along the lines of “empty arg treated as 0”, and if the correspondence with strtol() is to be taken at face value, this would likewise imply that empty string is a valid representation for zero, since strtol() reports that it has converted the entire string in that case.

Bash seems to lack any document providing a clear rationale for which deviations from POSIX should be considered "bugs" vs "enhancements", which means that users cannot anticipate which "enhancements" might some time in the future be reclassified as "bugs" and taken away. Users not on this mailing list may not find out until their distro updates Bash years down the track and all of a sudden their previously working scripts start failing. At minimum there should always be a simple "off" switch for unwelcome changes, but preferably changes should be opt-in.
I'm also confused: why did you yield to my argument to allow «$((0x$empty)) » but decide to go ahead with adding a warning about «printf %s "$empty"», when both cases for retaining the existing behaviour rely on essentially the same arguments?

> but I very much doubt that the practice of writing 0 as '' in an argument
> to printf is much used.

Please stop with the strawmen. Nobody literally writes printf %d '' - that would be silly, and a byte longer than necessary. But we can write «printf 'foo=%d\n' "$foo"» where foo is the result of a previous string manipulation that could yield an empty (not unset) result.

That said, I can see a case for a “lint” mode that complains about ALL uses of empty-as-zero, but its use should be optional. I suggest gating the warning on « shopt -s warnemptyzero ». (Enabling a warning based on « set +o posix » seems backwards since in other respects POSIX mode is more picky rather than less.)

Failing all that, if you're dead-set on having this, please (a) send the warning to BASH_XTRACEFD, so that it can be suppressed without redirecting stderr; and (b) insert into the manual an explicit description of at least one expansion that will reliably result in an empty string being interpreted as numeric zero, so that we can trust that one won't be broken in future. (And point at it when people report “bugs”.)

-Martin

PS: If you wonder why I'm so ornery about this, the line

- printf '%u\n' "${x##*(0)}" # make sure x is not interpreted as octal

was itself a replacement for a previous line

- printf '%u\n' "$((10#$x))" # make sure x is not interpreted as octal

which was broken by a previous change to Bash; I chose ${x##*(0)} over $((10#0$x)) in part because it seemed *less* likely to go wrong in the future. (Actually both of those versions are collapsed versions of more complex code; the anomaly was even harder to spot in the actual code.)
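For anyone who wants to see how the empty string arises from the idiom in the PS, here is a minimal sketch. It assumes extglob is enabled (the *(0) pattern needs it), and `strip_zeros` is a hypothetical helper written just for this illustration; it is not the actual code.

```shell
#!/usr/bin/env bash
# Sketch only: strip_zeros is a hypothetical helper, not the real code.
shopt -s extglob   # *(0) is an extended glob: zero or more '0' characters

# Remove the longest leading run of zeros, so the remainder cannot be
# read as octal. Zero itself gets no special case.
strip_zeros() {
    local x=$1
    printf '%s' "${x##*(0)}"
}

for x in 0042 7 000; do
    printf '%s -> "%s"\n' "$x" "$(strip_zeros "$x")"
done
# prints:
# 0042 -> "42"
# 7 -> "7"
# 000 -> ""
```

The last case is the point: a value that is entirely zeros collapses to a set-but-empty string, and that string is what ends up as the argument to printf '%u\n'.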
On Sat, 23 Nov 2024 at 02:50, Chet Ramey <chet.ra...@case.edu> wrote:

> You're literally the only one making this argument.

The fact that I'm alone **on this list** is completely unsurprising (who else here computes logarithms using shell builtins?), but that doesn't mean I'm the only person on earth who considers empty-string as a valid, logical way to write zero. This mailing list tends to "re-train" newcomers, molding their understanding to fit “what Bash already does” and “what Chet thinks”. Seeing other people be told “you're the only one here saying that” does not exactly encourage a diversity of expression.

> It is, indeed, your opinion, and you seem to be soloing it.

Yes, it's my opinion that empty is the *most* logical way to write zero, because it removes cases where zero needs to be treated as an exception; for example it means that « length(10ⁿ) ≠ length(10ⁿ-1) » is still true even when n is zero. Whilst many obviously don't agree, am I really being *illogical*? If so, where is the flaw in my rationale? ("Not customary" isn't a flaw in my logic; rather it's merely the logical consequence of zero-length words being impossible in natural languages.)

> I don't doubt that there are scripts out there that inadvertently do this,
> by passing quoted unset variables as printf arguments

If that yields the correct behaviour, why break it? If that yields incorrect behaviour, nounset or ShellCheck would find it without this change. But again this is a strawman. I'm only talking about set-but-empty.

The biggest harms from any breaking change like this are the cost of reviewing code, and the loss of confidence in the stability of production systems; even *finding* potentially affected code takes significant effort in large systems, and verifying that such scripts are not affected can take an expensive code walk-through and/or extensive testing. THAT is why I think the harm from this change outweighs its benefit.