On Tue, Jul 1, 2025 at 5:36 AM Martin D Kealey <mar...@kurahaupo.gen.nz> wrote:
>
> On Tue, 1 Jul 2025, 08:59 Zachary Santer, <zsan...@gmail.com> wrote:
>>
>> It appears bash has made the parser more complex by selectively
>> removing the pre-expansion step for certain parts of expressions.
>> That's bad because the rules for when that happens are undocumented
>> [...]
>
> By far the simplest approach for users to understand would be to:
> (1) parse expressions inside identified numeric contexts at the same time as
> the surrounding code (rather than simply slurping up text which is re-parsed
> when the outer expansion is performed); and ...
>
>> It seems like the pre-expansion step could be removed for only what
>> appears between [ and ] in arithmetic contexts and you'd have what you
>> need. At whatever later stage where bash knows if it's dealing with an
>> indexed or an associative array element, that content itself could be
>> evaluated in an arithmetic context or not.
>
> (2) parse but defer all $ and `` expansions inside an identified numeric
> context until the applicable subexpression is evaluated. Expansion of
> associative array keys would also be deferred, but parsed as a simple
> concatenation of text components.

Parameter expansions within [[ ]] aren't performed unless the
expansion is necessary to determine the return value of the
conditional expression. With 'set -u' enabled,

$ var_is_unset='true'
$ unset var
$ [[ ${var_is_unset} == 'true' || ${var} == 'foo' ]]

doesn't throw an "unbound variable" error. This makes perfect sense.

On the other hand, there's no expectation that this should work:

$ op='||'
$ [[ ${var_is_unset} == 'true' ${op} ${var} == 'foo' ]]

and it doesn't, whereas

$ var_is_unset=1
$ unset var
$ op='||'
$ (( var_is_unset ${op} var > 0 ))

does currently work, 'set -u' and all.

> "Inside an identified numeric context" would ideally include inside (()),
> $(()), $[], numeric array indexes, and assignment to a variable with the -i
> attribute.
>
> Of course, this would not be strictly compatible with POSIX, and it would
> make it more complex to write $operator-style expressions, so it would be to
> be gated by a shopt or compat setting.
>
> It would also mean that declare -i and declare -A would affect how subsequent
> code is parsed, which makes me want some equivalent of Perl's “BEGIN”
> keywords to expedite the effect of declarations found in the middle of any
> compound statement.
>
> A less unpredictable alternative would be to have some syntactic marker that
> a subscript should be parsed as associative rather than as a numeric
> expression. In any other language this is obvious. e.g. in PHP:
>
> $foo[$bar+$zot] # clearly numeric index
>
> $foo["$bar+$zot"] # clearly string lookup
>
> Clearly we can't use quotes like this in the Shell, so we would need some
> other indicator, such as ${map[[$key]]} or ${map{$key}} or ${map.$key} --
> feel free to make up your own suggestion if you don't like any of these.

Pretty sure most people on this list would rather avoid that level of
breaking backwards compatibility.
A different syntax for associative array keys is unnecessary
elsewhere, because bash knows whether it's evaluating an indexed or
associative array subscript already, in those other contexts.

>> You need to do something complex like this, or an associative array
>> key like ']'
>
> Preemptive textual expansion within expressions and subscripts will never be
> satisfactory, particularly when it interacts with ongoing changes to implicit
> vs explicit quoting rules. There will always be corner cases where
> user-supplied data will break how expressions are parsed, or leave gaping
> security holes, or both.

Input validation will always be necessary, but ']' is a perfectly
valid associative array key. Any arbitrary C string, besides the empty
string, is one. Bash should be able to cope with arbitrary associative
array keys in any context where they might appear.

I think my suggestion is a good middle ground. Associative array keys
are arbitrary strings, but what can be expanded elsewhere in an
arithmetic context and be valid there is much more restricted. Bash
has to know if any given subscript references into an associative or
an indexed array in order to correctly evaluate that subscript.
Deferring evaluation within subscripts until that point would then be
more in line with how subscripts are handled elsewhere in bash. The
details here would be straightforward enough to explain in the
documentation. The shell programmer would then be empowered to safely
use $ parameter expansions anywhere in an arithmetic context.
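
For anyone following along, here is a minimal sketch of the kind of workaround I mean, assuming bash 4.0 or later. It keeps the key out of the arithmetic expression entirely: the element is read with an ordinary parameter expansion and written with an ordinary assignment, so only an integer ever reaches $(( )). The names 'count' and 'bump_count' are made up for illustration, and this is just one way to do it, not a recommendation from the manual.

```shell
#!/usr/bin/env bash
set -u

declare -A count=()

# bump_count is a hypothetical helper for this example. It increments
# count[key] without ever placing the key inside an arithmetic expression.
bump_count() {
  local key=$1
  # Plain parameter expansion: the subscript is expanded once and used
  # verbatim as the key. ':-0' covers elements that are not set yet.
  local current="${count[$key]:-0}"
  # Only the integer held in 'current' is evaluated arithmetically.
  count[$key]=$(( current + 1 ))
}

bump_count ']'
bump_count ']'
bump_count 'a+b'

# Print each key used above together with its counter.
for key in ']' 'a+b'; do
  printf '%s=%s\n' "$key" "${count[$key]}"
done
```

With the pre-expansion step removed between [ and ], something like (( count[$key]++ )) could presumably replace the helper altogether.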