Re: parser bug in Q_ARITH

Zachary Santer Tue, 01 Jul 2025 12:02:41 -0700

On Tue, Jul 1, 2025 at 5:36 AM Martin D Kealey <[email protected]> wrote:
>
> On Tue, 1 Jul 2025, 08:59 Zachary Santer, <[email protected]> wrote:
>>
>> It appears bash has made the parser more complex by selectively
>> removing the pre-expansion step for certain parts of expressions.
>> That's bad because the rules for when that happens are undocumented
>> [...]
>
> By far the simplest approach for users to understand would be to:
> (1) parse expressions inside identified numeric contexts at the same time as 
> the surrounding code (rather than simply slurping up text which is re-parsed 
> when the outer expansion is performed); and ...
>
>> It seems like the pre-expansion step could be removed for only what
>> appears between [ and ] in arithmetic contexts and you'd have what you
>> need. At whatever later stage where bash knows if it's dealing with an
>> indexed or an associative array element, that content itself could be
>> evaluated in an arithmetic context or not.
>
> (2) parse but defer all $ and `` expansions inside an identified numeric 
> context until the applicable subexpression is evaluated. Expansion of 
> associative array keys would also be deferred, but parsed as a simple 
> concatenation of text components.


Parameter expansions within [[ ]] aren't performed unless the
expansion is necessary to determine the return value of the
conditional expression. With 'set -u' enabled,
$ var_is_unset='true'
$ unset var
$ [[ ${var_is_unset} == 'true' || ${var} == 'foo' ]]
doesn't throw an "unbound variable" error. This makes perfect sense.

On the other hand, there's no expectation that this should work:
$ op='||'
$ [[ ${var_is_unset} == 'true' ${op} ${var} == 'foo' ]]
and it doesn't, whereas
$ var_is_unset=1
$ unset var
$ op='||'
$ (( var_is_unset ${op} var > 0 ))
does currently work, 'set -u' and all.
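
For anyone who wants to reproduce both working cases non-interactively, here is a minimal sketch. The result variables cond_result and arith_result are names I've made up for illustration; everything else is taken from the commands above.

```bash
#!/usr/bin/env bash
set -u

var_is_unset='true'
unset var

# ${var} is never expanded here: the left operand of || already
# determines the result, so 'set -u' has nothing to complain about.
if [[ ${var_is_unset} == 'true' || ${var} == 'foo' ]]; then
  cond_result='true'
else
  cond_result='false'
fi

# Inside (( )), ${op} is expanded textually before the expression is
# evaluated, so the operator itself can come from a variable.
var_is_unset=1
op='||'
if (( var_is_unset ${op} var > 0 )); then
  arith_result='true'
else
  arith_result='false'
fi

printf '%s\n' "[[ ]]: ${cond_result}" "(( )): ${arith_result}"
```

If either expression did expand the unset variable, the non-interactive shell would exit with an "unbound variable" error before reaching the printf.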

> "Inside an identified numeric context" would ideally include inside (()), 
> $(()), $[], numeric array indexes, and assignment to a variable with the -i 
> attribute.
>
> Of course, this would not be strictly compatible with POSIX, and it would 
> make it more complex to write $operator-style expressions, so it would have to 
> be gated by a shopt or compat setting.
>
> It would also mean that declare -i and declare -A would affect how subsequent 
> code is parsed, which makes me want some equivalent of Perl's “BEGIN” 
> keywords to expedite the effect of declarations found in the middle of any 
> compound statement.
>
> A less unpredictable alternative would be to have some syntactic marker that 
> a subscript should be parsed as associative rather than as a numeric 
> expression. In any other language this is obvious. e.g. in PHP:
>
>   $foo[$bar+$zot]    # clearly numeric index
>
>   $foo["$bar+$zot"]  # clearly string lookup
>
> Clearly we can't use quotes like this in the Shell, so we would need some 
> other indicator, such as ${map[[$key]]} or ${map{$key}} or ${map.$key} -- 
> feel free to make up your own suggestion if you don't like any of these.

I suspect most people on this list would rather avoid breaking
backwards compatibility to that degree. A different syntax for
associative array keys is unnecessary in other contexts, because there
bash already knows whether it's evaluating an indexed or an
associative array subscript.

>> You need to do something complex like this, or an associative array
>> key like ']'
>
> Preemptive textual expansion within expressions and subscripts will never be 
> satisfactory, particularly when it interacts with ongoing changes to implicit 
> vs explicit quoting rules. There will always be corner cases where 
> user-supplied data will break how expressions are parsed, or leave gaping 
> security holes, or both.

Input validation will always be necessary, but ']' is a perfectly
valid associative array key. Any arbitrary C string, besides the empty
string, is one. Bash should be able to cope with arbitrary associative
array keys in any context where they might appear.
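
As a sketch of the contexts that already cope with such a key, the following uses only plain assignment and ${...} expansion, deliberately keeping the subscript itself out of (( )). The array name count is an assumption for illustration.

```bash
#!/usr/bin/env bash
set -u

declare -A count
key=']'

# Assignment and ${...} expansion treat the expanded key as a plain string.
count[$key]=1

# Increment without placing the subscript directly in an arithmetic
# context: the ${...} expansion yields a number before the expression
# is evaluated.
count[$key]=$(( ${count[$key]:-0} + 1 ))

printf 'key=%s value=%s\n' "${!count[@]}" "${count[$key]}"
```

The point is that the shell programmer currently has to route around the arithmetic context to get this behavior.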

I think my suggestion is a good middle ground. Associative array keys
are arbitrary strings, whereas the set of expansions that produce
something valid elsewhere in an arithmetic context is much more
restricted. Bash has to know whether any given subscript refers to an
associative or an indexed array in order to evaluate that subscript
correctly. Deferring evaluation within subscripts until that point
would then be more in line with how subscripts are handled elsewhere
in bash.
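
To illustrate how the array's type already decides what a subscript means outside of arithmetic contexts, here is a small sketch. The array names indexed and assoc are made up for the example.

```bash
#!/usr/bin/env bash
set -u

declare -a indexed
declare -A assoc
i=2

# Indexed array: the subscript is an arithmetic expression, so i+1 is 3.
indexed[i+1]='three'

# Associative array: the same text is taken as the literal string key "i+1".
assoc[i+1]='literal'

printf 'indexed: %s\n' "${!indexed[@]}"
printf 'assoc:   %s\n' "${!assoc[@]}"
```

The same subscript text produces index 3 in one case and the key i+1 in the other, which is the distinction I'd like arithmetic contexts to honor as well.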

The details here would be straightforward enough to explain in the
documentation. The shell programmer would then be empowered to safely
use $ parameter expansions anywhere in an arithmetic context.
