On Thu, 27 Jun 2024 at 06:30, Chet Ramey <chet.ra...@case.edu> wrote:
> On 6/26/24 2:18 PM, Zachary Santer wrote:
> >
> >> On Tue, Jun 11, 2024, 12:49 PM Zachary Santer <zsan...@gmail.com>
> >> wrote:
> >>>
> >>> $ array=( zero one two three four five six )
> >>> $ printf '%s\n' "${array[@]( 1 5 )}"
> >>> one
> >>> five
> >
> > This is different functionality.
>
> Equivalent to printf '%s\n' "${array[1}" "${array[5]}". The innovation Zach
> wants is to have a single word expansion to do this.

Surely the point is to handle the case where we don't know in advance how
many elements will be wanted.

In effect, it would mimic Perl's @array[@indices] and @hash{@keys}
functionality, where we supply an arbitrary list of indices or subscripts,
and get back the corresponding values.

Using the proposed syntax we would be able to write:

    array=( '' one two three four five six )
    indices=( 1 0 6 7 5 )
    printf '%s, ' "${array[@]( "${indices[@]}" )}"
    printf end\\n

to get

    one, , six, five, end

(Note that there are only 4 words resulting from the expansion, since there
is no element '7' in 'array'. Unfortunately - and unlike Perl - Bash doesn't
have "undef", so we have to make do with getting back fewer values in the
resulting list if some requested array elements are unset, or if some
indices exceed the size of the array.)

I agree that this syntax looks ugly, but since [@] and [*] don't function as
subscripts, it's tricky to improve on.
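For comparison, here is a minimal sketch of how the same result can be had in today's Bash with an explicit loop. The name 'picked' is only an illustrative choice, and the loop is just one way to emulate the proposed expansion, not part of the proposal itself:

```shell
#!/usr/bin/env bash
# Emulates the proposed "${array[@]( "${indices[@]}" )}" with a loop.
# 'picked' is a hypothetical name chosen for this illustration.
array=( '' one two three four five six )
indices=( 1 0 6 7 5 )

picked=()
for i in "${indices[@]}"; do
    # ${array[$i]+x} expands to "x" only when element $i is set,
    # even if its value is the empty string.
    if [[ -n ${array[$i]+x} ]]; then
        picked+=( "${array[$i]}" )
    fi
done

printf '%s, ' "${picked[@]}"
printf 'end\n'
```

As in the example above, index 7 contributes no word, while index 0 contributes an empty one.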

My suggestion would be to generalise, turning [@] and [*] into fixed
syntactic tokens that can be combined with "ordinary" subscripting, or left
without subscripts to retain their current meanings:

    "${array[*][index]}"   (a long-hand version of "${array[index]}")
    "${array[@][index]}"   (gives "${array[index]}" if it exists, but is
                           completely elided if it doesn't - similar to
                           how "$@" can result in no words, not an empty
                           word)

Or maybe we can have some mechanism so that '@[' doesn't get treated as the
start of an '@' modifier; and we could use:

    "${array*[index]}"
    "${array@[index]}"

(For the rest of this discussion I'm just going to mention the '@' form;
please infer the corresponding '*' form.)

After doing this, I would start working on syntaxes for list-slicing in
various ways, perhaps:

    "${array@[[ list of indices ]]}"

"list of indices" is an ordinary word list; it's split up at unquoted $IFS,
then each of the resulting words is used as a subscript.

I would also revamp how numeric range slices are done (*1):

    "${array@[ start_index : count ]}"
    "${array@[ start_index ... end_index ]}"

For all of these expansions, where each subscripted element of the array
exists, it provides a 'word' in the resulting expansion, and where it
doesn't exist, no word is provided. With '@', the list is kept as separate
words despite being quoted; with '*', the resulting list is joined in the
traditional manner.

But I would look even further ahead...

Firstly, I acknowledge Bash has had to comply with historical expectations,
POSIX requirements, and precedent set by ksh. However, having an array
subscript expansion change its behaviour based on whether or not a
"declare -A" statement has been executed, possibly in a different function
or even a different file; that is - by modern standards at least - a rather
poor language design choice. (I'm talking about whether the subscript
undergoes arithmetic expansion.)
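
The dependence on "declare -A" can be seen in current Bash with a short script. This is only an illustrative sketch; the variable names 'list', 'map', 'from_list' and 'from_map' are made up for the purpose:

```shell
#!/usr/bin/env bash
# The same subscript text, "i+1", means different things depending on how
# the variable was declared. All names here are illustrative only.
i=2

declare -a list=( zero one two three )
declare -A map=( [i+1]='literal key' [3]='three' )

# Indexed array: the subscript is an arithmetic expression, so i+1 is 3.
from_list=${list[i+1]}

# Associative array: the subscript is the literal string "i+1".
from_map=${map[i+1]}

printf 'list[i+1] -> %s\n' "$from_list"   # three
printf 'map[i+1]  -> %s\n' "$from_map"    # literal key
```

Nothing at the point of expansion tells the reader which of the two behaviours applies; that is decided entirely by the earlier declare.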

So I also propose that we should follow Perl in having separate array
indexing and map subscripting syntaxes, so that it's no longer necessary to
use "declare -A", and more to the point, no longer necessary to go look for
it while reading someone else's code. (*2)

(I'm about to suggest some syntax, but the exact form isn't really my main
point; what's really important is that you would be able to read a $
expansion and tell at a glance whether the subscript will be subject to
arithmetic expansion. (*3))

As a secondary issue, deferring *parsing* of arithmetic expressions (until
the containing command is executed) obscures syntax errors, delays their
reporting, and degrades performance. I would change that, either globally
when « shopt -s early_math_parse » is in effect, or in recognized contexts
like this new array indexing syntax. (*4)

When using the new array indexing syntax, the index would be parsed as an
arithmetic expansion while the surrounding commands are being parsed (*5)
(and thus ALWAYS evaluated as a numeric expression), and when using the map
subscripting syntax it would NEVER be subject to arithmetic expansion.

One possible syntax would be:

    "${assoc_array@{key}}"
    "${assoc_array@{{list of keys}}}"

which would differ from the previous in that 'key' and 'list of keys' would
be guaranteed NOT to undergo arithmetic expansion; importantly, this can be
determined at parse time without needing to have executed a 'declare -A'
statement. (This becomes more important if we look to eventually
implementing lexically scoped variables some time in the future.)

If you really can't stomach using {} around subscripts, there are other ways
to distinguish them, such as [numeric+expression+without+quotes] vs
["map key in quotes"], but that would make the rule around non-deferral of
expression parsing even harder for people to follow.
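
The deferred parsing mentioned above can be observed in current Bash. The following is a rough sketch; the function name 'broken' and the variables 'defined', 'rc' and 'out' are illustrative, and the exact wording of the error message may vary between versions:

```shell
#!/usr/bin/env bash
# The arithmetic expression below is malformed, yet the function definition
# is accepted, because the expression is kept as text until it is expanded.
broken() {
    echo $(( 1 + ))
}
defined=$?    # status of the definition itself

# The syntax error only surfaces when the function is actually run.
# It is run in a command substitution here so that the failure is contained.
rc=0
out=$( broken 2>&1 ) || rc=$?

printf 'definition status: %d\n' "$defined"
printf 'execution status:  %d\n' "$rc"
printf 'message: %s\n' "$out"
```

A code path that is rarely executed can therefore carry a malformed expression for a long time before anyone sees the error.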

Apart from anything else, either of these approaches would solve the
conundrum of handling '@' and '*' and '' (empty) as subscripts; simply write
"${array*[@]}" or "${array@{*}}" or "${array@{}}"

Lastly, I would also consider:

A. having an explicit 'index back from the end' syntax, such as
   [#-reverse_index], rather than switching based on the sign of the index
   expression.
B. making the '@' optional in places where it doesn't introduce ambiguity.

-Martin

*1: Unlike ${array[@]:start:count}, these numeric range forms give primacy
to indices as addresses for particular entries, rather than to the array
being primarily a contiguous "list" whose indices are only required to be
monotonic, not consecutive: so if any entries in the range are unset, then
you get fewer words in the resulting list. The ":" and "..." are still part
of the expansion syntax, not part of the evaluation of a numeric
expression, so « var=1:4 ${array[$var]} » would be erroneous.

*2: This isn't the only place where dynamic scope has just turned other
suboptimal design choices into terrible ones; it's outright hostile to
anyone tasked with managing a large shell codebase written by other people.
But the dynamic scope of "declare" (and its siblings) deserves special
mention, because it's not typically limited to "just once at the top of the
program" when it's especially useful inside functions. So even if you can
SEE a declare statement, you still have to check whether it's been EXECUTED
before the expansion occurs.

*3: Ironically this is even more important in the Shell than it is in Perl,
since the shell cannot infer which operation is required based on the data
type of the subscript - in short, the shell cannot distinguish
numeric+expression from "string+expression".

*4: This would effectively define a new "parse-time numeric context", that
would only apply in places where that context can be established at parse
time, unless « shopt -s early_math_parse » was in effect.
(*6) In such a context, the handling of numeric expressions would change,
so that the expressions « SIX * NINE » and « $SIX * $NINE » would behave
identically; so when given « SIX=1+5 NINE=8+1 », they would both produce
"54" rather than "42".

*5: Just to be clear, I wouldn't immediately implement the early parsing;
rather what I'd do is forbid expressions that would be impossible to parse
without expanding a variable, so that « op='+' » then « base $op offset »
would be disallowed in this new "parse-time numeric context". While "parse
and evaluate all at once" makes the implementation code slightly smaller,
it's a dubious saving: intermingling parsing and evaluation makes the code
*more* complex. You don't even need a complex tree structure to represent
the parsed expression; a list of RPN (stack-based) operations can be stored
as bytes in what's otherwise a valid C string, and then actually runs
faster because of CPU cache locality. Support for short-circuit evaluation
can be provided as "branch" instructions.

*6: In addition to « shopt -s early_math_parse », it would also make sense
to be able to declare a variable as "holding an arithmetic expression", in
the same way that one can currently be declared as "holding an integer".
The point being, a malformed expression is reported when it's assigned to
the variable, rather than later on when the variable is expanded.
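
For reference, the present-day difference between the two forms in footnote *4 can be checked with a few lines. This is only a sketch of current behaviour; the names 'by_name' and 'by_text' are illustrative:

```shell
#!/usr/bin/env bash
# Current Bash behaviour for the two expressions from footnote *4.
SIX=1+5
NINE=8+1

# Bare names: each variable's value is evaluated as its own expression
# first, so this computes (1+5) * (8+1).
by_name=$(( SIX * NINE ))

# $-expansion: the text is substituted before the expression is parsed,
# so this computes 1+5 * 8+1, and * binds tighter than +.
by_text=$(( $SIX * $NINE ))

printf 'SIX * NINE   = %d\n' "$by_name"    # 54
printf '$SIX * $NINE = %d\n' "$by_text"    # 42
```

Under the proposed parse-time numeric context, both forms would give 54.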