Date:        Fri, 14 Mar 2025 22:47:38 +0100
From:        Steffen Nurpmeso <stef...@sdaoden.eu>
Message-ID:  <20250314214738.Milbudgm@steffen%sdaoden.eu>

  | I cannot reproduce this, both NetBSD sh and bash say
  |
  |   set -- '' '' a '' '' b '' '';echo $#,$*; set -- $*; echo $#,$*
  |   8, a b
  |   2,a b

You didn't reproduce the operating conditions for your test: there you
don't have IFS=: and that makes all the difference.

I didn't go into the implementation details in my reply (if you had
bothered to read Greg's reply carefully you might have seen why already)
but there are two likely and plausible ways to implement all of this.

First, in a context where field splitting would happen, implementing
"$*" and "$@" correctly requires two different code paths (trying to do
them with just one and get them both correct requires far too many
bizarre hacks, which is perhaps why in times past there were so many
broken implementations of "$@").

When field splitting isn't going to happen, "$@" makes no sense at all
(that is deliberately requesting multiple words in a context where only
one is permitted) and should probably have simply been an error from
day 1 - but wasn't, and it is far too late to change that now, so "$@"
simply turns into "$*" in that context. But that isn't relevant for your
test, which was just using $*.

When unquoted, $* and $@ produce the same results, which are just, when
field splitting applies, the field split version of the quoted result.

Now there are two plausible and reasonable ways to implement that (and
there's the ksh way as well, I guess).

The first is to take the results from the quoted forms, and then field
split each resulting word (which would be one word in the $* case, and
$# words in the $@ case).

The second is to note that the two ($* and $@) produce the same result
(unquoted) and simply use the same algorithm for both - and since "$@"
turns into "$*" when field splitting isn't going to happen, field
splitting the result of "$*" is usually what happens with this
implementation technique.

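As an illustration only: the two approaches can be modelled in plain sh
using nothing but quoted expansions and ordinary variable splitting. The
helper names count_split_each and count_split_joined are hypothetical,
and this is a sketch of the idea, not how any particular shell is
implemented internally. It assumes a POSIX-ish sh.

```shell
#!/bin/sh
# Hypothetical helpers that model the two techniques; they do not
# reflect the internals of any real shell.

# Model A: field split each positional parameter on its own.
count_split_each() {
    n=0
    for word in "$@"; do
        for field in $word; do      # unquoted: field splitting applies
            n=$((n + 1))
        done
    done
    echo "$n"
}

# Model B: build the single word "$*" (joined with the first
# character of IFS) and field split that one string.
count_split_joined() {
    joined="$*"
    n=0
    for field in $joined; do        # unquoted: field splitting applies
        n=$((n + 1))
    done
    echo "$n"
}

set -- '' '' a '' '' b '' ''

# Default IFS (space, tab, newline).
each_default=$(count_split_each "$@")
joined_default=$(count_split_joined "$@")

# The condition that was missing from the quoted test.
IFS=:
each_colon=$(count_split_each "$@")
joined_colon=$(count_split_joined "$@")
unset IFS

echo "default IFS: each=$each_default joined=$joined_default"
echo "IFS=':'    : each=$each_colon joined=$joined_colon"
```

With the default IFS both models report 2 fields, which is why the
quoted test shows nothing unusual. With IFS=: the per-word model still
reports 2 (an empty word splits into no fields), while the joined model
splits "::a:::b::" into 7 fields.
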
Greg explained how bash implements $*, which is the second of those ways
(I didn't, as the actual implementation technique used by the shell
shouldn't matter to a shell user, as the shell is free to alter that any
time it likes) - even if that affects the results when what result will
occur isn't specified.

It makes a difference which technique is used when IFS[0] (the first
character in "$IFS") is an IFS whitespace character, and to a lesser
extent, when any IFS whitespace chars are in IFS.

  | No: very, very hard. The output is always correct except for
  | first-IFS-character-IS-but-not-WS. It is

Yes, exactly. Think about the difference in how field splitting works
for IFS whitespace, and other IFS characters.

  | Sorry if that was not clear.

It was how I interpreted the original message, so no harm.

  | Yes, i think i cannot use the simple "create a string for
  | resplitting" approach to get there in a bash / NetBSD sh
  | compatible way, as we are identical with "a"

You can, but you can't use that to handle "$@" and get it correct
without adding hack on top of hack on top... (I know, the NetBSD sh
tried to do it that way for many years).

  | Well i think it is relevant that : is in $IFS here:

Indeed, it is.

  | POSIX says

I know, I wrote that text, at least in essence. In previous versions
it was all very sloppy ... and all the implementations you're testing
predate the current POSIX version.

  | "a:" does not cause *two* fields to be generated, yet
  | that is exactly what happens with bash, NetBSD sh, and NetBSD ksh.
  | Unless i am mistaken. No?? But, i think that is what they do!

No, just one. (Actually, I won't comment on NetBSD ksh, that thing
is truly ancient, more or less unmaintained, and even though people
actually use it, is full of bugs in corner cases all over the place).

kre