Hello Robert.

Robert Elz wrote in
 <25006.1741917...@jacaranda.noi.kre.to>:
 |    Date:        Fri, 14 Mar 2025 01:34:48 +0100
 |    From:        Steffen Nurpmeso <stef...@sdaoden.eu>
 |    Message-ID:  <20250314003448.DdqG5Ont@steffen%sdaoden.eu>
 |
 || I am deeply sorry, but i have one more difficulty that i fail to
 || explain to myself.
 |
 |As always, there are no rules for this for MUAs so you can
 |really do what you like, but:
 |
 || It *could* (small, very small case) be that
 || this time the incorrectness really is on the side of bash.
 |
 |It isn't.   bash is correct.

Thanks for answering, Robert, and for spending so much time.
Let's see.

 ||   a() {
 ||     echo $#,1="$1"/$1,2="$2"/$2,3="$3"/$3,4="$4"
 ||     echo $#,'*'="$*"/$*,
 |
 |When that echo command is run in the (uncommented) command
 |given, the args to echo (this can easily be seen if you
 |enable -x) are:
 |
 | echo '4,*=:a::a::a/' a '' a '' a,
 |
 |that is from bash, the NetBSD shell does:
 |
 | echo '4,*=:a::a::a/' a '' a '' a,
 |
 |which looks rather similar...
 |
 |Note, not all shells do it that way, mksh does
 |
 | echo '3,*=:a::a:a/' a '' a a,
 |
 |The spec for unquoted $* expansion allows for some variations.

Yes, that is clear; i compare against dash etc.  But i strive for
bash compatibility, i.e., full resplitting.

 |To see why it is the way it is, we need to start with the command:
 |
 | a "$*"$* $*
 |
 |The "$*" generates ":a:" (that's simple enough), the 2nd $* generates 3
 |words, '' 'a' and '' so now we have
 | ":a:"'' a ''
 |then the third $* generates the same 3 words as the 2nd, so we end
 |up with
 | ":a:"'' a '' '' a ''
 |
 |which is where things get variable.  The '' that's appended to ":a:"
 |adds nothing, that one essentially just gets merged and vanishes.

That.

 |This is where POSIX gets involved (in the description of the
 |special parameter '*' in XCU 2.5.2)
 |
 | When the expansion occurs in a context where field
 | splitting will be performed,
 |
 |which is here, the unquoted $* expansions
 |
 | any empty fields may be discarded and each of the non-empty ...
 |
 |(the remainder is irrelevant here).   "may be discarded", shells
 |are allowed to keep or discard them.

We drop them if possible; however, "if it separated something" we do
not join what was separated, but ensure these remain separate
arguments.  I think "we all agree" on that.

 |Both bash and the netbsd shell retain leading empty words from $*
 |expansions, but drop the last, try looking at what happens if you
 |do
 | set -- '' '' a '' '' b '' ''
 |
 |At that point $# is 8 (the 8 words explicitly set)
 |
 |Now do
 | set -- $*
 |and in bash (and NetBSD) $# is now 7, the final '' vanishes.

I cannot reproduce this; both NetBSD sh and bash say

  set -- '' '' a '' '' b '' '';echo $#,$*; set -- $*; echo $#,$*
  8, a b
  2,a b

 |Do that again, and $# is now 6, what was the 2nd last '' (but
 |is now last) vanishes.  After that, repeating the command changes
 |nothing, as there's no longer a trailing ''.

Hm.
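Robert's experiment can be rerun as a small POSIX sh script.  One assumption, not confirmed in the thread: his counts appear to have been taken with IFS=:, the setting used everywhere else here, whereas the attempt above used the default IFS, under which every empty field is dropped at once.  The function name count_resplits is made up for this sketch.

```sh
#!/bin/sh
# Rerun the "set -- $*" experiment and print $# after each step.
# count_resplits is a hypothetical helper name; with an argument that
# argument becomes IFS, without one IFS is unset (which acts as default).
count_resplits() (
    if [ "$#" -gt 0 ]; then IFS=$1; else unset IFS; fi
    set -f                      # keep pathname expansion out of the way
    set -- '' '' a '' '' b '' ''
    printf '%s' "$#"            # 8: the words set explicitly
    for step in 1 2 3; do
        set -- $*               # deliberately unquoted
        printf ' %s' "$#"
    done
    printf '\n'
)

# Per Robert: bash and NetBSD sh go 8 -> 7 -> 6 and then stay at 6,
# mksh goes 8 -> 6 -> 4.  The result depends on the shell running this.
count_resplits ':'
# Default IFS: all empty fields are dropped, so this prints 8 2 2 2.
count_resplits
```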

 |In mksh it seems that leading and trailing empty words from $*
 |get dropped, but intermediate ones are retained.   So from when $#
 |starts as 8, its next step is to 6 args (the initial '' and final ''
 |are both dropped) and then to 4 (the same happens again).
 |
 |This output (from your message) shows what is happening
 |
 ||   4,1=:a:/ a ,2=a/a,3=/,4=a
 |
 |Showing that $1 is ":a:/ a " $2 is "a" $3 is "" and $4 is 'a'
 |
 |Given those values, the output you're not expecting is obvious
 |
 ||   4,*=:a::a::a/ a  a  a,
 |
 |There is a space at the end of $1, so when $2 is appended, two
 |spaces precede the 'a' (one from $1, and one inserted by echo
 |between the values of its args).   The two spaces after that
 |you're not concerned with, I think, but that is $2 $3 $4 where $3
 |is '' so you end up with 'a' space nothing space 'a'
 |(where the "space" comes from echo).
 |
 |If you are getting the same output from the first echo line inside
 |a() as bash generates (you didn't show yours for that), then it is
 |hard to see how you could be generating anything different.  But

No: very, very hard.  The output is always correct except when the
first character of IFS is not IFS white space (here ':').  It is

  @@ -1,3 +1,3 @@
   :a: a  a$
   4,1=:a:/ a ,2=a/a,3=/,4=a$
  -4,*=:a::a::a/ a  a  a,$
  +4,*=:a::a::a/ a a  a,$

Sorry if that was not clear.
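Because echo joins its arguments with single spaces, an empty argument is easy to miss in output like the diff above.  A small helper that brackets every argument makes the boundaries visible.  show_args is a made-up name for this sketch; the example arguments are the ones from Robert's bash -x trace quoted earlier.

```sh
#!/bin/sh
# show_args (hypothetical helper): print the argument count, then each
# argument inside <...>, so empty arguments cannot hide between spaces.
show_args() {
    printf '%s' "$#:"
    if [ "$#" -gt 0 ]; then
        printf ' <%s>' "$@"     # the format is reused for every argument
    fi
    printf '\n'
}

# The words from Robert's bash -x trace of the second echo:
show_args '4,*=:a::a::a/' a '' a '' a,
# prints: 6: <4,*=:a::a::a/> <a> <> <a> <> <a,>
```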

 |if your 1st echo output is different, as it is for mksh for example:
 | 3,1=:a:/ a ,2=a/a,3=a/a,4=
 |then anything is possible, this results in
 | 3,*=:a::a:a/ a  a a,
 |for the second line of output.  The space you're unsure of is
 |still there, but there aren't two between the final two 'a' chars,
 |as the empty extra arg (what was $3 for bash etc) isn't present.
 |
 || I find this logical since before the
 || resplit we have ":a:" + "a" + "" + "a", and the trailing ":" in
 || the first only delimits the field of "a",
 |
 |The ':' really has little to do with it, the space is a data

Yes, i think i cannot use the simple "create a string for
resplitting" approach to get there in a bash / NetBSD sh
compatible way: our output is identical for "a"

  #n-1000$ sh -c  'set -- "" "" a "" "" b "" "";
    IFS=:; echo $#,$* | cat -vet'
  8,  a   b $

  #kent$ bash -c  'set -- "" "" a "" "" b "" "";
    IFS=:; echo $#,$* | cat -vet'
  8,  a   b $

  #kent$ MXEXE -Squiet -Snoheader -:/ -R
  ? vpospar set '' '' a '' '' b '' ''
  ? se ifs=:; echo $#,$*; uns ifs
  8,  a   b
  ? set ifs=:; echo $#,$*,; uns ifs
  8,  a   b  ,
  ?

whereas it is not for ":a:"

  #n-1000 sh -c  'set -- "" "" :a: "" "" b "" "";
    IFS=:; echo $#,$* | cat -vet'
  8,   a    b $

  #kent $ bash -c  'set -- "" "" :a: "" "" b "" "";
    IFS=:; echo $#,$* | cat -vet'
  8,   a    b $

  ##..
  ? vpospar set '' '' :a: '' '' b '' ''
  ? set ifs=:; echo $#,$*,; uns ifs
  8,   a   b  ,

 |char, it isn't in IFS, so isn't going to be touched by field
 |splitting, and by this time, shell parsing is long done, so its
 |tokenisation (which drops unquoted whitespace normally) is no
 |longer relevant either.

Well i think it is relevant that : is in $IFS here:

  #n-1000$ sh -c  'set -- "" "" a "" "" b "" "";
    IFS=:; set -x; echo $#,$* | cat -vet'
  + echo 8, '' a '' '' b ''
  + cat -vet
  8,  a   b $

  #n-1000$ sh -c  'set -- "" "" :a: "" "" b "" "";
    IFS=:; set -x; echo $#,$* | cat -vet'
  + echo 8, '' '' a '' '' '' b ''
  + cat -vet
  8,   a    b $

The ":a:" is subject to further field splitting, and the ":" is a
delimiter that terminates the field.  That is the essence of my
problem: the ":" attached to "a" delimits the field of "a", so the
only difference between
  set -- "" "" a "" "" b "" ""
and
  set -- "" "" :a: "" "" b "" ""
should be one additional empty field preceding "a", since the leading
":" delimits "nothing", turning it into an empty field, whereas the
trailing ":" only delimits the "a".  Yet

  $ bash -c  'set -- "" "" :a: "" "" b "" "";
    IFS=:; set -x; echo $#,$* | cat -vet'
  + echo 8, '' '' a '' '' '' b ''
  + cat -vet
  8,   a    b $
  $ bash -c  'set -- "" "" :a "" "" b "" "";
    IFS=:; set -x; echo $#,$* | cat -vet'
  + echo 8, '' '' a '' '' b ''
  + cat -vet
  8,   a   b $

as well as NetBSD sh (and NetBSD ksh)

  #n-1000$ sh -c  'set -- "" "" :a: "" "" b "" "";
    IFS=:; set -x; echo $#,$* | cat -vet'
  + echo 8, '' '' a '' '' '' b ''
  + cat -vet
  8,   a    b $
  #n-1000$ sh -c  'set -- "" "" :a "" "" b "" "";
    IFS=:; set -x; echo $#,$* | cat -vet'
  + echo 8, '' '' a '' '' b ''
  + cat -vet
  8,   a   b $

create an empty field!  POSIX says

  The shell shall use the byte sequences that form the characters
  in the value of the IFS variable as delimiters.

  Note that these delimiters terminate a field; they do not, of
  themselves, cause a new field

  when a field is said to be delimited, then the candidate field,
  as generated below shall become an output field

plus

  A byte sequence in the input which resulted from an expansion
  and which forms an IFS character that is not IFS white space:
  Remove that byte sequence from the input, but note it was
  observed

  At this point, if the candidate is not empty, or if a sequence of
  bytes representing an IFS character that is not IFS white space
  was seen at step 4, then a field is said to have been delimited,
  and the candidate shall become an output field.

Therefore: "new field", and so on; "you see the 'a'", you step on
and see the "not IFS white space" colon ':', and you delimit the
field of the "a".  So "a:" should not cause *two* fields to be
generated, yet that is exactly what happens with bash, NetBSD sh,
and NetBSD ksh.  Unless i am mistaken.  No??  But i think that is
what they do!
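For a plain string, outside of $*, ordinary field splitting does behave the way the POSIX text quoted above reads: a leading ':' delimits an empty field, while a trailing ':' only terminates the field before it.  The sketch below (count_fields is a hypothetical name) prints the number of fields for a few strings with IFS=:; the counts in the final comment are what the POSIX rules give.

```sh
#!/bin/sh
# count_fields (hypothetical helper): field-split one plain string with
# IFS=: and print how many fields result.
count_fields() (
    IFS=:
    set -f                      # no pathname expansion of the fields
    set -- $1                   # deliberately unquoted: split on ':'
    printf '%s\n' "$#"
)

for s in a 'a:' ':a' ':a:' 'a::b'; do
    printf '%-5s -> %s\n' "$s" "$(count_fields "$s")"
done
# a -> 1, a: -> 1, :a -> 2, :a: -> 2, a::b -> 3
```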

That is the question, and i cannot explain this to myself when
i compare it with the standard wording.
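One way to check the traces above against the standard: they are reproduced by a simple model that joins the positional parameters with the first character of IFS and then field-splits that single string once.  This is an observation about the outputs only, not a claim about how bash or NetBSD sh are implemented, and join_resplit is a hypothetical name.  In the traces the first (empty) field is absorbed into the word "8,", so each model result below matches the traced list once its first field is dropped.  If the model holds, the extra '' for ":a:" would come from its trailing ':' meeting the ':' inserted between parameters, not from "a:" producing two fields by itself.

```sh
#!/bin/sh
# join_resplit (hypothetical helper, a model only): join the arguments
# with the first character of IFS, field-split the joined string, and
# print the field count followed by each field in single quotes.
join_resplit() (
    sep=${IFS%"${IFS#?}"}       # first character of IFS (assumed set, non-empty)
    joined= first=1
    for arg do
        if [ "$first" = 1 ]; then
            joined=$arg first=0
        else
            joined=$joined$sep$arg
        fi
    done
    set -f
    set -- $joined              # ordinary splitting of a plain string
    printf '%s' "$#"
    for f do
        printf " '%s'" "$f"
    done
    printf '\n'
)

IFS=:
join_resplit '' '' a   '' '' b '' ''   # 7 '' '' 'a' '' '' 'b' ''
join_resplit '' '' :a  '' '' b '' ''   # 8 '' '' '' 'a' '' '' 'b' ''
join_resplit '' '' :a: '' '' b '' ''   # 9 '' '' '' 'a' '' '' '' 'b' ''
unset IFS
```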

 |If you ever needed a justification to never use unquoted $* (or $@ which
 |is the same thing) when any of the numeric params can possibly be
 |empty strings, then this is it.   Just don't do that.

Well i do not, normally.  It is only a formality, so to speak, and it
also rhymes.  (In fact i was considering undoing all the field
splitting, which the released version *explicitly* does not
support, and only enabling it via a "command modifier" (we have
trigger words like the shell's "eval", you know); but now i only
turn it off on a per-command basis, like in bash's [[ ]] construct.)

Dear Robert, thanks for the effort, but i am still lost. ;-)
Lots of greetings!!

--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter           he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)
