On 1/11/25 9:54 AM, MacBeth wrote:

Bash Version: 5.2
Patch Level: 37
Release Status: release

Description:

It seems that `read` truncates a trailing delimiter if there is one less
variable argument than input fields and the last field is empty (trailing
delimiter).

This is a combination of two rules: characters in IFS are field
terminators, not separators, and when `read' is supplied fewer
variables than fields,

"Any remaining input in the original field being processed shall be
returned to the read utility "unsplit"; that is, unmodified except that any
leading or trailing IFS white space, as defined in 2.6.5 Field Splitting
shall be removed."

Which basically means that the last variable gets everything else on
the line, including non-whitespace IFS characters.
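
A small illustration of that rule, as a sketch only: the sample inputs and the variable names (`user`, `rest`, `w1`, `w2`) are made up for this example. With fewer variables than fields, the last variable receives the remainder of the line unsplit; only leading and trailing IFS whitespace is stripped from it.

```bash
# Two variables, four colon-terminated fields: `rest' keeps its delimiters.
IFS=: read -r user rest <<<"root:x:0:0"
printf 'user=%s rest=%s\n' "$user" "$rest"

# Default IFS (whitespace): surrounding blanks are removed from the last
# variable, but the blanks inside the remainder are left alone.
read -r w1 w2 <<<"  a  b  c  "
printf 'w1=[%s] w2=[%s]\n' "$w1" "$w2"
```

The first `printf` should show `rest=x:0:0`, and the second should show `w2=[b  c]` with the inner double space preserved.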


See how output line 1 contains the trailing comma, but output line 3 does
not. Is this intended behaviour?

Yes, it's how all posix-style shells behave, and what POSIX has always
specified (the language in the standard has, well, `evolved' over time).

This is a simplified version of how it works: after you remove leading and
trailing IFS whitespace, you read individual fields from the input using
the characters in IFS as field terminators. If you get to the last
variable and find a field terminator (in this case, the end of input
qualifies as a field terminator), you check whether there is additional
input. If there is, you just end the process there and assign whatever
input followed the previous delimiter you found to the last variable,
delimiters included. If not, the field is terminated and assigned to the
last variable.
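
The steps above can be sketched in bash itself. This is an illustration under stated assumptions, not how `read' is implemented: `split_like_read` and the `SPLIT` array are hypothetical names, the helper handles only a single non-whitespace delimiter character, and it ignores backslash processing and IFS whitespace entirely.

```bash
# Hypothetical helper, not part of bash: mimic `IFS=$delim read -r v1 ... vN'
# for one non-whitespace delimiter. Results are stored in the SPLIT array.
split_like_read() {
    local delim=$1 rest=$2 nvars=$3 i field after
    SPLIT=()

    # Every variable except the last takes exactly one terminated field.
    for ((i = 1; i < nvars; i++)); do
        if [[ $rest == *"$delim"* ]]; then
            field=${rest%%"$delim"*}
            rest=${rest#*"$delim"}
        else
            field=$rest         # end of input terminates the field
            rest=
        fi
        SPLIT+=("$field")
    done

    # Last variable: look past the next terminator for additional input.
    if [[ $rest == *"$delim"* ]]; then
        field=${rest%%"$delim"*}
        after=${rest#*"$delim"}
        if [[ -n $after ]]; then
            SPLIT+=("$rest")    # more input: keep the remainder, delimiters included
        else
            SPLIT+=("$field")   # nothing follows: drop the lone terminator
        fi
    else
        SPLIT+=("$rest")
    fi
}

split_like_read , "k,v1,v2," 2; printf 'v=%s\n' "${SPLIT[1]}"
split_like_read , "k,v1,"    2; printf 'v=%s\n' "${SPLIT[1]}"
```

For the first call the last element keeps its trailing comma (`v1,v2,`); for the second it does not (`v1`), because nothing follows the terminator.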


Repeat-By:

Issue occurs on output line 3 here, but not on output line 1:

for row in "k,v1,v2," "k,v1,v2" "k,v1," "k,v1" "k,"; do
    IFS=, read k v <<<"$row"
    printf "%-20s %s\n" "row='$row'" "k='$k', v='$v'"
done
row='k,v1,v2,'       k='k', v='v1,v2,'

There are more fields than variables, so v gets the remaining input
after the delimiter following `k'.

row='k,v1,v2'        k='k', v='v1,v2'

Same here.

row='k,v1,'          k='k', v='v1'

The second comma terminates the second field, and there is no additional
input, so v gets the second field with the terminating delimiter removed.

row='k,v1'           k='k', v='v1'

This is what I meant about the end of input terminating a field.

row='k,'             k='k', v=''

The comma terminates the first field, which is assigned to `k' since
there is no additional input, and `read' then follows this POSIX rule:

"If there are still one or more unprocessed var operands, each of the
variables named by those operands shall be assigned an empty string."

since there are more variables than fields.
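
As a quick sketch of that rule (the variable names and the `old` placeholder values are invented for the example), leftover variables are overwritten with empty strings rather than left holding their previous values:

```bash
# b and c start out set, to show that read overwrites them with empty strings.
b=old c=old
IFS=, read -r a b c <<<"x,"
printf 'a=%s b=%s c=%s\n' "$a" "${b:-<empty>}" "${c:-<empty>}"
```

Here `a` receives `x`, and both `b` and `c` end up empty, so the `printf` shows the `<empty>` placeholder for each.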

Issue occurs on output line 3 here, but not on output line 1:

This is the same example with the addition of an input field.


Issue occurs on output line 5 here, but not on output lines 1 or 3:

This is more-or-less identical as well.

If you want csv-style parsing, there is a loadable `csv' builtin in
the bash distribution that behaves like you want. Use it something like

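# load the builtin first with `enable -f'; the path to the csv
# loadable depends on how bash was installed
while read -r
do
        csv "$REPLY"
        # manipulate the fields in the $CSV array variable
done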


Chet

--
``The lyf so short, the craft so long to lerne.'' - Chaucer
                 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/
