On 1/11/25 9:54 AM, MacBeth wrote:
Bash Version: 5.2
Patch Level: 37
Release Status: release

Description:
It seems that `read` truncates a trailing delimiter if there is one less variable argument than input fields and the last field is empty (trailing delimiter).
This is a combination of two rules: characters in IFS are field terminators, not separators, and when `read' is supplied fewer variables than fields, "Any remaining input in the original field being processed shall be returned to the read utility "unsplit"; that is, unmodified except that any leading or trailing IFS white space, as defined in 2.6.5 Field Splitting shall be removed." Which basically means that the last variable gets everything else on the line, including non-whitespace IFS characters.
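To make the "unsplit" rule concrete, here is a minimal sketch, assuming bash (it relies on the `<<<` here-string and `$'...'` quoting). The helper name `split_two` is hypothetical and only there for illustration; it is not part of the original report.

```shell
# Hypothetical helper: read one line into two variables using the given IFS.
# The results are left in the global variables k and v.
split_two() {
    local ifs=$1 line=$2
    IFS=$ifs read -r k v <<<"$line"
}

# Non-whitespace IFS: v receives the unsplit remainder, commas included.
split_two , "a,b,c,d"
printf '%s\n' "k='$k' v='$v'"    # k='a' v='b,c,d'

# Default IFS whitespace: leading and trailing blanks are removed,
# but the whitespace inside the remainder is kept as-is.
split_two $' \t\n' "  one two   three  "
printf '%s\n' "k='$k' v='$v'"    # k='one' v='two   three'
```

In both calls there are more fields than variables, so the last variable is assigned the rest of the line rather than a single field.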
See how output line 1 contains the trailing comma, but output line 3 does not. Is this intended behaviour?
Yes, it's how all posix-style shells behave, and what POSIX has always specified (the language in the standard has, well, `evolved' over time).

This is a simplified version of how it works: after you remove leading and trailing IFS whitespace, you read individual fields from the input using the characters in IFS as field terminators. If you get to the last variable and find a field terminator (in this case, the end of input qualifies as a field terminator), you check whether there is additional input. If there is, you just end the process there and assign whatever input followed the previous delimiter you found to the last variable, delimiters included. If not, the field is terminated and assigned to the last variable.
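The "is there additional input?" check on the last variable can be observed directly. This is a small sketch, assuming bash; the helper `last_var` is a hypothetical name, and the row with two trailing commas is an extra case that is not in the original report.

```shell
# Hypothetical helper: print what the last variable receives for a given row.
last_var() {
    local k v
    IFS=, read -r k v <<<"$1"
    printf '%s\n' "$v"
}

last_var "k,v1,"      # terminator, then no more input: prints v1
last_var "k,v1,,"     # terminator, then more input:    prints v1,,
last_var "k,v1,v2,"   # terminator, then more input:    prints v1,v2,
```

The only difference between the first two rows is whether anything follows the comma that terminates the second field; once something does, the whole remainder is assigned unsplit.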
Repeat-By:
Issue occurs on output line 3 here, but not on output line 1:

    for row in "k,v1,v2," "k,v1,v2" "k,v1," "k,v1" "k,"; do
        IFS=, read k v <<<"$row"
        printf "%-20s %s\n" "row='$row'" "k='$k', v='$v'"
    done

row='k,v1,v2,' k='k', v='v1,v2,'
There are more fields than variables, so v gets the remaining input after the delimiter following `k'.
row='k,v1,v2' k='k', v='v1,v2'
Same here.
row='k,v1,' k='k', v='v1'
The second comma terminates the second field, and there is no additional input, so v gets the second field with the terminating delimiter removed.
row='k,v1' k='k', v='v1'
This is what I meant about the end of input terminating a field.
row='k,' k='k', v=''
The comma terminates the first field, which is assigned to `k' since there is no additional input, and `read' then follows this POSIX rule: "If there are still one or more unprocessed var operands, each of the variables named by those operands shall be assigned an empty string." since there are more variables than fields.
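The same terminator rule shows up when you count fields. This is a sketch, assuming bash and its `read -a` option; `count_fields` is a hypothetical helper name used only for this illustration.

```shell
# Hypothetical helper: count the fields read sees in a comma-terminated row.
count_fields() {
    local -a fields
    IFS=, read -r -a fields <<<"$1"
    printf '%s\n' "${#fields[@]}"
}

# More variables than fields: the leftover variables become empty strings.
IFS=, read -r a b c <<<"k,"
printf '%s\n' "a='$a' b='$b' c='$c'"    # a='k' b='' c=''

count_fields "k,"       # 1: the comma terminates the only field
count_fields "k,v1,"    # 2: the trailing comma terminates v1
count_fields "k,v1,,"   # 3: the last comma terminates an empty field
```

A trailing comma never starts a new field on its own; an empty trailing field only appears when it has its own terminator.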
Issue occurs on output line 3 here, but not on output line 1:
This is the same example with the addition of an input field.
Issue occurs on output line 5 here, but not on output lines 1 or 3:
This is more-or-less identical as well.

If you want csv-style parsing, there is a loadable `csv' builtin in the bash distribution that behaves like you want. Use it something like

    while read
    do
        csv "$REPLY"
        # manipulate the fields in the $CSV array variable
    done

Chet
--
``The lyf so short, the craft so long to lerne.'' - Chaucer
``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU c...@case.edu http://tiswww.cwru.edu/~chet/