On Wed, Sep 25, 2024 at 12:18:28AM +0700, Robert Elz wrote:
>
> It isn't possible. Actually using \ as the delimiter (without
> -r anyway) makes little sense at all, but that doesn't mean it
> needs to be prohibited.

Then, this is another thing that has to be corrected in POSIX, issue 8:

---8<---
If the -r option is not specified, <backslash> shall act as an escape
character. An unescaped <backslash> shall preserve the literal value of
a following <backslash> and shall prevent a following byte (if any)
from being used to split fields, with the exception of either <newline>
or the logical line delimiter specified with the -d delim option (if it
is used and delim is not <newline>); it is unspecified which. If this
excepted character follows the <backslash>, the read
--->8---

And this escape business is simply unparsable with a backslash as the
delimiter.

I suggest that, in our code, we explicitly (for readability) set:

	if (end == '\\')
		rflag = 1; /* no escaping if the delimiter is the escape character */

This will help a casual reader and seems, IMHO, easier to grasp when
reading than

	(c == '\\' && c != end)

which does indeed exclude end == '\\'. Too smart, at least for me ;-)

What I once more can't parse in the POSIX specification is whether it
shall be interpreted as "the sequence backslash and delimiter, when not
in raw mode, is a continuation line", or as "when not in raw mode, any
escaped delimiter is a continuation line, as is the escaped newline".

For me, the "either newline or other" has to be interpreted as xor, but
am I right? But in this case this covers the whole range "newline and
not newline", so why not simply state that the line delimiter, escaped
when not in raw mode, is a continuation line (having stated once and for
all that a backslash as delimiter implies raw mode)?

And they should start by stating that the input is a sequence of lines,
each considered as a sequence of bytes ending at the first appearance of
a delimiter byte, which is the newline by default but can be set to any
byte with the -d option. That a record can span multiple lines if there
are continuation lines, that is, if not in raw mode, when the end
delimiter is escaped.

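To make that reading concrete, here is a minimal sketch in C. It assumes
the interpretation where a backslash followed by the delimiter (whatever
it is) is a continuation, and it assumes that a trailing backslash at
end of input is simply dropped. The function read_record() and its
parameters are hypothetical, not taken from our code; only the names end
and rflag mirror the ones used above.

```c
#include <stddef.h>

/*
 * Hypothetical helper (an illustration, not existing shell code):
 * copy one record from in[0..inlen) into out and return its length.
 * *used receives the number of input bytes consumed.
 * out must have room for at least inlen bytes.
 */
static size_t
read_record(const char *in, size_t inlen, char end, int rflag,
    char *out, size_t *used)
{
    size_t i = 0, n = 0;

    if (end == '\\')
        rflag = 1;      /* a backslash delimiter cannot also escape */

    while (i < inlen) {
        char c = in[i++];

        if (!rflag && c == '\\') {
            if (i == inlen)
                break;  /* assumption: trailing backslash is dropped */
            c = in[i++];
            if (c != end)
                out[n++] = c;   /* escaped byte kept literally */
            /* else: escaped delimiter, i.e. a continuation line */
            continue;
        }
        if (c == end)
            break;      /* unescaped delimiter ends the record */
        out[n++] = c;
    }
    *used = i;
    return n;
}
```

With end set to newline and rflag clear, the input "a", backslash,
newline, "b", newline yields the record "ab". The sketch deliberately
stops before field splitting: a real implementation would also have to
remember which bytes were escaped so that they are not used to split
fields.
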

They should also state that read reads one record, discarding
continuation lines and replacing escape sequences (when not in raw
mode), and then splits the record according to the following rules.

-- 
Thierry Laronde <tlaronde +AT+ kergis +dot+ com>
http://www.kergis.com/
http://kertex.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89 250D 52B1 AE95 6006 F40C