Date: Tue, 24 Sep 2024 17:30:42 +0200
From: tlaro...@kergis.com
Message-ID: <zvlbitytlanig...@kergis.com>

  | Furthermore the continuation test on:
  |
  |         if (c != '\n')  /* \ \n is always just removed */
  |                 goto wdch;
  |
  | seems wrong. Shouldn't it be?:
  |
  |         if (c != end)
  |                 goto wdch;

Actually no, what is there now is what is intended.

The idea is that the input might need to be divided into many lines to
meet the requirement that it be a text file, which means a max line
length (as you're aware), and that max length is from the first char in
the line to the next \n char (read's delimiter char has nothing to do
with that use of \n).

To allow that, while not restricting the length of a record, the
sequence \ \n is allowed to indicate continuation lines, regardless of
what the delimiter is, and is simply removed from the input stream (just
as in cpp and sh - and more).

Other than that usage, a \ also escapes the following char, which stops
it being anything special (not a field (word) separator, not the
delimiter, and of course, as \\, not the escape char either).

If the delimiter is \n (the default, or -d $'\n') then the end of line
continuation removal causes it to vanish before the code checks if the
delimiter has appeared. If the delimiter is something else, we don't
want it to vanish, there is no point in that -- say we use "-d :", why
would we then ever write \: in the input if that pair of chars is simply
deleted? Makes no sense. What we would want is the escaped : there to
be a regular char, not deleted, and not the delimiter either.

So the test above is checking for when we have a \ before some character
other than \n - in which case the goto adds the following character to
the current word (which makes it into just an ordinary char, not special
in any way, with the preceding \ removed). But if it is \n after the \
we don't do that, so just continue (next line not shown above) which goes
back to read more input, simply discarding the \ \n sequence, which is
what we want to happen whether \n is the delimiter or not.

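For illustration, the backslash handling described above can be sketched
in C roughly as follows. This is not the shell's actual code:
read_record(), its parameters and the buffer handling are all made up
for the example, and only the "if (c != '\n') goto wdch;" test is taken
from the quoted snippet.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, for illustration only: copy one record from
 * in[0..inlen) into out, stopping at the first unescaped delimiter `end`.
 * Returns the number of bytes stored (out is NUL terminated) and sets
 * *consumed to the number of input bytes used, delimiter included.
 */
size_t
read_record(const char *in, size_t inlen, int end,
    char *out, size_t outsz, size_t *consumed)
{
	size_t i = 0, n = 0;
	int escaped = 0;

	assert(in != NULL && out != NULL && consumed != NULL && outsz > 0);

	while (i < inlen) {
		int c = (unsigned char)in[i++];

		if (escaped) {
			escaped = 0;
			if (c != '\n')	/* \ \n is always just removed */
				goto wdch;
			continue;	/* drop the pair, go read more input */
		}
		if (c == '\\') {
			escaped = 1;	/* the \ itself is never stored */
			continue;
		}
		if (c == end)		/* only an unescaped delimiter ends it */
			break;
wdch:
		if (n + 1 < outsz)	/* silently truncate if out is full */
			out[n++] = (char)c;
	}
	out[n] = '\0';
	*consumed = i;
	return n;
}
```

With the delimiter ':', the input a\:b:c gives the record a:b (the
escaped : is an ordinary char). With the delimiter \n, the input a\
followed by a newline and then b gives ab, since the \ \n pair vanishes
before the delimiter check is reached.
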
This continuation-line handling is specifically allowed by POSIX in the
spec of the read command, though you have to read the almost
indecipherable sentence about a million times, and already know what it
is trying to say, to understand it (and even then I think what it is
saying has an error, but it is so hard to decipher I'm not sure).

Apart from that: I think I have -n implemented as intended (by me
anyway) now. But now I need to also update the manual ... I started
trying to fit it into the text in the form in which the description of
the read builtin currently exists, but that got ridiculously messy, so I
am going to discard the whole current description and do it again in the
more conventional form, with the options listed as a list, rather than
just worked into the description in narrative form. That's going to
take another day or so.

I have also added -z (currently, for not very important backward compat
with the current impl) to issue an error if a \0 is encountered in the
input (other than as the record delimiter). Inverting the sense of that
option probably makes more sense (-z to allow \0 chars, and error
without that option). Either way this is very very simple and cheap to
implement, as the code has to check for the \0 chars anyway. (The error
would cause the read to terminate with exit status 2, as does any other
error). Or that option could just go away again.

Opinions please? (everyone)

kre