Date: Fri, 27 Sep 2024 12:25:49 +0200
From: tlaro...@kergis.com
Message-ID: <zvailzgn-fmr0...@kergis.com>

| I have an algebraic mind: I always think of rule. A line, sometime
| ago, was considered a sequence of bytes ending by the first appearance
| of '\n'. If a "line" is defined more generally as a sequence of bytes
| ending by the first appearance of whatever byte delimiter,

But it isn't - what a line is, is defined, and it isn't that. The
delimiter is just what terminates the read, just as the byte count
given to -n does. That might take a fractional line or many lines to
achieve (given various combinations of -d and -n).

| But could you state it clearly (not \`a la POSIX :-^)
| in the man page?

That would be my hope. But writing English was never one of my better
achievements, as some of these e-mails should reveal.

| Other corner case: when specifying a limit (-n) that is "end reading at the
| first appearance of either eof, not escaped delimiter or that amount
| of bytes read", what do you do when the last byte read (reaching the
| count) is '\\'?

Stop anyway. In general, every time it can occur, a stray ending \
just generates unspecified behaviour. In general I'd expect that using
-n would normally mean -r as well, so the whole question is irrelevant,
but for now, all that happens is that \ is read (no more, that would
go beyond the limit) and, having nothing to escape, is removed along
with all the other \ chars that don't have any useful purpose (when -r
is not given).

| Or do you allow the stray backslash in the last
| variable, convert it to the sequence "\\", or remove it?

For now at least, the last (the first two would be essentially the same
thing, as if that final \ was actually followed by another and the -n
limit were one byte bigger). I think the only other reasonable approach
to take would be to make it be an error, but I don't think that's
warranted here.

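To make the rule above concrete, here is a rough sketch of it as a toy Python model. It is not the sh code, only an illustration of the behaviour as described in this message: the read stops at EOF, at the first unescaped delimiter, or once the -n byte count is used up, and without -r a backslash with nothing left to escape simply vanishes. The function name `model_read` and its parameters are made up for the example. IFS splitting is left out, and keeping an escaped byte literally (including an escaped delimiter) is an assumption of the model; a real sh treats backslash-newline as a line continuation, which is not modelled.

```python
from typing import Optional


def model_read(data: bytes, delim: bytes = b"\n",
               limit: Optional[int] = None, raw: bool = False) -> bytes:
    """Toy model of one read: the bytes kept, before any IFS splitting.

    limit models -n (bytes consumed, backslashes included), delim models
    -d, raw models -r. All names here are hypothetical.
    """
    out = bytearray()
    consumed = 0
    escaped = False  # previous byte was an unescaped backslash
    for i in range(len(data)):
        if limit is not None and consumed >= limit:
            break  # -n count reached: stop, even in the middle of an escape
        byte = data[i:i + 1]
        consumed += 1
        if escaped:
            out += byte  # assumption: the escaped byte is kept literally
            escaped = False
        elif byte == b"\\" and not raw:
            escaped = True  # hold the backslash; it is never stored itself
        elif byte == delim:
            break  # unescaped delimiter terminates the read
        else:
            out += byte
    # If escaped is still True here, the stray backslash had nothing to
    # escape and is simply dropped, as described above.
    return bytes(out)


print(model_read(b"ab\\cd\n", limit=3))            # b'ab'
print(model_read(b"ab\\cd\n", limit=3, raw=True))  # b'ab\\'
print(model_read(b"x\ny\n", delim=b":", limit=3))  # b'x\ny'
```

The first call is the corner case in question: the third byte consumed is the backslash, the count is exhausted, and the result is just "ab". The last call shows a read that spans more than one line because the delimiter is not '\n'.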
There will be, after all, no way to ever know it happened (in the
script): without -r, \ chars (except the escaped one, \\) are all
removed anyway, as is IFS whitespace, etc - there's no immediate way
to detect how much of each of those actually happened (with or
without -n).

[On -z]
| IMHO, the reverse:

That's my general preference as well, but it is a change to current
behaviour, so I will wait upon others' opinions before making that
happen (it is, after all, one minor "!" operator addition, so not
exactly something that is going to take hours of work).

| Would it make sense to add a '-Z' option that translates a nul byte
| into the sequence '\000' with the specification that such a sequence
| is a constant one and is never interpreted, except by printf?

No, I don't think so. I doubt there's any immediate need for that,
and even in printf, what happens when that appears is unspecified.
For use with %b, which would be where it ought to be used, if anywhere
(not in the format string, which would mean allowing that to come from
arbitrary external input, which is almost never a good idea, though
not quite as bad in printf(1) as in printf(3)), it would need to be
\0000 anyway, to meet the ancient stupid System III definition of how
to write an octal constant for its echo program.

kre