On Tue, Sep 24, 2024 at 09:09:29AM -0400, Greg Troxel wrote:
> Robert Elz <k...@munnari.oz.au> writes:
>
> >     Date:        Tue, 24 Sep 2024 12:56:49 +0200
> >     From:        <tlaro...@kergis.com>
> >     Message-ID:  <zvka8e8a7fhif...@kergis.com>
> >
> >   | The present patch does two things:
> >   |
> >   | 1) Set, by default, the maximum of bytes read, in every case, as being
> >   | LINE_MAX (the maximum number of bytes in a line in a text file);
> >
> > I am not really in favour of that part, while allowed by the standard,
> > imposing unnecessary limits, just because they are permitted, is not
> > really ideal.   Apart from that, the "line" read by read (without -r)
> > can actually be several (or many) text file lines, if each is ended by
> > a \ (line continuation).
>
> Sure, but the problem is that if you have a file which is e.g one line
> (single \n at end) that is 10 MB, read from it is unreasonable, and it's
> difficult to deal with this in portable code.
>
> If there were a limit which was well under 1 MB, but well over anything
> reasonably in a bona fide text file, it would finesse the issue.
>
> Perhaps 32 * LINE_MAX.

POSIX Issue 8 has added the "-d delim" option, which sets the delimiter
of a "line", and this makes things more complex, since a continuation is
now the escaping of the delimiter. My solution was too simple.

We have to distinguish between the maximal length of a "line" (linemax)
and the maximum number of bytes to read (the "-n" option): recordmax.

If the delimiter is the newline, each "line" is a text line, so its
maximal length is LINE_MAX; if the delimiter is something else, the
maximum is ULONG_MAX. If this amount is reached without reaching the
delimiter (escaped or not), the reading stops. When changing line (after
a continuation line), the counter is reset to zero, allowing another
"line" to be absorbed.

What is set by "-n" is the maximum count of bytes composing the record
(recordmax), which may be a concatenation of "lines". This count excludes
the discarded bytes (the backslash and the delimiter, which are not part
of the data: the "escaped line" is presentation, to be discarded), and it
counts an escaped sequence as only 1 when it is interpreted (not raw),
since the escaped sequence is replaced by the character. If the maximum
is not set, it defaults to ULONG_MAX.

Slightly more complex than what I made, but still reasonably simple.
-- 
        Thierry Laronde <tlaronde +AT+ kergis +dot+ com>
                     http://www.kergis.com/
                    http://kertex.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89  250D 52B1 AE95 6006 F40C
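
The two counters described above (linemax per "line", recordmax per
record) can be sketched in C roughly as follows. This is only an
illustration under assumptions, not the actual sh(1) patch: the function
name read_record, its parameters, and the choice to read from a memory
buffer instead of a file descriptor are all hypothetical, and field
splitting is left out entirely.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#ifndef LINE_MAX
#define LINE_MAX 2048	/* fallback: the POSIX minimum (_POSIX2_LINE_MAX) */
#endif

/*
 * Hypothetical helper, not the sh(1) code: copy one record from
 * in[0..inlen) into out (capacity outcap, always NUL-terminated).
 * recordmax plays the role of "-n"; pass ULONG_MAX when it is unset.
 * Returns the number of data bytes stored.
 */
static size_t
read_record(const char *in, size_t inlen, char delim, bool raw,
    unsigned long recordmax, char *out, size_t outcap)
{
	/* A newline-delimited "line" is a text line, hence LINE_MAX. */
	unsigned long linemax = (delim == '\n') ? LINE_MAX : ULONG_MAX;
	unsigned long linelen = 0;	/* bytes consumed in the current "line" */
	size_t reclen = 0;		/* data bytes kept in the record */
	size_t i = 0;

	if (outcap == 0)
		return 0;

	/* Stop at end of input, at recordmax, or when out is full. */
	while (i < inlen && reclen < recordmax && reclen + 1 < outcap) {
		if (linelen >= linemax)
			break;		/* linemax reached without a delimiter */

		char c = in[i++];
		linelen++;

		if (!raw && c == '\\') {
			if (i >= inlen)
				break;	/* dangling backslash: dropped */
			c = in[i++];
			if (c == delim) {
				/* Continuation: discard both bytes, new "line". */
				linelen = 0;
				continue;
			}
			linelen++;
			out[reclen++] = c;	/* escape counts as one byte */
			continue;
		}

		if (c == delim)
			break;		/* unescaped delimiter ends the record */

		out[reclen++] = c;
	}

	out[reclen] = '\0';
	return reclen;
}
```

With the input "ab\<newline>cd<newline>ef", the newline as delimiter, and
no "-r", this sketch stores "abcd": the backslash and newline are
discarded and do not count against recordmax. With raw set, the same
input yields "ab\" because the backslash is ordinary data and the first
newline ends the record.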