    Date:        Fri, 12 Jul 2024 01:17:57 +0000
    From:        Emmanuel Dreyfus <m...@netbsd.org>
    Message-ID:  <zpcerabnwym4c...@homeworld.netbsd.org>

  | I just encountered a patch(1) limitation when using it on minified json 
  | files from Wordpress. The lines can span more than the maximum that
  | patch(1) can cope with, which is INT16_MAX. Here is a test for that:

While I don't object to the change (and core dumps are never good) what
you're doing is actually unspecified behaviour.

POSIX (the latest edition, POSIX.1-2024; apart from the section number,
I think this has been the same for a long time) says (from XBD):

3.387 Text File

            A file that contains characters organized into zero or more lines.
            The lines do not contain NUL characters and none can exceed
            {LINE_MAX} bytes in length, including the <newline> character.
            Although POSIX.1-2024 does not distinguish between text files
            and binary files (see the ISO C standard), many utilities only
            produce predictable or meaningful output when operating on text
            files. The standard utilities that have such restrictions always
            specify ``text files'' in their STDIN or INPUT FILES sections.

LINE_MAX is typically 2048, which is the minimum POSIX allows
({_POSIX2_LINE_MAX}).

The XCU specification for the patch utility says:

INPUT FILES
            Input files shall be text files.
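
For what it's worth, the limit in effect can be queried at run time
rather than guessed. A minimal sketch, assuming a POSIX system;
line_limit() and line_is_portable() are made-up names for illustration,
not anything that exists in patch(1):

```c
#include <stddef.h>
#include <unistd.h>

/* POSIX minimum for {LINE_MAX} (the value of _POSIX2_LINE_MAX). */
#define FALLBACK_LINE_MAX 2048L

/* Hypothetical helper: the {LINE_MAX} in effect on this system. */
long
line_limit(void)
{
	long lim = sysconf(_SC_LINE_MAX);

	/* sysconf() returns -1 when it cannot report a limit. */
	return lim > 0 ? lim : FALLBACK_LINE_MAX;
}

/*
 * Hypothetical helper: nonzero if a line of len bytes, newline
 * included, is within what POSIX calls a text file.
 */
int
line_is_portable(size_t len)
{
	return len <= (size_t)line_limit();
}
```

A minified json file with a line of several hundred kilobytes would
fail line_is_portable(), which is the sense in which feeding it to
patch is unspecified.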


I'd also note that if you're going to change things, "long" probably
isn't really big enough: it's just 32 bits on many ports (so a signed
limit of 2GiB), and files, and hence "lines" in files, can be much
larger than that, so all you've really done is move the goalposts.
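
To make the point about "long" concrete, a small sketch (fits_in_long()
is a hypothetical helper, not part of patch): whether a given byte count
is representable depends entirely on the port's LONG_MAX.

```c
#include <limits.h>

/*
 * Hypothetical helper: nonzero if a byte count n can be held in a
 * (signed) long on this port.
 */
int
fits_in_long(unsigned long long n)
{
	return n <= (unsigned long long)LONG_MAX;
}
```

On an ILP32 port LONG_MAX is 2147483647, so fits_in_long(3000000000ULL)
is 0 there, while on an LP64 port it is 1.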

Better would be to use size_t -- nothing (in memory) can be bigger than
that, by definition. But you'd need to be aware of the switch from a
signed type to an unsigned one, which can affect how some code works.
Using ssize_t is also not really big enough: being signed, it covers
only half the range of size_t.
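
The kind of thing to watch for when going from signed to unsigned: a
test like "len - 1 >= 0" is always true for a size_t, and a subtraction
that would have gone negative wraps to a huge value instead. A sketch of
two ways to write such code safely (trimmed_length() and fits_in() are
hypothetical names, not taken from the patch sources):

```c
#include <stddef.h>

/*
 * Length of buf[0..len) with trailing newlines removed.  The test is
 * made before the decrement, so len never wraps below zero.
 */
size_t
trimmed_length(const char *buf, size_t len)
{
	while (len > 0 && buf[len - 1] == '\n')
		len--;
	return len;
}

/*
 * Nonzero if need more bytes fit in a buffer of have bytes of which
 * used are taken.  "have - used" is only computed once it is known
 * not to wrap.
 */
int
fits_in(size_t need, size_t have, size_t used)
{
	return used <= have && need <= have - used;
}
```

With a signed type, fits_in() could have been written as the single
comparison "have - used >= need"; with size_t that form silently gives
the wrong answer whenever used exceeds have.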

kre
