> From: Eric Blake <ebl...@redhat.com>
> Date: Mon, 13 Feb 2017 14:20:56 -0600
>
> > While we're on the topic, the undossify_input approach is just a
> > heuristic and it sometimes guesses wrong. I wish the heuristic could be
> > removed somehow, so that grep would behave more deterministically on
> > MS-DOS/Windows.
> >
>
> I'm of the opinion that undossify_input causes more problems than it
> solves. We should trust fopen("r") to do the right thing, rather than
> reinventing it ourselves.

FYI: You'd be losing an important feature for non-Cygwin DOS/Windows users if you remove undossify_input and decide to trust fopen's "r" (or "rt") mode. That's because reading a file which was opened in text mode generally removes _all_ CR characters, even if they are not followed by a newline; it also stops on the first ^Z character in the file, treating it as a kind of "software EOF", a legacy from the CP/M years.

That's why the original patch switched the file descriptor to binary mode (Grep used 'open', not 'fopen', in those days) and used undossify_input: that allowed Grep to DTRT with these use cases, removing CRs only if they are followed by a newline, and not stopping at ^Z. As a side effect, undossify_input also collects the information needed for displaying byte offsets.

It seems to me that when one bumps into some code which looks incorrect or less than optimal, and one considers replacing it with more clever code, it would be a good idea to ask the person(s) who contributed the original code, in case there was some good reason for doing it that way. Was that done in this case? If not, it should have been.
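
P.S. To make the binary-mode behavior described above concrete, here is a minimal sketch of the idea: drop a CR only when a newline immediately follows it, and treat ^Z (0x1A) as ordinary data. This is not Grep's actual undossify_input code; the name strip_crlf is hypothetical, the buffer is assumed to have been read from a descriptor in binary mode, and the sketch ignores both the byte-offset bookkeeping and the case of a CR-LF pair split across two reads.

```c
#include <stddef.h>

/* Hypothetical helper, NOT Grep's undossify_input.
   Compacts buf in place, dropping each CR that is immediately
   followed by LF.  Lone CRs and ^Z (0x1A) bytes are kept as
   ordinary data.  Returns the new length of the buffer.  */
size_t
strip_crlf (char *buf, size_t len)
{
  size_t in, out = 0;

  for (in = 0; in < len; in++)
    {
      /* Skip the CR of a CR-LF pair; the LF is copied on the
         next iteration.  */
      if (buf[in] == '\r' && in + 1 < len && buf[in + 1] == '\n')
        continue;
      buf[out++] = buf[in];
    }
  return out;
}
```

A CR that happens to be the last byte of the buffer is kept as-is here; a real implementation would have to remember it and check the first byte of the next read.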