On 02/16/2017 11:26 AM, Eli Zaretskii wrote: >> I'm of the opinion that undossify_input causes more problems than it >> solves. We should trust fopen("r") to do the right thing, rather than >> reinventing it ourselves. > > FYI: You'd be losing an important feature for non-Cygwin DOS/Windows > users if you remove undossify_input and decide to trust fopen's "r" > (or "rt") mode.
"rt" mode is not required to exist. And I don't know any modern implementation of "r" mode on a system with non-zero O_BINARY that eats ALL \r - both Cygwin and mingw just change \r\n into \n while still preserving other \r. The undossify() code that Paul just removed did NOT behave the same as text mode (in that it did, perhaps unintentionally, eat ALL \r). I count it worse to TRY and reimplement the OS "r" mode and get the implementation wrong, with more lines of code, than to just trust the OS to do it correctly in the first place. The undossify() code may do the right thing on text files, but is absolutely wrong on binary files. My alternative patch was less intrusive, and at least preserves the undossify behavior on text files, but with the requirement that fcntl(0, F_GETFL) & O_TEXT is a reliable way to tell if an fd is currently opened in text mode before slamming it over to the binary mode required for undossify to work. > That's because reading a file which was opened in > text-mode generally removes _all_ CR characters, even if they are not > followed by a newline; it also stops on the first ^Z character in the > file, treating it as a kind of "software EOF", a legacy from CP/M > years. No, I don't know of any fopen(,"r") code that eats _all_ CR. Certainly cygwin's implementation does not do that. > > That's why the original patch switched the file descriptor to binary > mode (Grep used 'open', not 'fopen', in those days) and used > undossify_input: that allowed Grep to DTRT with these use cases, > removing CRs only if they are followed by a newline, and not stopping > at ^Z. As a side effect, undossify_input also collects the > information needed for displaying byte offsets. Yes, you do make a point that the side effect of reimplementing text mode ourselves on a forced binary fd lets us "count" byte offsets where the count could be text while the scan was binary, or where the count could be binary while the scan was text. But in reality, are there any users that ever want a mixed-mode count? If you are scanning in binary, you want the binary count; if you are scanning in text, you want the text count. And in those two scenarios, with a sane text-mode from the OS, still give the correct counts with a lot less code in grep. > > It seems to me that when one bumps into some code which looks > incorrect or less than optimal, and one considers its replacement with > a more clever code, it would be a good idea to ask the person(s) who > contributed the original code, in case there was some good reason for > doing it that way. Was that done in this case? If not, it should > have been. > -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org
signature.asc
Description: OpenPGP digital signature