bug#30326: grep not searching through a text file (thinking it binary)

2018-04-20 Thread Paul Eggert
On 02/05/2018 03:38 PM, Paul Eggert wrote: I was referring to text containing encoding errors without containing NULs, which is what this bug report originally was about. Sorry I didn't make that clear. Following up on this (with some delay...), I installed the attached patch to try to cover

bug#30326: grep not searching through a text file (thinking it binary)

2018-02-05 Thread Paul Jackson
Paul Eggert wrote: >> I was referring to text containing encoding errors without >> containing NULs Ah - that makes sense. The following experiment leads me to conclude that grep entirely suppresses emitting any portion of a match that would contain an encoding error, rather than emitting some su

bug#30326: grep not searching through a text file (thinking it binary)

2018-02-05 Thread Paul Eggert
On 02/05/2018 01:27 PM, Paul Jackson wrote: I created a large file ("/tmp/pjbb") by concatenating: 1) a big plain ASCII file of C source code, 2) a small ELF executable, and 3) another big plain ASCII file of C source code. Then I grep'd in this big file for the string "p...@usa.net

bug#30326: grep not searching through a text file (thinking it binary)

2018-02-05 Thread Paul Jackson
Paul Eggert wrote, in response to my suggestion to filter grep output, not input, for "binary junk": >> We've done that already, if memory serves. I don't think so :). The installed grep on the system I'm typing on right now is "grep (GNU grep) 3.0". I've not checked closely, but I believe that sh

bug#30326: grep not searching through a text file (thinking it binary)

2018-02-05 Thread Paul Eggert
On 02/05/2018 08:05 AM, Paul Jackson wrote: If one goal of the current grep behavior is to avoid putting out "junk" unexpectedly, then instead of rejecting input files that have any such "junk", rather happily grep on any dang file, by default, but then filter the output to suppress the "junk".

bug#30326: grep not searching through a text file (thinking it binary)

2018-02-05 Thread Paul Jackson
A couple of possible "solutions" to this quandary: === If one goal of the current grep behavior is to avoid putting out "junk" unexpectedly, then instead of rejecting input files that have any such "junk", rather happily grep on any dang file, by default, but then filter the output to suppress

bug#30326: grep not searching through a text file (thinking it binary)

2018-02-04 Thread Paul Eggert
L A Walsh wrote: I didn't care Some users do care: they don't want grep to output binary junk that may mess up their screen. Problem is on a mailbox, different emails can have different encodings. There's no general solution to that problem. No matter what grep does, it will mishandle s

bug#30326: grep not searching through a text file (thinking it binary)

2018-02-02 Thread L A Walsh
Paul Eggert wrote: On 02/02/2018 03:30 PM, L A Walsh wrote: > most computer files (vs. user-files) are still single-byte. That's because so many of them are ASCII. But ASCII files are not the issue here. grep's behavior hasn't changed when operating on ASCII files in typical locales. The i

bug#30326: grep not searching through a text file (thinking it binary)

2018-02-02 Thread Paul Eggert
On 02/02/2018 03:30 PM, L A Walsh wrote: most computer files (vs. user-files) are still single-byte. That's because so many of them are ASCII. But ASCII files are not the issue here. grep's behavior hasn't changed when operating on ASCII files in typical locales. The issue is text using a non

bug#30326: grep not searching through a text file (thinking it binary)

2018-02-02 Thread L A Walsh
Paul Eggert wrote: On 02/02/2018 03:16 PM, L A Walsh wrote: It also used to be the default. Single-byte locales also used to be the default. Times have changed, and things have gotten more complicated. We don't change default behavior for no reason, but we also don't keep the defau

bug#30326: grep not searching through a text file (thinking it binary)

2018-02-02 Thread Paul Eggert
On 02/02/2018 03:16 PM, L A Walsh wrote: It also used to be the default. Single-byte locales also used to be the default. Times have changed, and things have gotten more complicated. We don't change default behavior for no reason, but we also don't keep the default the same even when the wor

bug#30326: grep not searching through a text file (thinking it binary)

2018-02-02 Thread L A Walsh
Paul Eggert wrote: On 02/02/2018 12:09 PM, L A Walsh wrote: Grep was able to find text strings in mboxes without a POSIX definition telling it that it was "broken". It's not a question of POSIX telling us what to do. It's a question of what is a good thing for GNU grep to do, and ma

bug#30326: grep not searching through a text file (thinking it binary)

2018-02-02 Thread Paul Eggert
On 02/02/2018 12:09 PM, L A Walsh wrote: Grep was able to find text strings in mboxes without a POSIX definition telling it that it was "broken". It's not a question of POSIX telling us what to do. It's a question of what is a good thing for GNU grep to do, and making sure that this behavior

bug#30326: grep not searching through a text file (thinking it binary)

2018-02-02 Thread L A Walsh
Grep was around long before POSIX, as were most of the unix utils. Grep was able to find text strings in mboxes without a POSIX definition telling it that it was "broken". I don't want it displaying random binary that throws my terminal into weird modes, which is why I skip binary files. To ha

bug#30326: grep not searching through a text file (thinking it binary)

2018-02-02 Thread Eric Blake
tag 30326 notabug thanks On 02/02/2018 01:30 PM, L. A. Walsh wrote: > I've used grep to search through my mbox-format emails for decades, but > I've run into a case where it seems to be ignoring a text mailbox > because, I guess, it thinks it is "binary" Yes, that's correct. > If I used "-Par" it

bug#30326: grep not searching through a text file (thinking it binary)

2018-02-02 Thread L. A. Walsh
I've used grep to search through my mbox-format emails for decades, but I've run into a case where it seems to be ignoring a text mailbox because, I guess, it thinks it is "binary" (I think ignoring binary is a default in my aliases file). I used: grep -Pr 'Game:\s+NCSOFT' * and it ignored a m