Paul Eggert wrote:
>> I was referring to text containing encoding errors without
>> containing NULs

Ah - that makes sense.
The following experiment leads me to conclude that grep entirely suppresses emitting any portion of a match that would contain an encoding error, rather than emitting some substring of the match that can be correctly encoded.

That is, it seems that if grep is asked to emit what it thinks would be a match with an encoding error, grep suppresses that output line entirely and continues looking for matches that it can emit without encoding errors. Then, at the end, if it saw a match that would have emitted an encoding error, it issues the "Binary file ... matches" error just before exiting (or ending processing of that particular file).

I demonstrated this by replacing the ELF executable of my previous example with the output of the following C program, which emits every possible pair of bytes, excluding NUL (0) and 255:

    #include <stdio.h>

    int main(void)
    {
        int i, j;

        /* i and j each range over 1..254, skipping NUL and 255 */
        for (i = 1; i < 255; i++) {
            for (j = 1; j < 255; j++)
                printf("%c%c", i, j);
        }
        puts("");
        return 0;
    }

So I tested on a file (/tmp/pjcc) containing (1) a bunch of ASCII C code, (2) the output from the above program, and (3) another copy of the same ASCII C code. Then, with the following environment settings:

    LC_COLLATE=C
    LANGUAGE=en_US.UTF-8
    LC_ALL=en_US.UTF-8
    LANG=en_US.UTF-8

I ran the command:

    grep "'N'" /tmp/pjcc

I got the following output:

        case 'N':
        case 'N':
    Binary file /tmp/pjcc matches

The "case 'N':" string appears once in the C code used in the file, but there are two copies of that C code in the file, so grep prints that line twice. I also double-checked that my file /tmp/pjcc did not contain any NUL bytes.

The three-character sequence 'N' also appears in the middle section of all non-NUL, non-255 pairs of bytes, as well as in the ASCII C code, and it was (I presume) the match on that section of the file that caused grep to issue the "Binary file /tmp/pjcc matches" complaint at the end of its processing of that file.
If, on the other hand, I ran the command:

    grep "'N':" /tmp/pjcc

then I got the output:

        case 'N':
        case 'N':

*without* any "Binary file /tmp/pjcc matches" complaint. The four-character sequence 'N': appears (twice) in the C code, but zero times in the middle section of all non-NUL, non-255 pairs of bytes (there, the pair "'N" is always followed by the pair "'O", so 'N' is never followed by ':').

From this I conclude that if grep, in its default mode, is asked to emit a matching line that would contain encoding errors, it does not trim the output to what would encode correctly and continue onward. Rather, it emits nothing for that match, continues onward looking for more matches that it can emit correctly, and then prints the "Binary file ... matches" error just before it exits or goes to the next file.

If I were designing grep from scratch, and had infinite resources, I might prefer to have grep emit some substring of each match that it can encode correctly, rather than emit nothing in case of an encoding error. However, I can't imagine that this is worth the effort, and (being a stick-in-the-mud old fart) I usually recommend against incompatible changes unless strongly necessary.

So ... whatever ... never mind ... as they say.

-- 
Paul Jackson
p...@usa.net