Paul Eggert wrote:
Linda Walsh wrote:
I had one file that it bailed on
saying it has an invalid UTF-8 encoding -- but the command line was
recursive starting from '.' -- and it didn't name the file
----
I didn't report that as 'a bug', because when I went back to reproduce
it -- low level physics took over -- i.e. the closer I looked, the more
uncertain the problem became! I did change the grep * into a for i in
*; do echo file; grep file; ... but couldn't find the file that gave the
message... Grrr. I will bet it was with the '-P' option, since the
standard Regex in perl complains about such things and since I was only
interested in status (was using -q _because_ I was searching for a
binary pattern -- the '\000\000') I got the warning but nothing else.
If I run into it again, maybe I can find it w/o looking too closely;
then that uncertainty principle won't kick in... ;-)
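
A per-file loop like the one described above might be sketched as follows.
This is only a sketch: the function name report_grep_diagnostics is
hypothetical, and it assumes the message of interest is written to stderr,
as grep's error messages normally are. Add -P (or whatever options the
original command used) to the grep call as needed.

```shell
# Hypothetical helper: run grep on one file at a time so that any
# diagnostic can be attributed to the file that triggered it.
report_grep_diagnostics() {
    pattern=$1
    shift
    for f in "$@"; do
        # Capture stderr only; matches on stdout are discarded.
        err=$(grep -q -e "$pattern" -- "$f" 2>&1 >/dev/null)
        if [ -n "$err" ]; then
            printf '%s: %s\n' "$f" "$err"
        fi
    done
    return 0
}
```

Calling it as report_grep_diagnostics 'pattern' * prints one line per file
that made grep complain, prefixed with that file's name.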
That's pretty vague. Can you reproduce that problem? I don't observe
it:
$ mkdir d
$ printf 'a\200\n' >d/f
$ printf 'b\200\n' >d/g
$ grep -r a d
Binary file d/f matches
"-a" doesn't work, BTW:
Ishtar:/tmp> grep -a '\000\000' zeros
Ishtar:/tmp> echo $?
1
That's the way 'grep' has always behaved. The regular expression '\0'
matches the string "0", not the NUL byte.
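
For checking whether a file really contains NUL bytes, a sketch that avoids
the escape-syntax question entirely is shown here, next to a PCRE-based
variant. Both function names are hypothetical. The second function assumes
a GNU grep built with -P support, where \x00 is the PCRE escape for the
NUL byte.

```shell
# Count NUL bytes in a file without relying on any regex escape syntax:
# tr deletes every byte except NUL, and wc counts what is left.
count_nul_bytes() {
    tr -cd '\000' < "$1" | wc -c | tr -d ' '
}

# Succeed if the file contains two consecutive NUL bytes.
# Assumption: GNU grep with PCRE support. -a treats the file as text,
# and -q reports the result through the exit status only.
has_double_nul() {
    LC_ALL=C grep -qaP '\x00\x00' -- "$1"
}
```

count_nul_bytes needs only tr and wc, so it works as a cross-check even
where grep was built without -P.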
Ishtar:/tmp> grep -P '\000\000' zeros
Binary file zeros matches
I don't follow this example; perhaps some text was omitted? Anyway,
-P has always treated files containing zeros as binary files too, ever
since -P was introduced. It's the same as without -P.
But there it is -- if grep wasn't meant to handle binary files,
it wouldn't know to call 'zeros' a binary file.
Obviously, grep *is* meant to handle binary files; it's documented to
handle them in a particular way.
---
Nevertheless, it is documented that '\ddd' or '\xHH' can be used
to match a single character of the value specified. '\000\000' is
found in 'zeros' (as mentioned in the original report -- a file
filled with 4k of nulls) with the -P switch, but not with the -a switch.
That behavior violates the documentation.
how can 'shuf' claim to work on input lines yet have this allowed:
-z, --zero-terminated
line delimiter is NUL, not newline.
I don't follow this point. -z is a nice feature; we don't want to get
rid of it.
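
To make the -z behavior concrete, here is a small sketch, assuming GNU
coreutils shuf is installed. The function name shuffle_nul_records is
hypothetical. Each NUL-terminated record is treated as one "line", and
the output is converted back to newlines only for display.

```shell
# Shuffle NUL-terminated records read from stdin, then turn the NUL
# terminators into newlines so the result is readable on a terminal.
# Assumption: GNU shuf, which provides -z / --zero-terminated.
shuffle_nul_records() {
    shuf -z | tr '\000' '\n'
}
```

For example, printf 'one\000two\000three\000' | shuffle_nul_records
prints the three records in a random order, one per line.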
----
Nice of you to not read the previous notes. The argument was that
a NUL in a file made it non-text -- therefore it wouldn't be a "line".
People argue to dumb down POSIX
utils, because some corp wants to get a posix label but
has a few shortcomings -- so they donate enough money and
posix changes its rules.
I'm afraid you've gone off the deep end here.
I didn't bring up POSIX, Eric did. Again, nice of you to jump
in the middle of a conversation and not read the earlier notes...
:-)
*Cheers* Paul...(et al).
-linda