On Mon, Jan 9, 2023 at 10:16 PM Paul Eggert <egg...@cs.ucla.edu> wrote:
> Here's a shell session illustrating the problem on Fedora 37, which has
> GNU grep 3.7. The same bug is still in bleeding-edge GNU grep.
>
>   $ export LC_ALL=en_US.utf8
>   $ printf '\300\n' | grep '\b'
>   grep: (standard input): binary file matches
>   $ printf '\300\n' | grep -P '\b'
>   $
>
> Plain grep finds a word boundary in the input even though the input
> contains no words (just an encoding error). 'grep -P' does the right thing.
>
> The underlying issue is in the glibc regex code so the fix should be in
> glibc / Gnulib, but I thought I'd report it here before I forgot it.

Thanks! While this would definitely be nice to fix before the release (in the next week or so), it's enough of a corner case that I wouldn't feel bad releasing without a fix. For the record, this problem first arose in grep-2.19.