I tried to backport http://git.savannah.gnu.org/cgit/gnulib.git/commit/?id=b7bc3c1a4 ; by itself it does not apply, because it changes the files lib/hard-locale.[ch] (which aren't present at all in grep 2.24) and also makes a lot of changes to files that grep does not contain. I have a backport that applies now, but it still does not fix the bug.
I spent about half an hour making this work, but the longer I do this the less faith I have in the result. Therefore my recommendation is to drop this hackery-patchery and just upgrade xenial to grep 2.25 instead. The complete changelog is:

** Bug fixes

  In the C or POSIX locale, grep now treats all bytes as valid
  characters even if the C runtime library says otherwise. The
  revised behavior is more compatible with the original intent of
  POSIX, and the next release of POSIX will likely make this official.
  [bug introduced in grep-2.23]

  grep -Pz no longer mistakenly diagnoses patterns like [^a] that use
  negated character classes. [bug introduced in grep-2.24]

  grep -oz now uses null bytes, not newlines, to terminate output lines.
  [bug introduced in grep-2.5]

** Improvements

  grep now outputs details more consistently when reporting a write
  error. E.g., "grep: write error: No space left on device" rather
  than just "grep: write error".

(The first item is the fix for this bug.) The other bug fixes are desirable for xenial as well, and the improvement seems harmless (and nice) enough to include it too.

https://bugs.launchpad.net/bugs/1547466

Title:
  grep switches into binary mode while processing a text file

Status in grep package in Ubuntu:
  Fix Released
Status in grep source package in Xenial:
  In Progress
Status in grep source package in Yakkety:
  Fix Released
Status in grep package in Debian:
  Unknown

Bug description:

I noticed this starting to happen in Xenial about two days ago. When running sbuild (or now the buildd, too), the build breaks when trying to compile a generated file. I traced the problem down to grep suddenly acting weird. When no language is set (or a non-UTF-8 locale is used), it starts printing some lines of a source file and then suddenly ends by printing "Binary file ... matches".
With the attached file, the difference can be observed (running Xenial):

  LANG=C grep -v xxx grant_table.h

and

  LANG=C.UTF-8 grep -v xxx grant_table.h

SRU INFORMATION
===============
Upstream fixes:
 - http://git.savannah.gnu.org/cgit/grep.git/commit/?id=d8a366218 (but depends on previous patches and is not sufficient by itself)
 - http://git.savannah.gnu.org/cgit/grep.git/commit/?id=d8a366218 (tests+doc)

Test case: Call grep on a file or a string with non-ASCII characters in the C locale:

  $ echo 'héll☺ ≥x' | LC_ALL=C grep .

In xenial this just shows "Binary file (standard input) matches"; with the fix it should show the actual input string (with some garbled output, of course, as the UTF-8 characters cannot be displayed in C).

Regression potential: grep is used in tons of places; during xenial we had to put a "use grep -a" workaround into a lot of packages to fix the fallout from grep 2.23, which introduced this. That said, since a result of "Binary file matches" gives no more information than the actual string match, and scripts that get along with this answer most likely just check the exit code anyway (which does not change), the risk is bearable.
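
The test case and the exit-code remark in the regression assessment can be checked together with a short script. This is only a sketch: it assumes a POSIX shell with GNU grep on the PATH, and the variable names `input`, `output` and `status` are illustrative, not part of the SRU.

```shell
#!/bin/sh
# Sketch of the SRU test case: feed a line containing non-ASCII (UTF-8)
# characters to grep in the C locale, then record what it prints and returns.
input='héll☺ ≥x'

output=$(printf '%s\n' "$input" | LC_ALL=C grep .)
status=$?

# The exit status is 0 (a line matched) both with and without the fix.
echo "exit status: $status"

# An unfixed grep (2.23/2.24) prints "Binary file (standard input) matches";
# a fixed grep prints the input line itself.
case $output in
  "Binary file"*) echo "unfixed: grep fell back to binary mode" ;;
  *)              echo "fixed: grep printed the matching line" ;;
esac
```

Because the exit status is the same in both cases, callers that only test it (for example with `grep -q`) should behave identically before and after the update.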