Simon wrote:
Sorry, my description was slightly ambiguous. I should not have said
"skipped"; rather, grep treats the file as binary and does not find a
match, because each character takes 2 octets, as per UTF-16.
$ mkdir tmp
$ cd tmp
$
$ printf '\377\376\164\000\145\000\163\000\164\000\061\000\015\000\012\
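
To illustrate the point about 2-octet characters, here is a minimal
sketch. It assumes a small UTF-16LE sample file written by the script
itself (not the file from the report) and an iconv that accepts the
names UTF-16 and UTF-8. The helper names count_raw and count_converted
are hypothetical, chosen only for this example.

```shell
#!/bin/sh
# count_raw FILE PATTERN
# Count matching lines when grep reads the raw bytes of FILE.
# "|| true" hides grep's exit status 1 when nothing matches.
count_raw() {
    LC_ALL=C grep -c -e "$2" "$1" || true
}

# count_converted FILE PATTERN
# Assumption: FILE is UTF-16 with a leading byte order mark, which
# iconv uses to pick the byte order before converting to UTF-8.
count_converted() {
    iconv -f UTF-16 -t UTF-8 "$1" | grep -c -e "$2" || true
}

# Demo on a sample file: BOM (FF FE), then "hi" and a newline,
# each stored as two octets.
demo_dir=$(mktemp -d)
printf '\377\376h\000i\000\n\000' > "$demo_dir/sample"
count_raw "$demo_dir/sample" hi
count_converted "$demo_dir/sample" hi
rm -rf "$demo_dir"
```

On the raw bytes the letters are separated by NUL octets, so the
pattern "hi" should not match anywhere; after conversion the same
pattern should match one line.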
Simon wrote:
Windows text files can start with a byte order mark of U+FEFF and then
be encoded in UTF-8. These are skipped as being binary files.
I can't reproduce this problem on Fedora 26 x86-64. Here's how I tried:
$ printf '\357\273\277x\n' >t
$ LC_ALL=C grep x t | od -c
000 357 273 2
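
Since two different byte order marks come up in this thread (EF BB BF
for UTF-8, FF FE for UTF-16LE), a quick way to tell which one a file
starts with may help when reproducing the report. This is only a
sketch; bom_kind is a hypothetical helper name, and it looks at the
first three bytes only.

```shell
#!/bin/sh
# bom_kind FILE
# Print which byte order mark, if any, FILE starts with.
# Limitation: a UTF-32LE BOM (FF FE 00 00) is reported as utf-16le,
# because only the first three bytes are examined.
bom_kind() {
    # od prints the first three bytes in hex; tr removes the spacing.
    first=$(od -An -tx1 -N3 "$1" | tr -d ' \n')
    case $first in
        efbbbf*) echo utf-8 ;;
        fffe*)   echo utf-16le ;;
        feff*)   echo utf-16be ;;
        *)       echo none ;;
    esac
}

# Demo: one file with a UTF-8 BOM, one with a UTF-16LE BOM.
demo_dir=$(mktemp -d)
printf '\357\273\277x\n' > "$demo_dir/utf8"
printf '\377\376x\000\n\000' > "$demo_dir/utf16"
bom_kind "$demo_dir/utf8"
bom_kind "$demo_dir/utf16"
rm -rf "$demo_dir"
```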