Zepp Lu wrote:

$ printf '\x53\xef' | grep -aoP '\x53\xef'
(no output, returns 1)
$ printf '\x53\xc3\xaf' | grep -aoP '\x53\xef'
Sï
$ printf '\x53\xc3\xef' | grep -aoP '\x53\xef'
(no output, returns 1)

I don't see a bug here. PCRE patterns like \xef match code points, not bytes, so the PCRE notation differs from the shell printf notation. If your locale uses UTF-8, the PCRE pattern \xef matches the Unicode character U+00EF LATIN SMALL LETTER I WITH DIAERESIS, which is represented by the byte pair C3 AF.

If you want \xef to match a single byte, run grep in a single-byte locale, e.g., set LC_ALL=C in the environment.
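
As a rough sketch of the single-byte case, the commands below rerun the first and second inputs from above under LC_ALL=C. This assumes a grep built with PCRE support (-P). The printf arguments use octal escapes only because \x escapes are not portable across shells, and od is added just to make the matched bytes visible.

```shell
# Assumption: grep supports -P (PCRE); od is used only to show the raw bytes.

# Input bytes 53 EF. In the C locale, \xef in the pattern is a single byte,
# so the pattern matches and od prints the matched bytes plus the newline
# that grep -o appends.
printf '\123\357' | LC_ALL=C grep -aoP '\x53\xef' | od -An -tx1

# Input bytes 53 C3 AF (the UTF-8 encoding of "Sï"). In the C locale the
# pattern needs a literal EF byte, which is absent, so grep exits with 1.
printf '\123\303\257' | LC_ALL=C grep -aoP '\x53\xef' || echo "no match"
```

Setting LC_ALL on the grep command alone, as here, avoids changing the locale for the rest of the shell session.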

grep (version 2.12-2) provided by Debian works just fine.

Actually, that version is buggy in this area; it can sometimes dump core.


