On Sun, May 10, 2020 at 10:00 AM Stephane Chazelas <steph...@chazelas.org> wrote:
>
> 2020-05-01 19:05:28 +0200, radisso...@web.de:
> [...]
> > problem: grep for a character where only the hexcode in known.
> >
> > solution: use $'\xNN'
> > then shell expands this to the required code
> >
> > example: printf "A\nB\nC\n" | grep $'\x41'
> [...]
>
> The $'\x41' ksh93 quoting operator expands to *byte* values.
>
> To get a character based on the Unicode codepoint value, you'd
> need the $'\u41' zsh operator (or $'\U10000' for code points
> above 0xffff).
>
> But in any case, that is done by the shell, that has nothing to
> do with grep and the syntax of those shell operators varies
> between shells.
>
> In the fish shell you'd use:
>
> grep \u41
>
> or
>
> grep \x41
>
> instead.
>
> Also, since it's done by the shell, things like:
>
> grep $'\u2e'
>
> where U+002E is "FULL STOP", would not only match on "."
> characters but on any character. All grep sees is a "."
> character. That would be different from grep -P '\x2e' which
> matches "." (U+002E) only.
>
> Note that:
>
> grep -P '\xE9'
>
> matches on the byte 0xE9 in singlebyte locales (regardless of
> what character that byte represents in the locale's charset) and
> on character U+00E9 in UTF-8 locales (so the 0xc3 0xa9 sequence
> of bytes, not byte 0xe9).

Thank you for the thorough reply, Stephane!

Bearing that in mind, Radisson, please consider submitting a revised
patch. I suggest recommending something like this:

  $ printf '%s\n' A B C | LC_ALL=C grep -P '\x41'
  A

so that the example is independent of both the current locale and the
shell.
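
For what it's worth, the difference Stephane describes between a
shell-expanded "." and grep -P '\x2e' can be checked with a small sketch
like the one below. It assumes a GNU grep built with PCRE support (-P).
The variable names are only illustrative, and printf '\056' (octal for
0x2e) is used as a portable stand-in for what $'\x2e' expands to in
shells that support that operator.

```shell
# Assumption: GNU grep with PCRE support (-P) is installed.
# Variable names are illustrative only.

# What a shell that supports $'\x2e' hands to grep: a plain "." byte.
# printf '\056' (octal for 0x2e) produces the same byte portably.
shell_expanded=$(printf '\056')

# grep receives "." and treats it as "any character": all 3 lines match.
any_char_count=$(printf '%s\n' A . B | LC_ALL=C grep -c "$shell_expanded")

# With -P, grep itself interprets \x2e, so only the literal "." line matches.
literal_dot_count=$(printf '%s\n' A . B | LC_ALL=C grep -c -P '\x2e')

# The locale- and shell-independent example suggested above.
letter_a=$(printf '%s\n' A B C | LC_ALL=C grep -P '\x41')

printf 'shell-expanded ".": %s matching lines\n' "$any_char_count"
printf 'grep -P \\x2e:       %s matching line\n' "$literal_dot_count"
printf 'grep -P \\x41:       %s\n' "$letter_a"
```

The single quotes around '\x2e' and '\x41' matter here: they keep the
shell from touching the backslash, so the escape reaches grep unchanged.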