2019-09-30 15:35:21 -0400, Chet Ramey:
[...]
> The $'\361' is a unicode combining
> character, which ends up making the entire sequence of characters an
> invalid wide character string in a bunch of different locales.
[...]

No, $'\u0361', the Unicode character 0x361 (hex), is "COMBINING
DOUBLE INVERTED BREVE" (encoded as \315\241 in UTF-8).

But $'\361' is byte value 0361 (octal). In UTF-8, on its own, it's
an invalid byte sequence. That's 2#11110001, which would be the
first byte of a 4-byte-long character (one of the characters
U+40000 to U+7FFFF). In latin1, that's ñ (LATIN SMALL LETTER N
WITH TILDE).

So $'foo\361bar' is not text in UTF-8, but that's an encoding
issue, not a problem with combining characters.

$ locale charmap
UTF-8
$ printf '\u361' | od -An -to1
 315 241
$ printf '\U40000' | od -An -vto1
 361 200 200 200
$ printf 'foo\361bar' | iconv -f utf8
fooiconv: illegal input sequence at position 3

-- 
Stephane
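
For illustration only, the bit-pattern argument above (0361 is 2#11110001, hence the lead byte of a 4-byte sequence) can be sketched as a small POSIX sh function. The name utf8_lead_len is hypothetical, not part of bash or any tool mentioned here, and the function assumes its argument is a byte written in octal digits, the way od -to1 prints it.

```sh
#!/bin/sh
# utf8_lead_len OCTAL  (hypothetical helper, for illustration only)
# Classify a single byte, given as octal digits, by the role it can
# play in UTF-8: prints the length of the sequence it would start,
# "continuation", or "invalid".
utf8_lead_len() {
    b=$((0$1))    # the leading 0 makes the arithmetic constant octal
    if   [ "$b" -lt 128 ]; then echo 1              # 0xxxxxxx: ASCII
    elif [ "$b" -lt 192 ]; then echo continuation   # 10xxxxxx
    elif [ "$b" -lt 194 ]; then echo invalid        # 0xC0, 0xC1: overlong forms only
    elif [ "$b" -lt 224 ]; then echo 2              # 110xxxxx
    elif [ "$b" -lt 240 ]; then echo 3              # 1110xxxx
    elif [ "$b" -lt 245 ]; then echo 4              # 11110xxx, up to 0xF4
    else echo invalid                               # 0xF5 to 0xFF
    fi
}

utf8_lead_len 361   # prints 4
utf8_lead_len 315   # prints 2
utf8_lead_len 241   # prints continuation
```

So \361 followed by "bar" cannot be valid UTF-8: a lead byte announcing a 4-byte sequence has to be followed by three continuation bytes, and "b" (0142) is plain ASCII.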