Re: Some byte combinations affect UTF-8 string reading

2019-02-26 Thread Chet Ramey
On 2/25/19 5:42 PM, Olga Ustuzhanina wrote:
> On Mon, 25 Feb 2019 12:59:38 -0800
> L A Walsh wrote:
>
>> In this case, the decode of \xc2 doesn't swallow the following
>> character.
>
> I want to clarify that \xc2 (and other characters in the range
> mentioned above) can only swallow a \0. Other
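Chet's reply is cut off here, so this is not his suggestion. One commonly used workaround is to run the NUL-delimited read loop under the C locale, where bash does no multibyte decoding and therefore never pairs \xc2 with the byte after it. A minimal sketch; the function name `read_records_c_locale` is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the thread): read NUL-delimited records from
# stdin under the C locale and print each record's length in bytes.
read_records_c_locale() {
  (
    # Assigning LC_ALL makes bash switch locales immediately; the subshell
    # keeps both the locale change and $rec from leaking into the caller.
    export LC_ALL=C
    while IFS= read -r -d '' rec; do
      # In a single-byte locale ${#rec} counts bytes, and read never tries
      # to assemble \xc2 plus the following byte into one character.
      printf '%d\n' "${#rec}"
    done
  )
}

printf 'a\0\xc2\0bc\0' | read_records_c_locale
```

Each of the three records is reported separately (lengths 1, 1 and 2); the trade-off is that any character-aware processing inside the loop also sees raw bytes.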

Re: Some byte combinations affect UTF-8 string reading

2019-02-25 Thread Grisha Levit
On Mon, Feb 25, 2019 at 4:01 PM L A Walsh wrote:
> ntc() { while IFS='' read -r input; do printf "$input;" ; done ; }
> printf $'\xc2\00\00\00\00'|ntc|hexdump -C
>
> both result in no output.

a) If you actually want the null bytes to be piped, the \0 has to be interpreted by printf. Using $'' q
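Point (a) can be checked by counting bytes. With $'...' quoting the shell expands \00 to a NUL before printf runs, and since arguments are C strings the format is cut off at the first NUL, so printf only sees the single byte \xc2. Passing the escapes through single quotes lets printf produce every byte itself. A minimal sketch:

```shell
#!/usr/bin/env bash
# $'...' is expanded by the shell; the embedded NULs truncate the string,
# so printf receives a one-byte format consisting of \xc2 alone.
shell_quoted=$(printf $'\xc2\00\00\00\00' | wc -c)

# Single quotes keep the backslash escapes literal; printf converts \xc2
# and each \0 into bytes, NULs included.
printf_escaped=$(printf '\xc2\0\0\0\0' | wc -c)

echo "ANSI-C quoted: $shell_quoted byte(s); printf escapes: $printf_escaped byte(s)"
```

Only the second form actually sends NUL bytes down the pipe to the reader.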

Re: Some byte combinations affect UTF-8 string reading

2019-02-25 Thread Olga Ustuzhanina
On Mon, 25 Feb 2019 12:59:38 -0800
L A Walsh wrote:
> In this case, the decode of \xc2 doesn't swallow the following
> character.

I want to clarify that \xc2 (and other characters in the range
mentioned above) can only swallow a \0. Other characters are unaffected.

>
> But in 4.4.12, using IFS
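Since only a NUL directly after one of these lead bytes is affected, input can be screened for that exact pattern before it reaches `read`. A minimal sketch; the function name is hypothetical, and because the range "mentioned above" is cut off in this excerpt, the code assumes the UTF-8 multibyte lead bytes 0xC2 through 0xF4:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the 0-based offset of every byte in the
# ASSUMED lead-byte range 0xC2..0xF4 that is immediately followed by NUL.
find_lead_before_nul() {
  local -a bytes
  local i
  # od -A n -t u1 -v prints each input byte as an unsigned decimal, with no
  # offsets and no '*' run compression. Word splitting is intentional here:
  # the output contains only digits and whitespace.
  bytes=( $(od -A n -t u1 -v) )
  for (( i = 0; i + 1 < ${#bytes[@]}; i++ )); do
    if (( bytes[i] >= 0xC2 && bytes[i] <= 0xF4 && bytes[i + 1] == 0 )); then
      printf '%d\n' "$i"
    fi
  done
}

printf 'a\xc2\0b\0' | find_lead_before_nul   # prints 1
```

A well-formed sequence such as `\xc3\xa9` (é) followed by NUL is not reported, because the byte before the NUL is a continuation byte.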

Re: Some byte combinations affect UTF-8 string reading

2019-02-25 Thread L A Walsh
On 2/25/2019 11:32 AM, Chet Ramey wrote:
> On 2/25/19 11:17 AM, Olga Ustuzhanina wrote:
> >
>
> This is an invalid multibyte character. The \xc2 is the valid first byte
> of a multibyte character, but the next byte read makes the sequence
> invalid. The read builtin resynchronizes on the followi
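The explanation follows from the structure of UTF-8: a lead byte announces how many bytes the character occupies, and every byte after it must be a continuation byte of the form 10xxxxxx (0x80 to 0xBF). 0xC2 announces a two-byte sequence and NUL is not a continuation byte, so `\xc2\0` is invalid. A minimal sketch of that classification in bash arithmetic; both function names are hypothetical:

```shell
#!/usr/bin/env bash
# Print the sequence length announced by a UTF-8 lead byte, or 0 if the byte
# cannot start a character (continuation bytes, 0xC0/0xC1, 0xF5-0xFF).
# Simplified: ignores the extra second-byte limits after 0xE0, 0xED, 0xF0, 0xF4.
utf8_seq_len() {
  local b=$(( $1 ))
  if   (( b <= 0x7F ));              then echo 1
  elif (( b >= 0xC2 && b <= 0xDF )); then echo 2
  elif (( b >= 0xE0 && b <= 0xEF )); then echo 3
  elif (( b >= 0xF0 && b <= 0xF4 )); then echo 4
  else echo 0
  fi
}

# Succeed (exit status 0) only for continuation bytes 10xxxxxx.
is_utf8_continuation() {
  (( ($1 & 0xC0) == 0x80 ))
}

utf8_seq_len 0xc2                                  # \xc2 starts a two-byte character
is_utf8_continuation 0x00 || echo "NUL is not a continuation byte"
```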

Re: Some byte combinations affect UTF-8 string reading

2019-02-25 Thread Chet Ramey
On 2/25/19 11:17 AM, Olga Ustuzhanina wrote:
> Bash Version: 5.0
> Patch Level: 2
> Release Status: release
>
> Description:
> When using `IFS= read -r -d '' input` to read null-delimited
> strings on a system with bash 5.0+ and UTF-8 locale, you can
> encounter situation when
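For reference, the idiom in the report reads one NUL-terminated record per iteration: `IFS=` keeps leading and trailing whitespace, `-r` keeps backslashes, and `-d ''` makes NUL the delimiter. A minimal sketch that collects records into an array, with `printf '%s\0'` standing in for a producer such as `find -print0`; the function name is hypothetical. Per this thread, a record ending in a lead byte such as \xc2 may have its terminating NUL swallowed on the affected bash 5.0 builds in a UTF-8 locale; the ASCII input below is not affected:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read NUL-delimited records from stdin into the
# global array RECORDS, preserving whitespace and backslashes.
collect_records() {
  RECORDS=()
  local rec
  while IFS= read -r -d '' rec; do
    RECORDS+=("$rec")
  done
}

# printf reuses its format for each argument, emitting "arg\0" every time.
# Process substitution (not a pipe) keeps collect_records in the current
# shell, so RECORDS is still set afterwards.
collect_records < <(printf '%s\0' 'plain' '  spaced  ' 'back\slash')
printf '<%s>\n' "${RECORDS[@]}"
```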

Some byte combinations affect UTF-8 string reading

2019-02-25 Thread Olga Ustuzhanina
Configuration Information [Automatically generated, do not change]:
Machine: x86_64
OS: linux-gnu
Compiler: cc
Compilation CFLAGS: -fstack-clash-protection -D_FORTIFY_SOURCE=2 -mtune=generic -O2 -pipe -DSYS_BASHRC='/etc/bash/bashrc' -g -Wno-parentheses -Wno-format-security
uname output: Linux lase