On Sun, Apr 20, 2025 at 17:31:56 -0400, Chet Ramey wrote:
> On 4/20/25 3:08 AM, Stephane Chazelas wrote:
>
> > $ printf '%b\0' winter 'spring\0315' summer automn |
> >   bash -c 'while IFS= read -rd "" season; do printf "<%q>\n" "$season";
> >   done'
> > <winter>
> > <$'spring\315'>
> > <automn>
> >
> > skipping summer, or maybe worse:
> >
> > $ printf '%b\n' winter 'spring\0315' summer automn |
> >   bash -c 'while IFS= read -r season; do printf "<%q>\n" "$season"; done'
> > <winter>
> > <$'spring\315\nsummer'>
> > <automn>
> >
> > bundling spring with summer (all with bash-5.2 on Debian for instance)
>
> This has been fixed since last July, and the fix is in bash-5.3. The bug
> concerns unicode combining characters introducing invalid unicode character
> sequences that happen to contain the delimiter, and was reported privately.
That one may be fixed, but:

bash-5.3$ printf 'FOO\0\315\0\226\0' | while IFS= read -rd '' f; do printf '<%q>\n' "$f"; done
<FOO>
<$'\315'>
<''>
<''>

The context for all of this was someone on IRC who was reading a chunk of
data from /dev/urandom and got different results with LC_CTYPE=C vs.
LC_CTYPE=en_US.utf8 (or another UTF-8 locale). This is a simplified
reproducer. In real-life scripts, this kind of thing could arise if
someone reads a NUL-delimited stream of pathnames from find -print0, or
equivalent.

Since nobody seems to have reported it officially yet, I'm going to add
a Cc: bug-bash on this one.
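For anyone who wants to compare the two locales side by side, here is a minimal sketch. It feeds the same byte stream as the reproducer (FOO, a lone \315 byte, a lone \226 byte, each NUL-terminated) to a `read -rd ''` loop and just counts the records. count_records is a hypothetical helper, and the sketch assumes a UTF-8 locale named en_US.UTF-8 is installed. In the C locale every byte is one character, so three records are expected; what an affected bash prints in the UTF-8 locale depends on the version.

```shell
#!/bin/bash
# Hypothetical helper (not part of bash): count the NUL-delimited records
# that `read -rd ''` returns when the reading shell runs under LC_ALL=$1.
count_records() {
  printf 'FOO\0\315\0\226\0' |
    LC_ALL=$1 bash -c '
      n=0
      # Each successful read consumes one NUL-terminated record.
      while IFS= read -rd "" f; do n=$((n + 1)); done
      echo "$n"
    '
}

count_records C            # single-byte locale: expect 3 records
count_records en_US.UTF-8  # assumed UTF-8 locale; result varies by bash version
```

If en_US.UTF-8 is not installed, bash warns on stderr and falls back to the C locale, so the second line is only meaningful where that locale exists.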