On Sun, Apr 20, 2025 at 17:31:56 -0400, Chet Ramey wrote:
> On 4/20/25 3:08 AM, Stephane Chazelas wrote:
> 
> > $ printf '%b\0' winter 'spring\0315' summer automn |
> >     bash -c 'while IFS= read -rd "" season; do printf "<%q>\n" "$season"; done'
> > <winter>
> > <$'spring\315'>
> > <automn>
> > 
> > skipping summer, or maybe worse:
> > 
> > $ printf '%b\n' winter 'spring\0315' summer automn |
> >    bash -c 'while IFS= read -r season; do printf "<%q>\n" "$season"; done'
> > <winter>
> > <$'spring\315\nsummer'>
> > <automn>
> > 
> > bundling spring with summer (all with bash-5.2 on Debian for instance)
> 
> This has been fixed since last July, and the fix is in bash-5.3. The bug
> concerns unicode combining characters introducing invalid unicode character
> sequences that happen to contain the delimiter, and was reported privately.

That one may be fixed, but:

bash-5.3$ printf 'FOO\0\315\0\226\0' | while IFS= read -rd '' f; do printf '<%q>\n' "$f"; done
<FOO>
<$'\315'>
<''>
<''>

The context for all of this was someone on IRC who was reading a chunk
of data from /dev/urandom and got different results with LC_CTYPE=C vs.
LC_CTYPE=en_US.utf8 (or another UTF-8 locale).  This is a simplified
reproducer.
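
A rough way to see the locale dependence is to feed the same input to
bash under two locales and count the records read returns.  This is
only a sketch: count_records is a made-up helper name, and it assumes a
C.UTF-8 locale is installed (substitute en_US.UTF-8 or any other UTF-8
locale you have).  It sets LC_ALL rather than LC_CTYPE so that nothing
in the caller's environment can override the setting.

```shell
# Hypothetical helper, not part of bash: write the reproducer's three
# NUL-terminated records (FOO, byte 0315, byte 0226) into a fresh bash
# running under the locale named by $1, and print how many records
# `read -d ''` hands back.
count_records() {
    printf 'FOO\0\315\0\226\0' |
        LC_ALL=$1 bash -c '
            n=0
            while IFS= read -rd "" f; do
                n=$((n + 1))
            done
            echo "$n"'
}

count_records C        # three records were written, so this prints 3
count_records C.UTF-8  # assumes C.UTF-8 exists; a count other than 3 means records were mangled
```

If the locale named in the second call is missing, bash warns on stderr
and falls back to C, so a 3 there proves nothing by itself.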

In real-life scripts, this kind of thing could arise if someone reads
a NUL-delimited stream of pathnames from find -print0 or something
equivalent, since pathnames are arbitrary byte strings and need not be
valid UTF-8.
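
For the find -print0 case, the usual consumer is a
while IFS= read -rd '' loop.  The sketch below just counts the
pathnames that loop receives.  count_files is a hypothetical name, and
running the loop under LC_ALL=C is an assumed workaround based on the
C-locale behaviour described above, not a confirmed fix.

```shell
# Hypothetical helper: count the regular files under directory $1 by
# reading find's NUL-delimited output with a read -d '' loop.
# LC_ALL=C is an assumed workaround: in the C locale bash treats each
# byte of a pathname as one character, so the pathnames are not put
# through the multibyte handling that produced the mangled records above.
count_files() {
    find "$1" -type f -print0 |
        LC_ALL=C bash -c '
            n=0
            while IFS= read -rd "" path; do
                n=$((n + 1))   # real code would act on "$path" here
            done
            echo "$n"'
}
```

The trade-off is that everything inside the loop body also runs in the
C locale, which matters if it calls locale-sensitive tools.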

Since nobody seems to have reported it officially yet, I'm going to
add a Cc: bug-bash on this one.
