On 6/24/17 1:41 PM, Eduardo A. Bustamante López wrote:
> I was looking through this old thread:
> http://seclists.org/oss-sec/2014/q3/851
> 
> It looks like the issue reported in there is still there:
> 
>   dualbus@debian:~$ LANG=zh_CN.GBK printf 'echo \u4e57\n' |LANG=zh_CN.GBK bash
>   �\
>   dualbus@debian:~$ LANG=en_US.UTF8 printf 'echo \u4e57\n' |LANG=en_US.UTF8 bash
>   乗

This shows that if it's a valid character in the current locale, bash will
convert it and read it back.  `printf' takes the Unicode code point (in this
case, one that encodes as three bytes in UTF-8) and runs it through iconv to
try to convert it to a valid multibyte character in the current locale.
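
To make the conversion step concrete, here is a minimal sketch of what
happens to the code point in each target charset. It uses Python's built-in
codecs rather than iconv, so it only approximates what bash does, and
`encode_codepoint' is a hypothetical helper name, not anything in bash.

```python
# Hypothetical helper (not part of bash): return the bytes that one
# Unicode code point becomes in the given charset.
def encode_codepoint(codepoint: int, charset: str) -> bytes:
    return chr(codepoint).encode(charset)


# U+4E57 is the character from the report.
utf8_bytes = encode_codepoint(0x4E57, "utf-8")
gbk_bytes = encode_codepoint(0x4E57, "gbk")

print(utf8_bytes.hex(" "))  # e4 b9 97
print(gbk_bytes.hex(" "))   # 81 5c
```

The GBK result matches the od output in the report: two bytes, the second
of which is 0x5c.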

>   dualbus@debian:~$ LANG=zh_CN.GBK printf '\u4e57' | od -tx1 -An
>    81 5c
> 
> It looks like it doesn't detect that \x81\x5c is a single character, and
> instead treats the multibyte character as separate characters.

It's apparently not a single character in that locale.
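
As a rough illustration of why the two readings differ, the sketch below
looks at the same two bytes byte by byte and then through a charset table.
It assumes Python's own "gbk" codec, which is independent of the C library's
zh_CN.GBK locale data, so it says nothing about what that locale accepts.

```python
data = b"\x81\x5c"  # the two bytes from the od output quoted above

# Byte-at-a-time view: the second byte is the same value as an
# ASCII backslash, which is how it shows up in the first transcript.
per_byte = [bytes([b]) for b in data]
trail_is_backslash = per_byte[1] == b"\\"

# Table-driven view, using Python's "gbk" codec (an assumption: this
# table need not agree with the system locale definition).
decoded = data.decode("gbk")

print(trail_is_backslash)              # True
print(len(decoded), hex(ord(decoded))) # 1 0x4e57
```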

-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
                 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://cnswww.cns.cwru.edu/~chet/
