On 6/24/17 1:41 PM, Eduardo A. Bustamante López wrote:
> I was looking through this old thread:
> http://seclists.org/oss-sec/2014/q3/851
>
> It looks like the issue reported in there is still there:
>
> dualbus@debian:~$ LANG=zh_CN.GBK printf 'echo \u4e57\n' |LANG=zh_CN.GBK bash
> �\
> dualbus@debian:~$ LANG=en_US.UTF8 printf 'echo \u4e57\n' |LANG=en_US.UTF8 bash
> 乗

This shows that if it's a valid character in the current locale, bash will
convert it and read it back.  `printf' takes the Unicode character (in this
case, one that is three bytes long in UTF-8) and runs it through iconv to try
to convert it to a valid multibyte character in the current locale.

> dualbus@debian:~$ LANG=zh_CN.GBK printf '\u4e57' | od -tx1 -An
> 81 5c
>
> It looks like it doesn't detect that \x81\x5c is a single character, and
> instead treats the multibyte character as separate characters.

It's apparently not a single character in that locale.

-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://cnswww.cns.cwru.edu/~chet/
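The conversion step described in the reply can be reproduced by hand. This is a rough sketch, not bash's own code: it assumes an iconv(1) that supports GBK (glibc's and GNU libiconv's do), and it writes the UTF-8 bytes as octal escapes so that any POSIX printf can emit them regardless of locale. It shows that U+4E57 takes three bytes in UTF-8 and two in GBK, matching the `81 5c' from od in the quoted message, and that the second GBK byte, 0x5c, is also the ASCII code for backslash.

```sh
#!/bin/sh
# Sketch only: mimics printf's \u4e57 conversion using iconv(1).
# Assumes an iconv that supports the GBK charset (glibc and GNU libiconv do).

# U+4E57 in UTF-8, written as octal escapes (e4 b9 97) so any POSIX printf
# can emit it without relying on the current locale.
utf8=$(printf '\344\271\227' | od -An -tx1)

# The same character converted from UTF-8 to GBK.
gbk=$(printf '\344\271\227' | iconv -f UTF-8 -t GBK | od -An -tx1)

echo "UTF-8 bytes:$utf8"   # three bytes
echo "GBK bytes:  $gbk"    # two bytes; the second, 0x5c, is ASCII '\'
```

Running the same iconv pipeline on any other code point shows whether it has a representation in GBK at all; iconv exits with an error when it does not.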