I was looking through this old thread: http://seclists.org/oss-sec/2014/q3/851
It looks like the issue reported in there is still there:

dualbus@debian:~$ LANG=zh_CN.GBK printf 'echo \u4e57\n' |LANG=zh_CN.GBK bash
�\
dualbus@debian:~$ LANG=en_US.UTF8 printf 'echo \u4e57\n' |LANG=en_US.UTF8 bash
乗
dualbus@debian:~$ LANG=zh_CN.GBK printf 'echo \u4e57\n' |LANG=zh_CN.GBK mksh
�
dualbus@debian:~$ LANG=zh_CN.GBK printf 'echo \u4e57\n' |LANG=zh_CN.GBK ksh
�\
dualbus@debian:~$ LANG=zh_CN.GBK printf 'echo \u4e57\n' |LANG=zh_CN.GBK zsh
�

(In the case that your font doesn't render the glyph for U+4E57, it's:
http://www.unicode.org/cgi-bin/GetUnihanData.pl?codepoint=4e57)

dualbus@debian:~$ LANG=zh_CN.GBK printf '\u4e57' | od -tx1 -An
 81 5c

It looks like it doesn't detect that \x81\x5c is a single character, and
instead treats the multibyte character as separate characters.

--
Eduardo Bustamante
https://dualbus.me/
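For anyone without a zh_CN.GBK locale installed, here is a small sketch of why splitting \x81\x5c into separate bytes matters. It only compares bytes and does not exercise bash's parser, so it illustrates the suspected cause rather than proving it. The variable names (gbk_hex, trail_hex, backslash_hex) are made up for the example, and the two bytes are hard-coded from the od output above rather than produced by a locale conversion.

```shell
#!/bin/sh
# Work byte-wise so the result does not depend on a GBK locale being installed.
LC_ALL=C; export LC_ALL

# The two bytes od printed for U+4E57 in GBK (0x81 0x5c), as octal escapes.
gbk_hex=$(printf '\201\134' | od -An -tx1 | tr -d ' \n')

# A plain ASCII backslash, for comparison.
backslash_hex=$(printf '\134' | od -An -tx1 | tr -d ' \n')

# Trail byte of the GBK sequence: drop the first two hex digits (the lead byte).
trail_hex=${gbk_hex#??}

echo "GBK bytes:  $gbk_hex"
echo "trail byte: $trail_hex"
echo "backslash:  $backslash_hex"

if [ "$trail_hex" = "$backslash_hex" ]; then
    echo "trail byte is identical to an ASCII backslash when read byte by byte"
fi
```

If the shell walks the input one byte at a time, the second half of the character is indistinguishable from a backslash, which would be consistent with the stray \ in the bash and ksh output.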