bug#53145: Re: bug#53145: Acknowledgement ("cut" can't segment Chinese characters correctly?)

Bernhard Voelker Wed, 12 Jan 2022 05:54:07 -0800

On 1/12/22 12:19, zendas via GNU coreutils Bug Reports wrote:
> I have considered handling this problem directly with three bytes
> instead, but I have two doubts. First, in the same environment wc -m
> counts the characters correctly (so why can't cut?), and since my script's
> goal is to recognize Chinese, it will most likely run on platforms that
> support a Chinese locale. Second, the fixed three-byte approach cannot
> handle mixed full-width and half-width content. It would need a lot of
> checks and conversions, which would greatly increase the chance of
> errors.


As Bob wrote, some downstream distributions, e.g. RHEL/Fedora and
SUSE/openSUSE, have had multi-byte support in cut(1) for many years.

For example, on my openSUSE system:

  $ echo "你好啊" | LC_ALL=zh_CN.UTF-8 cut -c 1
  你
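
A minimal shell sketch of the fixed three-byte workaround discussed in the quoted message, assuming the input and the script itself are UTF-8 (where each of these CJK characters occupies 3 bytes). It uses `cut -b`, which selects bytes regardless of locale, so it behaves the same whether or not `cut -c` is multi-byte aware. The variable name `s` is illustrative only.

```shell
#!/bin/sh
# Sketch: cutting UTF-8 text by *bytes* with cut -b (locale-independent).
# Assumption: UTF-8 input, where each of these CJK characters is 3 bytes.
s='你好啊'

printf '%s\n' "$s" | cut -b 1-3   # prints: 你
printf '%s\n' "$s" | cut -b 4-6   # prints: 好

# A fixed 3-byte stride breaks on mixed half-width/full-width input:
# 'a' is 1 byte, so bytes 1-3 are 'a' plus only the first 2 bytes of 你,
# which is an incomplete (invalid) UTF-8 sequence.
printf '%s\n' 'a你' | cut -b 1-3
```

This is why a byte-based workaround needs extra logic for mixed content, whereas a multi-byte-aware `cut -c`, as in the openSUSE example, counts characters directly.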

Have a nice day,
Berny
