Create a new test2.txt with the following content:

星期一
星期二
星期三
星期四
星期五
星期六
星期日

=============================
zendas@Backup-Server:/tmp$ cat test2.txt
星期一
星期二
星期三
星期四
星期五
星期六
星期日
zendas@Backup-Server:/tmp$
=============================
zendas@Backup-Server:/tmp$ cut -c 1 test2.txt
�
�
�
�
�
�
�
zendas@Backup-Server:/tmp$ cut -c 2 test2.txt
�
�
�
�
�
�
�
zendas@Backup-Server:/tmp$ cut -c 1-3 test2.txt
星
星
星
星
星
星
星
zendas@Backup-Server:/tmp$
=============================
Reference source: https://blog.csdn.net/m0_38110132/article/details/79883827
My environment is:

zendas@Backup-Server:~$ cat /etc/debian_version
11.1
zendas@Backup-Server:~$ cut --version
cut (GNU coreutils) 8.32
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by David M. Ihnat, David MacKenzie, and Jim Meyering.

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Monday, January 10, 2022 at 3:40 AM, Bob Proulx <b...@proulx.com> wrote:

> zendas wrote:
>
> > Hello, I need to get Chinese characters from the string. I googled a
> > lot of documents, and it seems that the -c parameter of cut should be
> > able to meet my needs, but even when I directly execute the instructions
> > on the web page, the result is different from the
> > demonstration. I have searched dozens of pages but the results are
> > not the same as the demo. Maybe this is a bug?
>
> Unfortunately the example was attached as images instead of as plain
> text. Please in the future copy and paste the example as text rather
> than as an image. As an image it is impossible to reproduce by trying
> to copy and paste the image. As an image it is impossible to search
> for the strings.
>
> The images were also lost somehow from the various steps in the
> mailing list pipelines with this message. First it was classified as
> spam by the anti-spam robot (SpamAssassin-Bogofilter-CRM114). I
> caught it in review and re-sent the message. That may have been the
> problem specifically with images.
>
> > For example:
> > https://blog.csdn.net/xuzhangze/article/details/80930714
> > [20180705173450701.png]
> > the result of my attempt:
> > [螢幕快照 2022-01-10 02:49:46.png]
>
> One of the two images:
>
> https://debbugs.gnu.org/cgi/bugreport.cgi?msg=5;bug=53145;att=3;filename=20180705173450701.png
>
> The second problem is that the first image shows as being corrupted. I
> can view the original however. To my eye they are similar enough that
> the one above is sufficient and I do not need to re-send the corrupted
> image.
>
> As to the problem you have reported, it is due to lack of
> internationalization support for characters. -c is the same as -b at
> this moment.
>
> https://www.gnu.org/software/coreutils/manual/html_node/cut-invocation.html#cut-invocation
>
>     ‘-c CHARACTER-LIST’
>     ‘--characters=CHARACTER-LIST’
>          Select for printing only the characters in positions listed in
>          CHARACTER-LIST. The same as ‘-b’ for now, but internationalization
>          will change that. Tabs and backspaces are treated like any other
>          character; they take up 1 character. If an output delimiter is
>          specified, (see the description of ‘--output-delimiter’), then
>          output that string between ranges of selected bytes.
>
> For multi-byte UTF-8 characters the -c option will operate the same as
> the -b option as of the current version and is not suitable for
> dealing with multi-byte characters.
>
>     $ echo '螢幕快照'
>     螢幕快照
>     $ echo '螢幕快照' | cut -c 1
>     ?
>     $ echo '螢幕快照' | cut -c 1-3
>     螢
>     $ echo '螢幕快照' | cut -b 1-3
>     螢
>
> If the characters are known to be 3-byte multi-byte characters then I
> might suggest using -b to work around the problem, assuming 3-byte
> characters. Eventually when -c is coded to handle multi-byte
> characters the handling as bytes will change. Using -b would avoid
> that change.
>
> Some operating systems have patched that specific version of utilities
> locally to add multi-byte character handling. But the patches have
> not been found acceptable for inclusion. That is why there are
> differences between different operating systems.
>
> Bob
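
The -b workaround described in the reply can be wrapped in a small shell function that converts character positions into byte ranges. This is only a minimal sketch, not part of coreutils: the name cut_3byte_chars is hypothetical, and the function assumes that every character on each input line is exactly 3 bytes long in UTF-8. That holds for the CJK text in test2.txt but not for lines that mix in ASCII or 4-byte characters.

```shell
#!/bin/sh
# Hypothetical helper (not part of coreutils).
# Usage: cut_3byte_chars FIRST LAST
# Prints characters FIRST..LAST of each line read from stdin,
# ASSUMING every character is exactly 3 bytes in UTF-8.
cut_3byte_chars() {
    # Character N occupies bytes 3*(N-1)+1 through 3*N,
    # so the range FIRST..LAST maps to the byte range below.
    cut -b "$(( ($1 - 1) * 3 + 1 ))-$(( $2 * 3 ))"
}

# First character of each line.
printf '星期一\n星期二\n' | cut_3byte_chars 1 1

# Characters 2 through 3 of each line.
printf '星期一\n星期二\n' | cut_3byte_chars 2 3
```

Because the function only ever calls cut -b, its output should not change if a later cut release makes -c character-aware. If the 3-byte assumption does not hold for a line, the byte range will split a character and produce invalid output, just like cut -c 1 in the original report.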