zendas@Backup-Server:/tmp$ echo "你好啊" | cut -c 1-3 | od -b
0000000 344 275 240 012
0000004
zendas@Backup-Server:/tmp$ echo "你好啊" | cut -c 1 | od -b
0000000 344 012
0000002
zendas@Backup-Server:/tmp$ echo "你好啊" | cut -b 1 | od -b
0000000 344 012
0000002
zendas@Backup-Server:/tmp$ echo "你好啊" | cut -nb 1 | od -b
0000000 344 012
0000002
zendas@Backup-Server:/tmp$ echo "你好啊" | cut -c 1-3
你
zendas@Backup-Server:/tmp$ echo "你好啊" | cut -c 1
�
zendas@Backup-Server:/tmp$

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Monday, January 10, 2022 at 3:51 AM, zendas <zen...@protonmail.com> wrote:

> Reference source:
>
> https://blog.csdn.net/m0_38110132/article/details/79883827
>
> My environment is:
>
> zendas@Backup-Server:~$ cat /etc/debian_version
> 11.1
> zendas@Backup-Server:~$ cut --version
> cut (GNU coreutils) 8.32
> Copyright (C) 2020 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>.
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.
>
> Written by David M. Ihnat, David MacKenzie, and Jim Meyering.
>
> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
>
> On Monday, January 10, 2022 at 3:40 AM, Bob Proulx <b...@proulx.com> wrote:
>
> > zendas wrote:
> >
> > > Hello, I need to extract Chinese characters from a string. I
> > > searched through many documents, and it seems that the -c option
> > > of cut should meet my needs, but even when I run the exact
> > > commands from the web page, the result differs from the
> > > demonstration. I have searched dozens of pages, and my results
> > > never match the demo. Maybe this is a bug?
> >
> > Unfortunately the example was attached as images instead of as
> > plain text. In the future, please copy and paste the example as
> > text rather than as an image. An image cannot be reproduced by
> > copying and pasting, and the strings in it cannot be searched.
> >
> > The images were also somehow lost at some step of the mailing list
> > pipeline for this message. First, it was classified as spam by the
> > anti-spam robot (SpamAssassin-Bogofilter-CRM114). I caught it in
> > review and re-sent the message. That may have caused the problem
> > with the images specifically.
> > > For example:
> > >
> > > https://blog.csdn.net/xuzhangze/article/details/80930714
> > >
> > > [20180705173450701.png]
> > >
> > > The result of my attempt:
> > >
> > > [螢幕快照 2022-01-10 02:49:46.png]
> >
> > One of the two images:
> >
> > https://debbugs.gnu.org/cgi/bugreport.cgi?msg=5;bug=53145;att=3;filename=20180705173450701.png
> >
> > A second problem is that the first image shows as corrupted. I can
> > view the original, however. To my eye the two are similar enough
> > that the one above is sufficient, and I do not need to re-send the
> > corrupted image.
> >
> > The problem you have reported is due to the lack of
> > internationalization support for characters: -c is currently the
> > same as -b.
> >
> > https://www.gnu.org/software/coreutils/manual/html_node/cut-invocation.html#cut-invocation
> >
> > ‘-c CHARACTER-LIST’
> > ‘--characters=CHARACTER-LIST’
> >     Select for printing only the characters in positions listed in
> >     CHARACTER-LIST. The same as ‘-b’ for now, but internationalization
> >     will change that. Tabs and backspaces are treated like any other
> >     character; they take up 1 character. If an output delimiter is
> >     specified, (see the description of ‘--output-delimiter’), then
> >     output that string between ranges of selected bytes.
> >
> > For multibyte UTF-8 characters, the -c option in the current version
> > operates the same as the -b option and is not suitable for handling
> > multibyte characters.
> >
> >     $ echo '螢幕快照'
> >     螢幕快照
> >     $ echo '螢幕快照' | cut -c 1
> >     ?
> >     $ echo '螢幕快照' | cut -c 1-3
> >     螢
> >     $ echo '螢幕快照' | cut -b 1-3
> >     螢
> >
> > If the characters are known to be 3-byte multibyte characters, then
> > I might suggest using -b to work around the problem, assuming 3-byte
> > characters. Eventually, when -c is coded to handle multibyte
> > characters, the handling as bytes will change. Using -b would avoid
> > that change.
> >
> > Some operating systems have patched that specific version of the
> > utilities locally to add multibyte character handling, but the
> > patches have not been found acceptable for inclusion. That is why
> > there are differences between operating systems.
> >
> > Bob