Public bug reported:

Binary package hint: coreutils
GNU cut gets confused about character boundaries with UTF-8 encoded
files. An example, as they (almost) say, is worth a thousand words:

[EMAIL PROTECTED]: ~ $ locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
[EMAIL PROTECTED]: ~ $ cat foo.txt
She said “I think I found a bug.”
[EMAIL PROTECTED]: ~ $ cat foo.txt | cut --characters 10-
“I think I found a bug.”
[EMAIL PROTECTED]: ~ $ cat foo.txt | cut --characters 11-
��I think I found a bug.”
[EMAIL PROTECTED]: ~ $ cat foo.txt | cut --characters 12-
�I think I found a bug.”
[EMAIL PROTECTED]: ~ $ cat foo.txt | cut --characters 13-
I think I found a bug.”

** Affects: coreutils (Ubuntu)
     Importance: Undecided
         Status: Unconfirmed

-- 
cut gets confused with UTF-8 characters
https://launchpad.net/bugs/91175

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
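
The output in the transcript above looks like what one would get if cut counted bytes instead of characters: the opening quote “ (U+201C) is three bytes in UTF-8 (E2 80 9C), so starting at position 11 or 12 lands in the middle of it. The report does not confirm this as the cause. The sketch below is only an illustration in Python, not cut's actual code; the helper names `cut_from_byte` and `cut_from_char` are hypothetical.

```python
# Sample line from the report; the curly quotes are U+201C and U+201D.
line = "She said “I think I found a bug.”"
data = line.encode("utf-8")


def cut_from_byte(raw: bytes, start: int) -> str:
    """Keep everything from byte position `start` (1-indexed) onward.

    Hypothetical stand-in for a byte-oriented cut. Invalid UTF-8 left
    over from a split character is shown as U+FFFD, much as a terminal
    would render it.
    """
    return raw[start - 1:].decode("utf-8", errors="replace")


def cut_from_char(text: str, start: int) -> str:
    """Keep everything from character position `start` (1-indexed) onward."""
    return text[start - 1:]


if __name__ == "__main__":
    for n in (10, 11, 12, 13):
        print(n, repr(cut_from_byte(data, n)), repr(cut_from_char(line, n)))
```

Under this assumption, the byte-based helper reproduces the four outputs in the report for positions 10 through 13, while the character-based helper returns what `--characters` would be expected to give in a UTF-8 locale.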