Public bug reported:

Binary package hint: coreutils

GNU cut does not respect character boundaries in UTF-8 encoded files:
with --characters, it can split a multibyte character in the middle.

An example, as they (almost) say, is worth a thousand words:

[EMAIL PROTECTED]: ~ $ locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
[EMAIL PROTECTED]: ~ $ cat foo.txt
She said “I think I found a bug.”
[EMAIL PROTECTED]: ~ $ cat foo.txt | cut --characters 10-
“I think I found a bug.”
[EMAIL PROTECTED]: ~ $ cat foo.txt | cut --characters 11-
��I think I found a bug.”
[EMAIL PROTECTED]: ~ $ cat foo.txt | cut --characters 12-
�I think I found a bug.”
[EMAIL PROTECTED]: ~ $ cat foo.txt | cut --characters 13-
I think I found a bug.”
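
The output above is consistent with cut counting bytes rather than characters. Here is a minimal sketch of a check, using only byte-oriented tools so that the result does not depend on the locale. The helper name hex_bytes_from is hypothetical, and the line is assumed to be stored as UTF-8, where the opening quote (U+201C) takes the three bytes e2 80 9c:

```shell
#!/bin/sh
# Sample line from the report, assumed to be UTF-8 encoded.
line='She said “I think I found a bug.”'

# hex_bytes_from START COUNT (hypothetical helper): print COUNT bytes of
# $line, starting at 1-based byte offset START, as space-separated hex.
hex_bytes_from() {
    bytes=$(printf '%s' "$line" | tail -c "+$1" | head -c "$2" | od -An -tx1)
    # od pads its output with spaces; the unquoted expansion lets the
    # shell's word splitting collapse them to single spaces.
    echo $bytes
}

hex_bytes_from 10 3   # the whole opening quote: e2 80 9c
hex_bytes_from 11 2   # offset 11 starts inside the quote: 80 9c
hex_bytes_from 12 1   # offset 12 is its last byte: 9c
hex_bytes_from 13 1   # offset 13 is the "I" after the quote: 49
```

"She said " is nine bytes, so byte offsets 10 to 12 cover the opening quote. That matches the transcript: starting at 11 or 12 leaves two or one stray continuation bytes, and starting at 13 skips the quote entirely.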

** Affects: coreutils (Ubuntu)
     Importance: Undecided
         Status: Unconfirmed

-- 
cut gets confused with UTF-8 characters
https://launchpad.net/bugs/91175

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
