not particularly? define "character"... i think Apple/Swift is the only major proponent of "character == extended grapheme cluster" (https://www.unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries)?
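
to make the ambiguity concrete, here's a rough python sketch (mine, not anything from coreutils) that gives three different "lengths" for the same text. the `counts` helper is hypothetical, and the last count is only an approximation: it skips anything with a non-zero canonical combining class, which is _not_ the full TR29 segmentation algorithm (python's standard library doesn't ship one, as far as i know).

```python
import unicodedata


def counts(s: str) -> dict:
    """Return three different "lengths" for the same string."""
    return {
        # bytes of the UTF-8 encoding
        "bytes": len(s.encode("utf-8")),
        # code points (what plan 9 calls runes)
        "code_points": len(s),
        # crude stand-in for grapheme clusters: don't count combining marks.
        # ASSUMPTION: this is a simplification, not the TR29 algorithm.
        "approx_clusters": sum(1 for ch in s if unicodedata.combining(ch) == 0),
    }


decomposed = "e\u0301"  # 'e' followed by COMBINING ACUTE ACCENT
composed = unicodedata.normalize("NFC", decomposed)  # single code point U+00E9

print(counts(decomposed))  # {'bytes': 3, 'code_points': 2, 'approx_clusters': 1}
print(counts(composed))    # {'bytes': 2, 'code_points': 1, 'approx_clusters': 1}
```

same visible text, but the byte and code point counts both change depending on which normalization form the input happens to be in; only the cluster-ish count stays put.
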

plan 9's wc used `-r` (for "rune") to make it clearer that it was counting code points. coreutils (and every other wc, including Apple's) has the same behavior, but calls it `-m` (for "multibyte").

plan 9 also had `-b` (for "broken", that is: invalid byte sequences), which to this day i think was a troll: it's _not_ "b for bytes" (that's `-c`), and `-c` isn't "c for characters" either, it's "c for `char`s".

while there _might_ be an argument for adding `-e` (for "extended grapheme cluster"), i think you'd want to leave `-m` alone for compatibility, and your `-e` would probably have people asking how exactly coreutils lets you deal with normalization (https://unicode.org/reports/tr15/) and conversions between the different forms? :-)

On Mon, Mar 11, 2024 at 12:24 PM Nick <gnu-...@acrasis.net> wrote:
>
> On 2024-03-11 14:33 PYST, Pádraig Brady wrote:
> > On 10/03/2024 15:16, Nick wrote:
> > > Markus Kuhn's FAQ says "A combining character is not a full
> > > character by itself" but wc is saying that it is, no?
> >
> > It's a fair point. Libre Office for example will count as one
> > character.
>
> Thank you. Is wc's behaviour here not considered a bug?
> --
> Nick
> Asunción 16:18 PYST ► 40°C ◆ some clouds ◆ 7Km/h NE ◆ 37% RH
>