Bruno Haible wrote:
> As a consequence:
> - The number of characters is the same as the number of wide characters.
> - "wc -m" must output the number of characters.
> - In a Unicode locale, <U00E9> is one character, and <U0065><U0301> is
>   two characters,

Fair enough.

> If you want wc to count characters after canonicalization, then you can
> invent a new wc command-line option for it.

I guess one could possibly have --chars={unicode,glyph,grapheme,column},
with unicode being the default, matching how it currently works.

> But I would find it more useful
> to have a filter program that reads from standard input and writes the
> canonicalized output to standard output; that would be applicable in many
> more situations.

That would be _very_ useful, yes.

Thanks for all the great info in this thread,
Pádraig.
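
For illustration only, here is a minimal Python sketch of the kind of
canonicalizing filter discussed above. It is not an existing coreutils
tool. The function names are hypothetical, and NFC is an assumed choice
of canonical form (the thread does not name one). It also shows why the
two spellings of "é" count differently before canonicalization.

```python
import unicodedata


def count_code_points(text):
    """Count Unicode code points, i.e. what is called 'characters' above."""
    return len(text)


def canonicalize(text, form="NFC"):
    """Return text in the given Unicode normalization form (assumed: NFC)."""
    return unicodedata.normalize(form, text)


def filter_stream(src, dst, form="NFC"):
    """Read all of src, write the canonicalized text to dst.

    The whole input is read at once on purpose: normalizing fixed-size
    chunks independently could split a base character from a combining
    mark that follows it.
    """
    dst.write(canonicalize(src.read(), form))
```

To use it as a stdin-to-stdout filter, one could call
filter_stream(sys.stdin, sys.stdout) after "import sys", assuming the
locale's encoding is a Unicode one such as UTF-8. The canonicalized
output could then be piped into "wc -m".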