Jim Meyering wrote:
> Bruno Haible wrote:
> ...
>> With this, you can easily create a program that reads UTF-8 from stdin and
>> outputs it as canonicalized UTF-8 on stdout:
>>  - create a "stream" that takes a Unicode character and outputs it to
>>    stdout. (Gnulib module 'unistr/u8-uctomb'.)
>>  - Wrap a Unicode normalizing filter around it. (Gnulib module
>>    'uninorm/filter'.)
>>  - Feed it with Unicode characters from standard input. (Gnulib module
>>    'unistr/u8-mbtouc'.)
>>
>> I would love to see such a program in coreutils. But I am not a coreutils
>> maintainer.
>
> Hi Bruno,
>
> That sounds like it'd make a fine addition, and you're welcome to
> contribute it. Anyone can contribute, assuming they assign copyright.
> And you did that for coreutils back before it was called that ;-)
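
For reference, here is a rough, untested sketch of how those three modules
might be glued together, based on my reading of gnulib's <unistr.h> and
<uninorm.h>. The one-shot read of stdin is only to keep it short; a real
tool would read incrementally and handle errors and locale/encoding issues
properly:

/* Sketch: read UTF-8 from stdin, write NFC-normalized UTF-8 to stdout,
   assuming the gnulib modules 'unistr/u8-mbtouc', 'uninorm/filter' and
   'unistr/u8-uctomb'.  Untested; error handling is minimal.  */

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <unistr.h>   /* u8_mbtouc, u8_uctomb */
#include <uninorm.h>  /* uninorm_filter_*, UNINORM_NFC */

/* The encapsulated "stream": encode one Unicode character as UTF-8
   and write it to stdout.  */
static int
write_to_stdout (void *data, ucs4_t uc)
{
  uint8_t out[6];
  int n = u8_uctomb (out, uc, sizeof out);
  (void) data;
  if (n < 0)
    return -1;
  return fwrite (out, 1, n, stdout) == (size_t) n ? 0 : -1;
}

int
main (void)
{
  /* Wrap an NFC-normalizing filter around the output stream.  */
  struct uninorm_filter *filter =
    uninorm_filter_create (UNINORM_NFC, write_to_stdout, NULL);
  if (filter == NULL)
    exit (EXIT_FAILURE);

  /* Feed it with Unicode characters decoded from stdin.  */
  static uint8_t buf[1 << 20];
  size_t len = fread (buf, 1, sizeof buf, stdin);
  size_t i = 0;
  while (i < len)
    {
      ucs4_t uc;
      int n = u8_mbtouc (&uc, buf + i, len - i);
      if (n <= 0)
        break;
      if (uninorm_filter_write (filter, uc) < 0)
        exit (EXIT_FAILURE);
      i += n;
    }

  /* Flush buffered characters to stdout and free the filter.  */
  if (uninorm_filter_free (filter) < 0)
    exit (EXIT_FAILURE);
  return 0;
}

Something like that, built in a tree that imports those gnulib modules,
should behave roughly like `uconv -x NFC`.
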
It might be an idea for me to do it, since I know the details of adding
new programs to coreutils, and I also need to get to know the Unicode
APIs in gnulib for further i18n work in coreutils. I've not had much time
for anything lately, but I would hope to do that next week if possible.

So I'm wondering now why normalization functionality isn't in iconv?
Seems like a big omission to me. There is a mention of it here:
http://www.archivum.info/i18n-disc...@opensolaris.org/2006-08/msg00004.html

Then I also noticed `uconv`, which is in the "icu" package of Fedora at
least. To normalize text, the following worked for me:

  uconv -x NFC < test.utf8

So iconv may get this in future, and uconv already has it. Do we really
need another util in coreutils for this?

cheers,
Pádraig.