Re: new modules for Unicode normalization

2009-02-22 Thread Pádraig Brady
Bruno Haible wrote:
> Pádraig Brady wrote:
>>> I cannot estimate how much of these 10 MB get actually loaded into a
>>> process' working set. ...
>> $ uconv -x NFC&
>> $ sudo bin/ps_mem.py | grep uconv
>> Private + Shared = RAM used Program
>> 1.9 MiB + 788.0 KiB = 2.7 MiB uco

Re: new modules for Unicode normalization

2009-02-22 Thread Bruno Haible
Pádraig Brady wrote:
> > I cannot estimate how much of these 10 MB get actually loaded into a
> > process' working set. ...
>
> $ uconv -x NFC&
> $ sudo bin/ps_mem.py | grep uconv
> Private + Shared = RAM used Program
> 1.9 MiB + 788.0 KiB = 2.7 MiB uconv

A great tool! Let's

Re: new modules for Unicode normalization

2009-02-22 Thread Pádraig Brady
Bruno Haible wrote:
> Hi Pádraig,
>
>> So I'm wondering now why normalization functionality isn't in iconv?
>> Seems like a big omission to me.
> [snip valid points on iconv limitations]
>> There is a mention of it here:
>> http://www.archivum.info/i18n-disc...@opensolaris.org/2006-08/msg4

Re: new modules for Unicode normalization

2009-02-21 Thread Bruno Haible
Hi Pádraig,

> So I'm wondering now why normalization functionality isn't in iconv?
> Seems like a big omission to me.

1) Not every functionality that is a filter should become part of iconv. Unicode normalization forms? Removal of accents? Case conversions? Transliteration from one script t

Re: new modules for Unicode normalization

2009-02-21 Thread Bruno Haible
Hi Jim,

> That sounds like it'd make a fine addition, and you're welcome to
> contribute it.

Thanks for the invitation. But I am already busy in too many areas. I leave this small project to someone familiar with coreutils and its coding guidelines.

Bruno

Re: new modules for Unicode normalization

2009-02-21 Thread Pádraig Brady
Jim Meyering wrote:
> Bruno Haible wrote:
> ...
>> With this, you can easily create a program that reads UTF-8 from stdin and
>> outputs it as canonicalized UTF-8 on stdout:
>> - create a "stream" that takes a Unicode character and outputs it to
>>   stdout. (Gnulib module 'unistr/u8-uctomb'.)

Re: new modules for Unicode normalization

2009-02-21 Thread Jim Meyering
Bruno Haible wrote:
...
> With this, you can easily create a program that reads UTF-8 from stdin and
> outputs it as canonicalized UTF-8 on stdout:
> - create a "stream" that takes a Unicode character and outputs it to
>   stdout. (Gnulib module 'unistr/u8-uctomb'.)
> - Wrap a Unicode normali

Re: new modules for Unicode normalization

2009-02-21 Thread Bruno Haible
Hi Pádraig, Bo, On 2008-05-08, when I mentioned the possibility to have a filter program that reads from standard input and writes the canonicalized output to standard output, you liked this idea:
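
For illustration only, here is a minimal sketch of the kind of filter discussed in this thread: it reads UTF-8 from standard input and writes the normalized text to standard output. It is not the gnulib-based program described above. It swaps in Python's standard `unicodedata` module for gnulib's stream-based normalizer, and the names `normalize_stream` and `main` are hypothetical, chosen only for this example.

```python
import io
import sys
import unicodedata


def normalize_stream(src, dst, form="NFC"):
    """Copy text from src to dst, normalizing each line to `form`.

    Assumption of this sketch: input is processed line by line. A line
    feed or carriage return never combines with neighbouring characters,
    so normalizing each line separately gives the same result as
    normalizing the whole text, while keeping memory use bounded.
    """
    for line in src:
        dst.write(unicodedata.normalize(form, line))


def main():
    # Decode stdin and encode stdout explicitly as UTF-8, independent of
    # the locale; newline="" passes line endings through unchanged.
    src = io.TextIOWrapper(sys.stdin.buffer, encoding="utf-8", newline="")
    dst = io.TextIOWrapper(sys.stdout.buffer, encoding="utf-8", newline="")
    normalize_stream(src, dst)
    dst.flush()
```

Calling `main()` at the end of the script turns it into a stdin-to-stdout filter. Passing `form="NFD"`, `"NFKC"`, or `"NFKD"` to `normalize_stream` selects one of the other normalization forms that `unicodedata.normalize` accepts.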