Re: horrible utf-8 performace in wc

2008-05-07 Thread Pádraig Brady
Bo Borgerson wrote:
> Pádraig Brady wrote:
>> In the first 65535 code points there are also 404 chars which are
>> not classed as combining in the unicode database, but are classed
>> as zero width in the glibc locale data at least (zero-width space
>> being one of them like you mentioned). I deter

Re: horrible utf-8 performace in wc

2008-05-07 Thread Bo Borgerson
Pádraig Brady wrote:
> In the first 65535 code points there are also 404 chars which are
> not classed as combining in the unicode database, but are classed
> as zero width in the glibc locale data at least (zero-width space
> being one of them like you mentioned). I determined this with the
> atta

Re: horrible utf-8 performace in wc

2008-05-07 Thread Pádraig Brady
Bo Borgerson wrote:
> Jim Meyering wrote:
>> Bo Borgerson <[EMAIL PROTECTED]> wrote:
>>> I may be misinterpreting your patch, but it seems to me that
>>> decrementing count for zero-width characters could potentially lead to
>>> confusion. Not all zero-width characters are combining characters, ri

Re: horrible utf-8 performace in wc

2008-05-07 Thread Pádraig Brady
Bo Borgerson wrote: > Pádraig Brady wrote: >> canonically équivalent >> canonically équivalent >> >> Pádraig. >> >> p.s. I Notice that gnome-terminal still doesn't handle >> combining characters correctly, and my mail client thunderbird >> is putting the accent on the q rather than the e, sigh. >

Re: horrible utf-8 performace in wc

2008-05-07 Thread Bo Borgerson
Jim Meyering wrote:
> Bo Borgerson <[EMAIL PROTECTED]> wrote:
>> I may be misinterpreting your patch, but it seems to me that
>> decrementing count for zero-width characters could potentially lead to
>> confusion. Not all zero-width characters are combining characters, right?
>
> It looks ok to m

Re: horrible utf-8 performace in wc

2008-05-07 Thread Jim Meyering
Bo Borgerson <[EMAIL PROTECTED]> wrote:
> I may be misinterpreting your patch, but it seems to me that
> decrementing count for zero-width characters could potentially lead to
> confusion. Not all zero-width characters are combining characters, right?

It looks ok to me, since there's an unconditi
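
To illustrate the question raised here, a minimal Python sketch follows. It is not from the thread: it consults only the Unicode database shipped with Python, not the glibc locale data, and the helper name is_combining is hypothetical. It shows that ZERO WIDTH SPACE has canonical combining class 0, while COMBINING ACUTE ACCENT does not:

```python
import unicodedata

def is_combining(ch: str) -> bool:
    """True if the Unicode database assigns ch a non-zero combining class."""
    return unicodedata.combining(ch) != 0

# U+200B is zero width but not combining; U+0301 is a combining mark.
for ch in ("\u200b", "\u0301"):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: combining={is_combining(ch)}")
```

Whether a given C library reports either code point as zero width is a separate question that this sketch does not answer.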

Re: horrible utf-8 performace in wc

2008-05-07 Thread Jim Meyering
Pádraig Brady <[EMAIL PROTECTED]> wrote:
> Jan Engelhardt wrote:
>>
>> https://bugzilla.novell.com/show_bug.cgi?id=381873
>>
>> Forwarding this because it is a GNU issue, not specifically a Novell one.
>> I reproduced this myself with the latest coreutils from git
>> (BTW: You might want to repack

Re: horrible utf-8 performace in wc

2008-05-07 Thread Jan Engelhardt
On Wednesday 2008-05-07 13:11, Pádraig Brady wrote:
>
>Now that is a _lot_ of extra time. libiconv could probably be
>made more efficient. I've never actually looked at it.
>However wc calls mbrtowc() for each multibyte character.
>It would probably be a lot more efficient to use mbstowcs()
>to co
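
As a rough analogy for the per-character versus bulk conversion point quoted above, here is a minimal Python sketch. It is not the wc code and does not call mbrtowc() or mbstowcs(); Python's UTF-8 codecs stand in for them, and both function names are hypothetical. One function feeds the decoder a byte at a time, the other decodes the whole buffer in a single call:

```python
import codecs

def count_chars_incremental(data: bytes) -> int:
    """Count code points by feeding the decoder one byte at a time."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    count = 0
    for i in range(len(data)):
        # Returns "" until a complete multibyte sequence has been seen.
        count += len(decoder.decode(data[i:i + 1]))
    # Raises UnicodeDecodeError if the input ends mid-sequence.
    decoder.decode(b"", final=True)
    return count

def count_chars_bulk(data: bytes) -> int:
    """Count code points by decoding the whole buffer in one call."""
    return len(data.decode("utf-8"))

sample = "canonically e\u0301quivalent".encode("utf-8")
print(count_chars_incremental(sample), count_chars_bulk(sample))
```

Both functions return the same count; the difference is only in how many decoder calls are made, which is the overhead the quoted message is concerned with.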

Re: horrible utf-8 performace in wc

2008-05-07 Thread Bo Borgerson
Pádraig Brady wrote: > canonically équivalent > canonically équivalent > > Pádraig. > > p.s. I Notice that gnome-terminal still doesn't handle > combining characters correctly, and my mail client thunderbird > is putting the accent on the q rather than the e, sigh. They both render correctly he
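
A short Python sketch of what "canonically equivalent" means in practice. It assumes the two spellings in the quoted message were a precomposed é and an e followed by a combining accent; the archive text does not make this visible, so treat it as an assumption. Variable names are illustrative only:

```python
import unicodedata

precomposed = "\u00e9quivalent"   # é as a single code point
decomposed = "e\u0301quivalent"   # e followed by COMBINING ACUTE ACCENT

# The raw strings differ in length, but NFC normalization makes them equal.
same_after_nfc = unicodedata.normalize("NFC", decomposed) == precomposed
print(same_after_nfc, len(precomposed), len(decomposed))
```

A counter that works on code points sees the two forms as different lengths, which is why the handling of combining and zero-width characters matters in this thread.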

Re: coreutils-6.11 released

2008-05-07 Thread Jim Meyering
Christophe LYON <[EMAIL PROTECTED]> wrote:
> On 25.04.2008 21:04, Jim Meyering wrote:
>> Christophe LYON <[EMAIL PROTECTED]> wrote:
>>>
>>> If I manually add "-lm", I get:
>>> .../bin/../lib/gcc/sparc-sun-solaris2.8/4.1.0/crt1.o:(.plt+0x0):
>>> multiple definition of `_PROCEDURE_LINKAGE_TABLE_'
>>>

Re: coreutils-6.11 released

2008-05-07 Thread Christophe LYON
On 25.04.2008 21:04, Jim Meyering wrote:
Christophe LYON <[EMAIL PROTECTED]> wrote:
If I manually add "-lm", I get:
.../bin/../lib/gcc/sparc-sun-solaris2.8/4.1.0/crt1.o:(.plt+0x0):
multiple definition of `_PROCEDURE_LINKAGE_TABLE_'
/usr/lib/libm.so:(.plt+0x0): first defined here
Is this a supp

Re: horrible utf-8 performace in wc

2008-05-07 Thread Pádraig Brady
Jan Engelhardt wrote:
>
> https://bugzilla.novell.com/show_bug.cgi?id=381873
>
> Forwarding this because it is a GNU issue, not specifically a Novell one.
> I reproduced this myself with the latest coreutils from git
> (BTW: You might want to repack that repo, "counting objects" during the
> clon