Re: uniq i18n implementation

Pádraig Brady Mon, 14 Aug 2006 02:19:33 -0700

Pádraig Brady wrote:
> Paul Eggert wrote:
> 
>>>>>Using strcoll is inefficient anyway
>>>>
>>>>Don't we know it!  If we can avoid it, we'd like to.
>>>
>>>Well, the mbstowcs+wcscoll solution I presented
>>>should be equivalent to strcoll on any platform,
>>>and it's much faster in my tests.
>>
>>
>>That's good to know, though I'm puzzled as to why it's true.  For a
>>single comparison, can't strcoll typically return an answer without
>>examining all the input, and wouldn't that be faster than
>>mbstowcs+wcscoll?
>>
>>But if it is true, perhaps we should rewrite memcoll to use the
>>mbstowcs+wcscoll combination as well.
> 
> 
> I missed out a test case in my performance runs
> for same length lines with random data
> (where strcoll can break out early).
> I'll run that and comment more.


1 = my test uniq program
2 = coreutils 5.97 uniq

a = long ASCII lines, all the same length (85 chars), with 26 identical lines in every 27
b = long ASCII lines, all the same length (85 chars), with all adjacent lines different

LANG=en_IE.UTF8

\  1       2
 ---------------
a| 0.466   5.300
b| 0.447   0.438

There seems to be serious overhead with strcoll when it cannot break out
early (case a), at least on glibc-2.3.5-10.
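
A minimal sketch of the mbstowcs+wcscoll comparison discussed above, for
illustration only. This is not the test program measured here: the names
to_wide and compare_lines_wide are hypothetical, and the fallback to
strcoll on a failed conversion is an assumption. It also relies on the
POSIX behaviour of mbstowcs returning the required length when given a
null destination.

```c
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: convert a NUL-terminated multibyte string to a
   newly allocated wide string.  Returns NULL if the input contains an
   invalid multibyte sequence or if allocation fails. */
static wchar_t *to_wide(const char *s)
{
    /* With a null destination, POSIX mbstowcs reports the number of
       wide characters needed, excluding the terminator. */
    size_t n = mbstowcs(NULL, s, 0);
    if (n == (size_t) -1)
        return NULL;

    wchar_t *w = malloc((n + 1) * sizeof *w);
    if (w == NULL)
        return NULL;

    mbstowcs(w, s, n + 1);
    return w;
}

/* Hypothetical comparison: collate two lines under the current
   LC_COLLATE by converting both to wide strings and calling wcscoll.
   Assumption: if either conversion fails, fall back to strcoll. */
int compare_lines_wide(const char *a, const char *b)
{
    wchar_t *wa = to_wide(a);
    wchar_t *wb = to_wide(b);
    int result;

    if (wa != NULL && wb != NULL)
        result = wcscoll(wa, wb);
    else
        result = strcoll(a, b);

    free(wa);   /* free(NULL) is a no-op */
    free(wb);
    return result;
}
```

A caller would need setlocale(LC_ALL, "") before the first comparison for
the LANG setting to take effect. Converting each line once and keeping the
wide copy for the next comparison would avoid repeating the mbstowcs work
on adjacent lines.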

Pádraig.


