Pádraig Brady wrote:
> Paul Eggert wrote:
>
>>>>>Using strcoll is inefficient anyway
>>>>
>>>>Don't we know it!  If we can avoid it, we'd like to.
>>>
>>>Well, the mbstowcs+wcscoll solution I presented
>>>should be equivalent to strcoll on any platform,
>>>and it's much faster in my tests.
>>
>>
>>That's good to know, though I'm puzzled as to why it's true.  For a
>>single comparison, can't strcoll typically return an answer without
>>examining all the input, and wouldn't that be faster than
>>mbstowc+wcscoll?
>>
>>But if it is true, perhaps we should rewrite memcoll to use the
>>mbstowc+wcscoll combination as well.
>
>
> I missed out a test case in my performance runs
> for same length lines with random data
> (where strcoll can break out early).
> I'll run that and comment more.
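For readers following the thread, the mbstowcs+wcscoll comparison under discussion might look roughly like the sketch below. This is not the code from the test uniq program: the helper name wcoll_compare, the fallback to strcoll on conversion or allocation failure, and the per-call allocation are all assumptions made for illustration. A real uniq would presumably keep the converted previous line around rather than converting both lines on every comparison.

```c
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper (not from the original post): compare two
   NUL-terminated multibyte strings A and B according to the current
   LC_COLLATE locale by converting both to wide strings with mbstowcs
   and comparing the results with wcscoll.  Falls back to strcoll if
   either string is not valid in the current locale's encoding or if
   memory cannot be allocated.  */
static int
wcoll_compare (const char *a, const char *b)
{
  /* With a NULL destination, mbstowcs returns the number of wide
     characters needed, excluding the terminating null.  */
  size_t la = mbstowcs (NULL, a, 0);
  size_t lb = mbstowcs (NULL, b, 0);
  if (la == (size_t) -1 || lb == (size_t) -1)
    return strcoll (a, b);

  wchar_t *wa = malloc ((la + 1) * sizeof *wa);
  wchar_t *wb = malloc ((lb + 1) * sizeof *wb);
  int result;

  if (!wa || !wb)
    result = strcoll (a, b);
  else
    {
      /* Convert including the terminating null wide character.  */
      mbstowcs (wa, a, la + 1);
      mbstowcs (wb, b, lb + 1);
      result = wcscoll (wa, wb);
    }

  free (wa);
  free (wb);
  return result;
}
```

The caller is expected to have called setlocale (LC_ALL, "") beforehand, as coreutils programs do; without that, the "C" locale is in effect and the comparison degenerates to code point order.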
1 = my test uniq prog
2 = coreutils 5.97 uniq

a = ascii long lines, with all same length (85 chars),
    and 26 identical lines for every 27
b = ascii long lines, with all same length (85 chars),
    and all adjacent lines different

LANG=en_IE.UTF8

 \      1       2
------------------
a|  0.466   5.300
b|  0.447   0.438

There seems to be serious overhead with strcoll
on glibc-2.3.5-10 at least.

Pádraig.