On 12/15/2010 03:23 PM, Pádraig Brady wrote: >>> I spoke too soon. We would also need to patch m4/memmem.m4 to actually >>> perform the empty needle verification in the memmem-simple case. >> >> I also noticed the empty needle verification didn't >> check that the correct pointer was returned, >> and only checked for !NULL. So I rolled that >> fix into the attached reorg. > > I've rebased the attached memmem reorg patch > which splits correctness checks from performance checks.
> Subject: [PATCH] memmem: rearrange memmem and expand memmem-simple modules > > Move all functional checks to memmem-simple so that one has > a fully functional memmem by using just this module. > Restrict the memmem module to performance checks only. > Document exactly how the memmem and memmem-simple modules > relate to each other. Thanks for reviving this. > +Performance problems fixed by Gnulib module @code{memmem}: > +...@itemize > @item > This function has quadratic instead of linear worst-case complexity on some > platforms: > glibc 2.8, FreeBSD 6.2, NetBSD 5.0, AIX 5.1, Solaris 11 2010-11, Cygwin > 1.5.x. > +Note for small needles the replacement may be slower. > @end itemize I really wish I had more time to look into this. Really, it seems like the naive version for up to 4- or even 8-byte needles ought to knock the socks off the factored version by cutting the common-case expense of up-front factorization for the less-common case of fewer comparisons for certain needles. We'd need a good benchmark that tests time on: non-periodic needle, short haystack match non-periodic needle, long haystack before match non-periodic needle, long haystack with no match periodic needle, short haystack match periodic needle, long haystack before match periodic needle, long haystack with no match for varying length needles, and which compares naive, glibc's sse4.2, and the two-way algorithm timings for all of those situations (or gives up, in the case of quadratic timing in naive or glibc's sse4.2 on long worst-case needles) Nothing like a good benchmark to prove that changing the algorithm according to needle length really is a good thing, since Uli was skeptical that his SSE4.2 implementation really is a regression: http://sourceware.org/bugzilla/show_bug.cgi?id=12100 -- Eric Blake ebl...@redhat.com +1-801-349-2682 Libvirt virtualization library http://libvirt.org
signature.asc
Description: OpenPGP digital signature