On Saturday 05 January 2008, Eric Blake wrote: > I've finished my implementation of a Two-Way plus Boyer-Moore hybrid > string search algorithm for the memmmem module. This patch has passed > everything I've thrown at it so far, and it has the nice properties of > avoiding alloca/malloc (hence it is async-safe), as well as allowing > sub-linear complexity for long needles (the added test in test-memmem.c > fails with the KMP algorithm but passes with my Two-Way+BM hybrid). It > guarantees fewer than 2N comparisons, and can determine misses as quickly > as N/M. I chose a threshold of 128 bytes in the needle before attempting > the BM bad-character shift table, since constructing a 256-entry table is > expensive up front with little gain for short needles. > > In my web searches, I've never seen this hybrid approach documented, so I > may well have written the fastest string search algorithm to date. > > I'd appreciate any reviews before checking it in.
on the topic of integrating it elsewhere ... the comment header states that this is part of the GNU C library. if it were part of glibc, it would be under the LGPL-2.1, but the comment header in gnulib states GPL-2. does that mean memmem will not be released under the LGPL-2.1 and won't be merged into glibc ? i'd be interested in making it available to uClibc, but that too would require LGPL-2.1 ... -mike
signature.asc
Description: This is a digitally signed message part.