Eric Blake wrote: > I've finished my implementation of a Two-Way plus Boyer-Moore hybrid > string search algorithm for the memmmem module. This patch has passed > everything I've thrown at it so far, and it has the nice properties of > avoiding alloca/malloc (hence it is async-safe), as well as allowing > sub-linear complexity for long needles (the added test in test-memmem.c > fails with the KMP algorithm but passes with my Two-Way+BM hybrid). It > guarantees fewer than 2N comparisons, and can determine misses as quickly > as N/M. I chose a threshold of 128 bytes in the needle before attempting > the BM bad-character shift table, since constructing a 256-entry table is > expensive up front with little gain for short needles.
Marvellous!! Thanks for the comments; they are essential for understanding the code. I also find your comments clearer than the text in .../~lecroq/... Just two typos: sed -e s/acheive/achieve/ -e s/Peform/Perform/ And LONG_NEEDLE_THRESHOLD should better be (1U << (CHAR_BIT - 1)) rather than (1 << (CHAR_BIT - 1)), for platforms where CHAR_BIT = 32. On such platforms, the function two_way_long_needle will not be used. Therefore it's also appropriate to wrap the added test (tests/test-memmem.c lines 155..180) in an if (CHAR_BIT < 10) { ... }. Bruno