Hello Eric,

On 15 May 2015 at 18:01, Eric Blake <ebl...@redhat.com> wrote:
> On 05/15/2015 09:13 AM, Claudio Fontana wrote:
>
>>> Or back to the original question - why are we worrying about the
>>> O(n) memmem implementation when the naive O(n^2) is MUCH shorter and
>>> easier to understand, and where the scaling penalty is only apparent
>>> on really long corner case strings? Is our use of memmem going to
>>> hit the corner cases where scaling matters?
>>
>> For the specific use case here (memory search functionality), the
>> haystack is normally the factor which really can scale up a lot.
>
> Remember, when searching, the worst-case penalty is determined by a
> combination of the size of the haystack (H) and the size of the needle
> (I'll choose M to represent that, to avoid confusion with the generic
> n in Big-O notation). The most brain-dead loop (for each byte in the
> haystack, see if the needle matches) is H*M operations; this is
> noticeable as quadratic scaling (and even then, only for certain
> matching patterns, such as determining if a needle of all 'a' except
> for a final 'b' occurs in a haystack of all 'a') only if both needle
> and haystack can grow arbitrarily large: O(H*M) == O(n^2). Where my
> code shines is that it breaks the operation into two parts - a
> factorization of the needle, O(M) == O(n), then a linear search using
> that factored needle, O(H) == O(n), for an overall cost of O(H+M) ==
> O(n). But with the naive algorithm, if the needle is capped to a fixed
> size, then you have O(constant*H) == O(H) == O(n).
>
> Meanwhile, remember that Big-O notation is only about scaling effects,
> but that there are still constant factors to be considered - an O(n)
> algorithm that requires 3 comparisons per byte of the haystack will
> always be slower than an O(n) algorithm that only needs 2 comparisons
> per byte of the haystack, even though both algorithms scale linearly.
>
>> I would expect the needle to be usually smallish (though for string
>> search sometimes the needle could probably reach the threshold of
>> 32).
>>
>> When scanning a very large memory range, considering a binary string
>> with roughly length around 32, do you think that the non-trivial
>> implementation could bring some noticeable benefits?
>>
>> If it's below 32, do we get any benefits compared to the trivial
>> implementation?
>
> In fact, one of the glibc optimizations was realizing that my two-way
> algorithm code is LOUSY at short needles. It has a high relative
> overhead for finding the proper factorization of a short needle,
> before it can even start searching. A hybrid approach (brute-force a
> short needle, and only bother with factorizing a long needle, or only
> after a heuristic minimum number of comparisons has first been met) is
> ideal, because of the observation above that the poor scaling of
> quadratic behavior is only a problem for large needles, while the
> constant overhead of factorization is measurably larger in proportion
> to the rest of the work done on short needles.
>
> If you are letting the user search for arbitrarily long needles, then
> an O(n) scale may be worth the speed penalty on shorter needles, or
> the complexity of a hybrid approach (again, I think glibc may have a
> better hybrid approach than gnulib at the moment).
> But as this is an HMP interface, the user has to type the search
> string, and is unlikely to be typing long needles, so it is
> justifiable to just ditch the complexity of a hybrid approach and use
> ONLY the naive O(n^2) algorithm, because, as I argued above, it is
> still observable only as O(n) for a bounded-size needle.
>
>>> [Personal note: I'm still tickled pink every time I see yet another
>>> project adopt my O(n) code - I didn't come up with the algorithm,
>>> but merely scraped it from public research, and was shocked to note
>>> how many libc implementations still had O(n^2) code at the time. But
>>> it is still gratifying to see that my implementation is getting such
>>> wide adoption]
>>
>> Well, your code and its derived versions have been tried and tested a
>> lot, so picking it up just makes sense to me, to avoid getting into
>> the same corner cases and issues that your implementation already had
>> to go through. And I am a lazy person of course!
>
> You do have a point here - if we are going to reuse code, then reusing
> a debugged implementation rather than writing from scratch is the way
> to go. But my point remains that the complexity of a reused Two-Way
> algorithm may not be necessary, and the 3-line naive algorithm is
> trivial to review.
>
> What's more, it is also arguable that we only use the naive algorithm
> on less-than-stellar platforms that lack memmem natively (well, we may
> also see less-than-stellar performance if the native memmem is not
> O(n)) - but who is going to be using the HMP command on such a
> platform? And maybe those platforms will finally be convinced to add a
> decent memmem if they realize that we are slow on their platform but
> not on other platforms, precisely because we are working around their
> lack of memmem by intentionally using a slow but trivial replacement.
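Just to convince myself that the quadratic worst case really is
harmless once the needle is bounded, I played with the "all 'a' plus a
final 'b'" pattern you describe above. This is only an illustrative,
untested snippet I hacked up while reading your mail (not something I
intend to submit); it simply counts the byte comparisons the naive loop
performs:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(void)
    {
        size_t H = 1 << 16, M = 1 << 8;    /* haystack and needle sizes */
        char *hay = malloc(H);
        char *ndl = malloc(M);
        size_t i, j, cmps = 0;

        if (!hay || !ndl) {
            return 1;
        }
        memset(hay, 'a', H);               /* haystack: all 'a'          */
        memset(ndl, 'a', M - 1);           /* needle: 'a' * (M-1) + 'b'  */
        ndl[M - 1] = 'b';

        for (i = 0; i + M <= H; i++) {     /* naive outer loop           */
            for (j = 0; j < M; j++) {
                cmps++;
                if (hay[i + j] != ndl[j]) {
                    break;                 /* mismatch: slide by one     */
                }
            }
        }
        printf("%zu start positions, %zu byte comparisons\n",
               H - M + 1, cmps);
        free(hay);
        free(ndl);
        return 0;
    }

It reports roughly (H - M + 1) * M, about 16.7 million comparisons for
these sizes, so the blow-up indeed needs both H and M to grow; with M
limited to what someone types at the HMP prompt, only H matters.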
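The trivial implementation I have in mind for the respin is roughly the
following (an untested sketch; the name naive_memmem and the exact
prototype are placeholders, the real patch will use whatever fits the
existing code):

    #include <stddef.h>
    #include <string.h>

    /*
     * Naive O(H*M) search: try to match the needle at every offset of
     * the haystack.  Quadratic only if both needle and haystack grow;
     * with a short, user-typed needle it behaves as O(H).
     */
    static void *naive_memmem(const void *haystack, size_t hlen,
                              const void *needle, size_t nlen)
    {
        const char *h = haystack;
        size_t i;

        if (nlen == 0) {
            return (void *)h;      /* empty needle matches at the start */
        }
        if (hlen < nlen) {
            return NULL;           /* needle cannot fit */
        }
        for (i = 0; i <= hlen - nlen; i++) {
            if (memcmp(h + i, needle, nlen) == 0) {
                return (void *)(h + i);
            }
        }
        return NULL;
    }

The argument order mirrors the usual memmem() prototype, and memcmp()
does the per-offset work, so there is not much left to review beyond
the bounds handling.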
Good points, I will respin with a trivial implementation along those
lines shortly.

Thanks!

Claudio