Dennis Sweeney <sweeney.dennis...@gmail.com> added the comment:
> Offhand do you know what the _best_ timing for two-way search is in a > pattern-not-found case? Looking at the glibc implementation, in the top-level "else" clause (for when the string isn't completely periodic), we have: period = MAX (suffix, needle_len - suffix) + 1; so period > needle_len / 2. Then during each iteration of the j-loop (where j is the index into the haystack), j is sometimes incremented by `period`, which would probably give O(n/m) best case. Here's a sublinear example for the two-way algorithm: N = 10**7 haystack = "ABC" * N needle1 = "ABC" * (N // 10) + "DDD" needle2 = "ABC" * 10 + "DDD" === Results === Pure Python Two-Way, shorter needle: 1.7142754 Pure Python Two-Way. Longer needle: 0.0018845000000000667 Python builtin, shorter needle: 0.031071400000000082 Python builtin, longer needle: 0.030566099999999707 This case is surprisingly better than the builtin: N = 10**7 haystack = "ABC" * N needle1 = "DDD" + "ABC" * (N // 10) needle2 = "DDD" + "ABC" * 10 === Results === Pure Python Two-Way, shorter needle: 0.0020895000000000774 Pure Python Two-Way. Longer needle: 0.0017878999999999534 Python builtin, shorter needle: 0.029998100000000028 Python builtin, longer needle: 0.02963439999999995 This was measured using the attached fastsearch.py. There are some manipulations in there like string slicing that would make more sense as pointer math in C, so especially accounting for that, I'm guessing it would be pretty competitive in a lot of cases. A lot of the implementations look like they use a complete Boyer-Moore table to go sublinear in even more cases, which it seems we want to avoid. I'm not certain, but it's conceivable that the existing techniques of using just one one delta-value or the "Bloom filter" could be thrown in there to get some of the same improvements. Although that could be redundant -- I'm not sure. ---------- Added file: https://bugs.python.org/file49507/fastsearch.py _______________________________________ Python tracker <rep...@bugs.python.org> <https://bugs.python.org/issue41972> _______________________________________ _______________________________________________ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com