Tim Peters <t...@python.org> added the comment:

The attached fastsearch.diff removes the speed hit in the original test case 
and in the constructed one.

I don't know whether it "should be" applied, and really can't make time to dig 
into it.

The rationale:  when the last characters of the pattern string and the current 
search window don't match, this is what happens:  if the "Bloom filter" can 
prove that the character one beyond the search window isn't in the pattern at 
all, the search window is advanced by len(pattern) + 1 positions. Else it's 
advanced only 1 position.

What changes here: if the Bloom filter thinks the character one beyond the 
search window may be in the pattern, this goes on to see whether the Bloom 
filter thinks the last character in the search window may be in the pattern (by 
case assumption, we already know it's not the _last_ character in the pattern). 
If not, the search window can be advanced by len(pattern) positions.

So it adds an extra check in the last-characters-don't-match and 
next-character-may-be-in-the-pattern case:  if the last character is not in the 
pattern, it's irrelevant that the next character may be - there's no possible 
overall match including the last character, so we can move the search window 
beyond it.

The question is whether this costs more than it saves overall.  It saves 
literally hours of CPU time for one search in this report's case ;-)  But 
worst-case time remains O(m*n) regardless, just somewhat harder to provoke.

----------
keywords: +patch
Added file: https://bugs.python.org/file49503/fastsearch.diff

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue41972>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

Reply via email to