Tim Peters <t...@python.org> added the comment:

Ya, the case for the diff is at best marginal. Note that while it may be 
theoretically provable that the extra test would make the worst cases slower, 
that's almost certainly not measurable. The extra test would almost never be 
executed in the worst cases: in those the last characters of the needle and 
haystack DO match, and we spend almost all the total time in the

                for (j = 0; j < mlast; j++)

loop.  The extra test is only in the branch where the last characters _don't_ 
match.

In that branch, it's already certain that the last haystack character does not 
match the last needle character, so in a probabilistic sense, for "random" data 
that increases the odds that the last haystack character isn't in the needle at 
all. In which case the payoff is relatively large compared to the cost of the 
test (can skip len(needle) positions at once, instead of only 1).

I don't believe this can be out-thought. It would require timing on "typical" 
real-life searches. Which are likely impossible to obtain ;-)

Offhand do you know what the _best_ timing for two-way search is in a 
pattern-not-found case? The algorithm is too complicated for that to be 
apparent to me at a glance. As Fredrik said in the post of his I linked to, he 
was very keen to have an algorithm with best case sublinear time. For example, 
the one we have takes best-case not-found O(n/m) time (where n is the haystack 
length and m the needle length).  For example, searching for 'B' * 1_000_000 in 
'A' * 10_000_000 fails after just 9 character comparisons (well, not counting 
the million less 1 compares to initialize `skip` to the ultimately useless 0).

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue41972>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

Reply via email to