Tim Peters <t...@python.org> added the comment:

I don't think Rabin-Karp is worth trying here. The PR is provably worst-case 
linear time, and the constant factor is already reasonably small. Its only real 
weakness I can see is that it can be significantly (but seemingly not 
dramatically) slower than the status quo, in what are exceptionally good cases 
for the status quo, and perhaps most often due to the increased preprocessing 
cost of the new code (which is relatively most damaging if the needle is found 
early in the haystack - in which cases preprocessing can be close to a pure 
waste of time).

The status quo and the PR both enjoy sublinear (O(len(haystack) / len(needle)) 
cases too, but R-K doesn't.

Where R-K is a "go to" choice is when an app needs to search the same large 
text for several strings (Wikipedia has a decent description of R-K can be 
adapted to that) - but Python has no string API that does such a thing (closest 
is joining the search-for strings via "|" for a regexp search).

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue41972>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

Reply via email to