Dennis Sweeney <sweeney.dennis...@gmail.com> added the comment:

Another algorithmic possibility: Instead of the bitset, we could have a 
stack-allocated

    uint8_t jump[32]; // maybe 64? Maybe uint16_t?

It would say this: If the last character lined up in the haystack is congruent 
to i mod (1 << 8), then jump ahead by (neede_len if jump[i]==255 else jump[i]), 
where jump[i] gives the distance between the end of the needle and the last 
occurrence in the needle of a character congruent to i.

Is this sort of constant-but-more-than-a-few-bytes stack-space an acceptable 
constant memory cost? If so, I believe that could be a big improvement.

There are also a bunch of little tweaks that could be done here: For example, 
should a hypothetical jump-table jump to line up the last character in the 
needle, or jump to line up the middle character in the needle? The max from two 
tables? Should we search for the last characters to be equal, or just make sure 
the last character is in the needle-bit-set (like in the PR)? Should there be a 
second bit-set for the right half of the string? Should there be a skip value 
computed for the last character as well as the middle character (middle 
character only in the PR)? etc. etc. I'll be sure to try some of these things.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue41972>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

Reply via email to