Kris Kennaway wrote:
Thanks for the pointers, I think a C extension will end up being the way to go, unless someone has beaten me to it and I just haven't found it yet.

Depending on the pattern length you are targeting, it may be fastest to
increase the out-of-loop work.  For a 40-bit string, build an 8-target
Aho-Corasick machine, and at each match check the endpoints.  This will
only work well if 40 bits is at the low end of what you are hunting for.

Roughly:

    targets[0] = 5-byte string as byte-aligned
    targets[N in 1..7] = 4-byte strings representing the lead
                   four bytes after discarding the high-order N bits
    m = AhoCorasick(targets)

    def hunt(m, source):
        m.state = 0
        old_block = None
        for n, block in enumerate(source):
            for recognized, where in m.search(block):
                if recognized:
                    if <check lead and trail bits>:
                        yield block, where, recognized
                else:
                    yield block, where, 0

--Scott David Daniels
[EMAIL PROTECTED]
--
http://mail.python.org/mailman/listinfo/python-list

Reply via email to