Kris Kennaway wrote:
> Thanks for the pointers, I think a C extension will end up being the way to go, unless someone has beaten me to it and I just haven't found it yet.

Depending on the pattern length you are targeting, it may be fastest to increase the out-of-loop work. For a 40-bit string, build an 8-target Aho-Corasick machine, and at each match check the endpoints. This will only work well if 40 bits is at the low end of what you are hunting for. Roughly:

    targets[0] = 5-byte string as byte-aligned
    targets[N in 1..7] = 4-byte strings representing the lead four bytes
                         after discarding the high-order N bits

    m = AhoCorasick(targets)

    def hunt(m, source):
        m.state = 0
        old_block = None
        for n, block in enumerate(source):
            for recognized, where in m.search(block):
                if recognized:
                    if <check lead and trail bits>:
                        yield block, where, recognized
                else:
                    yield block, where, 0

--Scott David Daniels
[EMAIL PROTECTED]
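
For anyone who wants to try the eight-target idea before writing an extension, here is a minimal runnable sketch. It is not the Aho-Corasick machine described above: as a stand-in it makes one bytes.find pass per target, which is slower but needs nothing outside the standard library. It also assumes the whole source is one in-memory bytes object rather than a stream of blocks, and that bits are numbered most-significant first. The names build_targets and hunt, and the hunt(pattern, data) signature, are made up for this illustration.

```python
def build_targets(pattern):
    """Return the 8 byte-aligned search strings for a 40-bit integer pattern.

    targets[0] is the full 5-byte pattern. targets[n] for n in 1..7 is the
    4 whole bytes left after discarding the high-order n bits.
    """
    targets = [pattern.to_bytes(5, "big")]
    for n in range(1, 8):
        middle = (pattern >> (8 - n)) & 0xFFFFFFFF
        targets.append(middle.to_bytes(4, "big"))
    return targets


def hunt(pattern, data):
    """Yield the bit offset of each occurrence of the 40-bit pattern in data."""
    for n, target in enumerate(build_targets(pattern)):
        start = data.find(target)
        while start != -1:
            if n == 0:
                # Byte-aligned hit: all 40 bits already matched.
                yield start * 8
            elif 0 < start and start + 4 < len(data):
                # Lead bits: the pattern's high n bits sit in the low n bits
                # of the byte before the 4-byte hit.
                lead = data[start - 1] & ((1 << n) - 1)
                # Trail bits: the pattern's low 8-n bits sit in the high
                # 8-n bits of the byte after the hit.
                trail = data[start + 4] >> n
                if (lead == pattern >> (40 - n)
                        and trail == pattern & ((1 << (8 - n)) - 1)):
                    yield (start - 1) * 8 + (8 - n)
            # Step by one byte so overlapping hits are not skipped.
            start = data.find(target, start + 1)
```

Results come out grouped by target rather than in stream order, so sort them if order matters. Swapping the find loop for a real multi-pattern matcher would give the single pass over the data that the suggestion above is after.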