On Sat, 04 Jun 2011 13:41:33 +1200, Gregory Ewing wrote:

>> Python might be penalized by its use of Unicode here, since a
>> Boyer-Moore table for a full 16-bit Unicode string would need
>> 65536 entries
>
> But is there any need for the Boyer-Moore algorithm to
> operate on characters?
>
> Seems to me you could just as well chop the UTF-16 up
> into bytes and apply Boyer-Moore to them, and it would
> work about as well.

No, because that won't care about alignment. E.g. on a big-endian
architecture, if you search for '\u2345' in the string '\u0123\u4567',
it will find a match (at an offset of 1 byte): the string is stored as
the bytes 01 23 45 67, and the pattern's bytes 23 45 straddle the
boundary between the two characters.
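
To make the false match concrete, here is a rough sketch in Python 3. It uses bytes.find as a stand-in for any byte-wise search (Boyer-Moore or otherwise), and find_aligned is a hypothetical helper of my own, not anything in the standard library, that simply rejects hits which do not start on a 2-byte code-unit boundary.

```python
def find_aligned(haystack: bytes, needle: bytes, unit: int = 2) -> int:
    """Return the byte offset of the first match that starts on a
    code-unit boundary, or -1 if there is none.

    Hypothetical helper for illustration only: it wraps a plain
    byte-wise search and skips misaligned hits.
    """
    start = 0
    while True:
        pos = haystack.find(needle, start)
        if pos == -1 or pos % unit == 0:
            return pos
        # Misaligned hit: resume the search one byte further on.
        start = pos + 1


# Big-endian UTF-16 without a BOM, as in the example above.
text = '\u0123\u4567'.encode('utf-16-be')   # bytes 01 23 45 67
pattern = '\u2345'.encode('utf-16-be')      # bytes 23 45

naive = text.find(pattern)             # byte-wise search, ignores alignment
aligned = find_aligned(text, pattern)  # only accepts even offsets
in_str = '\u2345' in '\u0123\u4567'    # character-level search

print(naive, aligned, in_str)
```

The byte-wise search reports a hit at offset 1, while the aligned search and the character-level test both say the pattern is absent. So searching bytes can be made to work, but only if every candidate match is checked for alignment afterwards.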