Re: Efficient way of testing for substring being one of a set?

Jeff Thu, 03 Apr 2008 06:05:39 -0700

On Apr 3, 8:19 am, George Sakkis <[EMAIL PROTECTED]> wrote:
> On Apr 3, 8:03 am, Jeff <[EMAIL PROTECTED]> wrote:
>
> > def foo(sample, strings):
> >         for s in strings:
> >                 if sample in s:
> >                         return True
> >         return False
>
> > This was an order of magnitude faster for me than using str.find or
> > str.index.  That was when searching for rare words in the entire word
> > list (with duplicates) of War and Peace.
>
> If you test against the same substrings over and over again, an
> alternative would be to build a regular expression:
>
> import re
> search = re.compile('|'.join(re.escape(x)
>                              for x in substrings)).search
> p = search(somestring)
> if p is not None:
>     print('Found', p.group())
>
> George


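With a large set of substrings, that would be an enormous regular
expression and could eat a lot of memory.  But over an enormous number
of substrings, it could in principle approach O(log n) per search
rather than O(n), provided the engine does not simply try each
alternative in turn (a backtracking engine such as Python's re may).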
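
For comparison, here is a minimal sketch of both approaches in Python 3,
testing whether any of a set of substrings occurs in a string.  The
names contains_any_loop and make_regex_searcher are made up for
illustration, and the sketch makes no timing claims; which version is
faster depends on the data and is best measured (e.g. with timeit).

```python
import re

def contains_any_loop(text, substrings):
    """Linear scan: True if any substring occurs in text."""
    return any(sub in text for sub in substrings)

def make_regex_searcher(substrings):
    """Compile one alternation and return a function text -> match or None."""
    # Deduplicate, and put longer alternatives first so a longer substring
    # is preferred over another alternative that is its own prefix.
    ordered = sorted(set(substrings), key=len, reverse=True)
    if not ordered:
        # An empty pattern would match everywhere, so special-case it.
        return lambda text: None
    # re.escape keeps characters like '.' literal.
    pattern = re.compile('|'.join(re.escape(s) for s in ordered))
    return pattern.search

substrings = ['foo', 'bar', 'a.b']   # hypothetical sample data
search = make_regex_searcher(substrings)

m = search('xx bar yy')
if m is not None:
    print('Found', m.group())
```

The compiled searcher only pays off if the same set of substrings is
reused across many searches; for a one-off test the plain loop is
simpler.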
-- 
http://mail.python.org/mailman/listinfo/python-list
