On 2006-10-18, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> Tim Chase:
>> In practice, however, for such small strings as the given
>> whitelist, the underlying find() operation likely doesn't put a
>> blip on the radar. If your whitelist were some huge document
>> that you were searching repeatedly, it could have worse
>> performance. Additionally, the find() in the underlying C code
>> is likely about as bare-metal as it gets, whereas the set
>> membership aspect of things may go through some more convoluted
>> setup/teardown/hashing and spend a lot more time further from the
>> processor's op-codes.
>
> With this specific test (half good half bad), on Py2.5, on my PC, sets
> start to be faster than the string search when the string "good" is
> about 5-6 chars long (this means set are quite fast, I presume).
>
> from random import choice, seed
> from time import clock
>
> def main(choice=choice):
>     seed(1)
>     n = 100000
>
>     for good in ("ab", "abc", "abcdef", "abcdefgh",
>                  "abcdefghijklmnopqrstuvwxyz"):
>         poss = good + good.upper()
>         data = [choice(poss) for _ in xrange(n)] * 10
>         print "len(good) = ", len(good)
>
>         t = clock()
>         for c in data:
>             c in good
>         print round(clock()-t, 2)
>
>         t = clock()
>         sgood = set(good)
>         for c in data:
>             c in sgood
>         print round(clock()-t, 2), "\n"
>
> main()

On my Python 2.4 for Windows, the two are often still neck-and-neck
at len(good) = 26. The set's disadvantage of having to be constructed
is heavily amortized here: it is built once per timing loop and then
used for 1,000,000 membership tests (n = 100000 characters, repeated
10 times). Without knowing the usage pattern, it'd be hard to choose
between them.

-- 
Neil Cerutti
-- 
http://mail.python.org/mailman/listinfo/python-list
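
P.S. The amortization point above can be sketched roughly as follows. This is not the quoted benchmark: it is a Python 3 variant that uses timeit (time.clock no longer exists in current Pythons), and the helper names and the small test string are made up for illustration. It compares a plain string search, a set built once, and a set rebuilt for every single test, which is the usage pattern where the construction cost is not amortized at all.

```python
# Python 3 sketch; all function names here are hypothetical, not from the thread.
from timeit import timeit

GOOD = "abcdefghijklmnopqrstuvwxyz"
DATA = "aZbYcX" * 1000  # half hits, half misses, like the quoted test


def count_hits_str(data, good=GOOD):
    """Membership test by searching the whitelist string directly."""
    return sum(1 for c in data if c in good)


def count_hits_set_once(data, good=GOOD):
    """Build the set one time, then reuse it for every test."""
    sgood = set(good)
    return sum(1 for c in data if c in sgood)


def count_hits_set_rebuilt(data, good=GOOD):
    """Unamortized case: rebuild the set for every single test."""
    return sum(1 for c in data if c in set(good))


if __name__ == "__main__":
    for fn in (count_hits_str, count_hits_set_once, count_hits_set_rebuilt):
        secs = timeit(lambda: fn(DATA), number=20)
        print("%-24s %.3f s" % (fn.__name__, secs))
```

All three functions return the same count; only the timings differ, and those will vary with interpreter version and machine, so treat any numbers as indicative only.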