>> If you're doing it on a time-critical basis, it might help to >> make "valid" a set, which should have O(1) membership testing, >> rather than using the "in" test with a string. I don't know >> how well the find() method of a string performs in relationship >> to "in" testing of a set. Test and see, if it's important. > > The find method of (8-bit) strings is really, really fast. My > guess is that set can't beat it. I tried to beat it recently with > a binary search function. Even after applying psyco find was > still faster (though I could beat the bisect functions by a > little bit by replacing a divide with a shift).
In "theory" (you know...that little town in west Texas where everything goes right), a set-membership test should be O(1). A binary search function would be O(log N). A linear search of a string for a member should be O(N). In practice, however, for such small strings as the given whitelist, the underlying find() operation likely doesn't put a blip on the radar. If your whitelist were some huge document that you were searching repeatedly, it could have worse performance. Additionally, the find() in the underlying C code is likely about as bare-metal as it gets, whereas the set membership aspect of things may go through some more convoluted setup/teardown/hashing and spend a lot more time further from the processor's op-codes. And I know that a number of folks have done some hefty optimization of Python's string-handling abilities. There's likely a tradeoff point where it's better to use one over the other depending on the size of the whitelist. YMMV -tkc -- http://mail.python.org/mailman/listinfo/python-list