On Mon, Sep 3, 2012 at 9:54 PM, Roy Smith <r...@panix.com> wrote: > There's been a bunch of threads lately about string implementations, and > that got me t > > On > hinking (which is often a dangerous thing). > > Let's assume you're testing two strings for equality. You've already > done the obvious quick tests (i.e they're the same length), and you're > down to the O(n) part of comparing every character. > > I'm wondering if it might be faster to start at the ends of the strings > instead of at the beginning? If the strings are indeed equal, it's the > same amount of work starting from either end. But, if it turns out that > for real-life situations, the ends of strings have more entropy than the > beginnings, the odds are you'll discover that they're unequal quicker by > starting at the end. > > It doesn't seem un-plausible that this is the case. For example, most > of the filenames I work with begin with "/home/roy/". Most of the > strings I use as memcache keys have one of a small number of prefixes. > Most of the strings I use as logger names have common leading > substrings. Things like credit card and telephone numbers tend to have > much more entropy in the trailing digits. > > On the other hand, hostnames (and thus email addresses) exhibit the > opposite pattern. > > Anyway, it's just a thought. Has anybody studied this for real-life > usage patterns? > > I'm also not sure how this work with all the possible UCS/UTF encodings. > With some of them, you may get the encoding semantics wrong if you don't > start from the front. > -- > http://mail.python.org/mailman/listinfo/python-list >
First include len(string)/2, in order to include starting at the center of the string, and threading/weaving by 2 processes out. import timeit do the the rest, and see which has the fastest time. -- Best Regards, David Hutto *CEO:* *http://www.hitwebdevelopment.com*
-- http://mail.python.org/mailman/listinfo/python-list