On Mon, 03 Sep 2012 21:54:01 -0400, Roy Smith wrote: > Let's assume you're testing two strings for equality. You've already > done the obvious quick tests (i.e they're the same length), and you're > down to the O(n) part of comparing every character. > > I'm wondering if it might be faster to start at the ends of the strings > instead of at the beginning? If the strings are indeed equal, it's the > same amount of work starting from either end. But, if it turns out that > for real-life situations, the ends of strings have more entropy than the > beginnings, the odds are you'll discover that they're unequal quicker by > starting at the end.
And if the strings have the same end and different beginnings: running jumping cooking then you'll find out quicker by starting at the beginning. No general-purpose string comparison operator can make assumptions about the content of the strings, like "they are nearly always pathnames on Unix-like systems like /home/user/x and /home/user/y". I suppose you could have two different operators, == for "test from the front" and === for "test from the back", and push that decision onto the user, who will then spend hours angsting about the "most efficient" equality test and days painstakingly profiling his data to save four nanoseconds at runtime. Besides, then somebody will say "Yes, but what about the cases where the prefix and the suffix are both equal, but the middle will be different?" and propose a third string-equality operator ==== and then there will be bloodshed. On average, string equality needs to check half the characters in the string. Any individual comparisons may require fewer, or more, than that, but whichever way you scan the string, some comparisons will take more and some will take fewer. With no general way of telling ahead of time which will be which, on average you can't do better than O(N/2) and no complexity justification for picking "start from the end" from "start from the start". -- Steven -- http://mail.python.org/mailman/listinfo/python-list