On 04.09.2012 23:59, Chris Angelico wrote: >> n = (256 / 255) * (1 - 256 ^ (-c)) >> >> where n is the average number of character comparisons and c. The >> rationale as follows: The first character has to be compared in any >> case. The second with a probability of 1/256, the third with 1/(256^2) >> and so on. > > That would be for comparing two random areas of memory. Python strings > don't have 256 options per character; and in terms of actual strings, > there's so many possibilities.
The unit of comparison is completely arbitraty and can equally be used to compare bitstrings if changed accordingly. > The strings that a program is going to > compare for equality are going to use a vastly restricted alphabet; > for a lot of cases, there might be only a few dozen plausible > characters. This is very true. But if so, I would like to see the assumtion that you're making (which might very well be plausible) in order to arrive at a average of N/2 comparisons (which just seems *way* off). > But even so, it's going to scale approximately linearly with the > string length. If they're really random, then yes, there's little > chance that either a 1MB string or a 2MB string will be the same, but > with real data, they might very well have a long common prefix. So > it's still going to be more or less O(n). I understand what you're saying and surely this is true to some extent (someone mentioned filenames, which is an excellent example). However in order to derive a *formula* you need to formularize your assumtions. With random data, it's definitely not O(n). And even in reality (with a more limited alphabet) it usually will early abort: Consider sorting a large dictionary. Sorting is in O(n log(n)), but this assumes O(1) comparisons! If the comparison operation itself were in O(n), the total sorting complexity would be O(n^2 log(n)), which is definitely false. Most comparisons will abort *very* early (after the first character). Best regards, Johannes -- >> Wo hattest Du das Beben nochmal GENAU vorhergesagt? > Zumindest nicht öffentlich! Ah, der neueste und bis heute genialste Streich unsere großen Kosmologen: Die Geheim-Vorhersage. - Karl Kaos über Rüdiger Thomas in dsa <hidbv3$om2$1...@speranza.aioe.org> -- http://mail.python.org/mailman/listinfo/python-list