On 05.09.2012 11:24, Johannes Bauer wrote:
> Consider sorting a large dictionary. Sorting is in O(n log(n)), but this
> assumes O(1) comparisons! If the comparison operation itself were in
> O(n), the total sorting complexity would be O(n^2 log(n)), which is
> definitely false. Most comparisons will abort *very* early (after the
> first character).
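That claim is easy to sanity-check with a toy model. Assume (and this is a
strong simplification of mine, not something the measurements below depend
on) that characters are independent and uniformly distributed over an
alphabet of size k. The chance that the first i characters of two words all
match is (1/k)^i, so the expected number of characters compared per word
comparison is the geometric series 1 + 1/k + 1/k^2 + ... = k/(k-1), a
constant independent of word length. A minimal sketch:

#!/usr/bin/python3
# Toy model only: assumes characters are independent and uniform over an
# alphabet of size k, which real dictionary words are not.
def expected_char_comparisons(k):
    # P(first i characters all match) = (1/k)**i, so the expectation is
    # the geometric series 1 + 1/k + 1/k**2 + ... = k / (k - 1).
    return k / (k - 1)

for k in (2, 10, 26):
    print("alphabet size %2d -> %.3f expected character comparisons"
          % (k, expected_char_comparisons(k)))

For k = 26 that is about 1.04. Real words share prefixes, so the measured
averages below come out higher (around 2.2), but they are still a small
constant rather than anything that grows with word length.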
I've just written a program to demonstrate this (attached below). I used my
aspell dictionary, which contains about 100k words. To keep the runtime
down, I restricted the test to the most frequent word length (in this case
9 characters, roughly 13k words). Here are the results (note the input is
shuffled, so the numbers differ a little from run to run, but they are
quite stable):

Max Cmp:     100
Sum Cmp:     32111
Avg Cmp:     2.393842254361115
Num words:   13414
Min wordlen: 9
Max wordlen: 9
Avg wordlen: 9.0

Going up in word length (showing only part of the output now):

Avg Cmp: 2.2374929735806632   Num words: 10674   Avg wordlen: 10.0
Avg Cmp: 2.261727078891258    Num words: 7504    Avg wordlen: 11.0
Avg Cmp: 2.2335647202939977   Num words: 4898    Avg wordlen: 12.0
Avg Cmp: 2.1743341404358354   Num words: 2891    Avg wordlen: 13.0
Avg Cmp: 2.156782549420586    Num words: 1467    Avg wordlen: 14.0
Avg Cmp: 2.112449799196787    Num words: 747     Avg wordlen: 15.0

So, interestingly, in this real-world case(tm), the comparison cost does
not scale with O(n) in the word length. It actually goes down slightly
(probably because there are fewer words at the larger lengths).

In summary, let's please not debate "feelings" or such. Either measurements
(with a clear indication of what data they are based on and how the test
was conducted) or derivations (with an explanation of the assumptions).
Feelings can be very misleading in cases like this.

Best regards,
Johannes

#!/usr/bin/python3
import random

class Word():
    def __init__(self, s):
        self._s = s
        self._c = 0   # counts character comparisons made as left operand

    def __lt__(self, other):
        # Shorter words sort first; no characters need to be compared.
        if len(self) < len(other):
            return True
        elif len(self) > len(other):
            return False
        # Equal length: compare character by character, counting each one,
        # and stop as soon as the order is decided.
        for i in range(len(self)):
            self._c += 1
            if self._s[i] < other._s[i]:
                return True
            elif self._s[i] > other._s[i]:
                return False
        return False

    def __repr__(self):
        return "%s<%d>" % (self._s, self._c)

    def __len__(self):
        return len(self._s)

    def getcmp(self):
        return self._c

wordlist = [ ]
with open("dict", "r") as f:
    for line in f:
        word = Word(line[:-1])
        if len(word) == 9:
            wordlist.append(word)
random.shuffle(wordlist)
wordlist.sort()

comparisons = [ word.getcmp() for word in wordlist ]
print("Max Cmp:", max(comparisons))
print("Sum Cmp:", sum(comparisons))
print("Avg Cmp:", sum(comparisons) / len(comparisons))

wordlengths = [ len(word) for word in wordlist ]
print("Min wordlen:", min(wordlengths))
print("Max wordlen:", max(wordlengths))
print("Avg wordlen:", sum(wordlengths) / len(wordlengths))

-- 
>> Where exactly had you predicted the earthquake again?
> At least not publicly!
Ah, the latest and to this day most ingenious trick of our great
cosmologists: the secret prediction.
 - Karl Kaos about Rüdiger Thomas in dsa <hidbv3$om2$1...@speranza.aioe.org>