On Wed, 19 Mar 2014 10:49:33 +0200, Marko Rauhamaa wrote: > Steven D'Aprano <steve+comp.lang.pyt...@pearwood.info>: > >> If you are in a position to randomize the data before storing it in the >> tree, an unbalanced binary tree is a solid contender. > > Applications that can assume randomly distributed data are exceedingly > rare
Please re-read what I wrote. I didn't say "if your data comes to you fully randomized". I said "if you are in a position to randomize the data before storing it". In other words, if you actively randomize the data yourself. > and even then, you can't easily discount the possibility of > worst-case ordering. Of course you can. If you have a million items, then the odds that those million items happen to be sorted (the worst-case order) are 1 in a million factorial. That's a rather small number, small enough that we can discount it from ever happening, not in a million lifetimes of the Universe. Of course, you would be right to point out that one would also like to avoid *nearly* worst-case ordering. Nevertheless, there are Vastly more ways that the data will be sufficiently randomized as to avoid degenerate performance than the worst-case poor performance. > In fact, Dan himself said unbalanced trees performed so badly he > couldn't test them satisfactorily. You're misrepresenting what Dan said. What he actually said was that unbalanced trees perform second only to dicts with random data, only with sorted data do they perform badly. His exact words were: "For a series of random keys, it would do quite well (probably second only to dict), but for a series of sequential keys it would take longer than anyone would reasonably want to wait" -- Steven D'Aprano http://import-that.dreamwidth.org/ -- https://mail.python.org/mailman/listinfo/python-list