On Tue, Mar 11, 2014 at 12:59 AM, Roy Smith <r...@panix.com> wrote: > The hash vs. tree argument can get very complicated. For example, if > your tree is not completely resident in memory, the cost of paging in a > node will swamp everything else, and improving lookup speed will boil > down to reducing the number of I/O operations you need. An expensive > hash plus a linear walk through a collision chain which was resident in > a single memory block will beat traversing two nodes, each of which had > to be paged in separately.
Indeed, which is broadly an extension of the "cache locality" argument. I've never actually yearned for any of the advanced operations a tree can give (over a hash table). Usually, by the time I'm looking for that sort of thing, I really want an on-disk database - that solves all the problems of paging (the DBMS will make sure it reads in a minimum of index and data pages), and a good SQL database can handle multiple indexes in a space-efficient way. Plus, what you said about log 1,000,000? By the time you're looking at a million records, you probably (a) need to have them on disk somewhere anyway, and (b) don't want to read them all into RAM before you begin. ChrisA -- https://mail.python.org/mailman/listinfo/python-list