On 2016-07-27 10:04:52 -0400, Robert Haas wrote:
> On Tue, Jul 26, 2016 at 8:43 PM, Andres Freund <and...@anarazel.de> wrote:
> > As previously mentioned, dynahash isn't particularly fast. In many
> > cases that doesn't particularly matter. But e.g. hash aggregation and
> > bitmap index scans are quite sensitive to hash performance.
> >
> > The biggest problems with dynahash (also discussed at [1]) are
> >
> > a) a two-level structure doubling the amount of indirect lookups
> > b) indirect function calls
> > c) using separate chaining based conflict resolution
> > d) being too general.
>
> I am ... skeptical about open addressing. It's less unappealing for
> this application than in general because we don't actually need to
> delete anything, but that wouldn't be true for the catcache.

I don't think deletions really change the picture for open addressing:
you don't need tombstones, you can instead move elements back closer to
their origin at deletion. I just hadn't implemented that yet, because it
didn't seem like a crucial point.
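To illustrate what I mean, here's a rough sketch (untested, essentially
Knuth's algorithm R for linear probing; the HashBucket / HashTable types
are made up for the example and assume a power-of-two sized table with
the hash cached per bucket):

#include <stdbool.h>
#include <stdint.h>

typedef struct HashBucket
{
	bool		in_use;
	uint32_t	hash;			/* cached hash of the bucket's key */
	/* ... key / value fields ... */
} HashBucket;

typedef struct HashTable
{
	uint32_t	nbuckets;		/* always a power of two */
	uint32_t	members;
	HashBucket *buckets;
} HashTable;

/*
 * Remove the entry in bucket i. Instead of leaving a tombstone, shift
 * the following cluster members backwards towards their optimal
 * position, so lookups can always stop at the first empty bucket.
 * Assumes the table is never completely full (fill factor < 1).
 */
static void
hash_remove_at(HashTable *ht, uint32_t i)
{
	uint32_t	mask = ht->nbuckets - 1;
	uint32_t	j = i;

	for (;;)
	{
		uint32_t	optimal;

		j = (j + 1) & mask;

		/* reached the end of the cluster, the hole can stay at i */
		if (!ht->buckets[j].in_use)
			break;

		optimal = ht->buckets[j].hash & mask;

		/*
		 * Entry j can fill the hole at i only if i lies on its probe
		 * path, i.e. cyclically between its optimal bucket and j.
		 */
		if (((j - optimal) & mask) >= ((j - i) & mask))
		{
			ht->buckets[i] = ht->buckets[j];
			i = j;				/* the hole moves forward */
		}
	}

	ht->buckets[i].in_use = false;
	ht->members--;
}

Caching the hash in the bucket keeps both this and resizing cheap,
since keys never need to be re-hashed.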
Unfortunately open addressing, particularly with linear probing, has
such vastly superior cache access patterns that it's hard to come close
with a chained hash table. I've previously tried a chained hash table,
and the performance gains were a lot smaller. For that reason many
hash-table implementations used in performance-critical paths appear to
use open addressing. Don't get me wrong, I certainly think there are
good cases for using separate chaining in places where performance is
less of a concern, and possibly when you want lock-free concurrency.

> All the same, I feel that we need to assess the risk that we're
> improving average-case performance while creating large regressions in
> other cases (i.e. where there is clustering).

The actual algorithmic worst-case complexity isn't different from
separate chaining's. It's obviously important to resize appropriately.
It's not that hard to beat dynahash's memory efficiency, so fill
factors don't have to be kept super high to avoid regressing on memory
usage.

> Do likely() and unlikely() actually buy us anything here?

It's only a few percent here, but overall, especially when used around
error checks, it's above 10%. The reason I used them here is primarily
that they made the assembly easier to read for me... ;)

Greetings,

Andres Freund
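P.S. For context on the likely()/unlikely() question: they're just the
usual __builtin_expect() wrappers, along these lines (a sketch, not
necessarily the exact definitions in the patch):

#if defined(__GNUC__)
#define likely(x)	__builtin_expect((x) != 0, 1)
#define unlikely(x)	__builtin_expect((x) != 0, 0)
#else
#define likely(x)	((x) != 0)
#define unlikely(x)	((x) != 0)
#endif

/* typical use, moving the error path out of the hot instruction stream */
if (unlikely(entry == NULL))
	elog(ERROR, "corrupted hash table");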