On 2014-09-25 10:09:30 -0400, Robert Haas wrote:
> I think the long-term solution here is that we need a lock-free hash
> table implementation for our buffer mapping tables, because I'm pretty
> sure that just cranking the number of locks up and up is going to
> start to have unpleasant side effects at some point. We may be able
> to buy a few more years by just cranking it up, though.

I think mid to long term we actually need something other than a
hashtable: something capable of efficiently checking for the existence
of 'neighboring' buffers, so we can intelligently prefetch far enough
ahead that the read has actually completed by the time we get there.

Also, I'm pretty sure that we'll need a way to efficiently remove all
buffers for a relfilenode from shared buffers - linearly scanning for
that isn't a good solution. So I think we need a different data
structure.

I've played around a bit with just replacing buf_table.c with a custom
handrolled hashtable, because I've seen more than one production
workload where hash_search_with_hash_value() is the top entry in
profiles, both in cpu time and in cache misses. Most of the calls come
from the buffer mapping, followed by the lock manager.

There are two reasons for that:
a) dynahash just isn't very good, and it does a lot of things that will
   never be necessary for these hashes.
b) the key into the hash table is *far* too wide. A significant portion
   of the time is spent comparing buffer/lock tags.

The aforementioned replacement hash table was a good bit faster for
fully cached workloads - but at the time I wrote it I could still make
it crash in very high cache pressure workloads, so that should be taken
with a fair bit of salt.

I think we can comparatively easily get rid of the tablespace in buffer
tags. Getting rid of the database would already be a fair bit harder. I
haven't really managed to come up with an idea for how to remove the
fork number without making the catalog much more complicated. I don't
think we can go too long without at least some of these steps :(.

Greetings,

Andres Freund

-- 
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
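
To make point (b) above concrete, here is a minimal sketch of why the key width matters. It is not code from the replacement hash table mentioned in the mail. The struct and function names are hypothetical, and the field layout only approximates a buffer tag as five 32-bit values (tablespace, database, relation, fork, block). The narrowed variant assumes the remaining four fields would still identify a page uniquely, which is exactly the part that would need work in the real system.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a buffer tag: five 32-bit fields, 20 bytes. */
typedef struct WideTag {
    uint32_t tablespace;
    uint32_t database;
    uint32_t relation;
    uint32_t fork;
    uint32_t block;
} WideTag;

/* Hypothetical narrowed tag with the tablespace dropped: 16 bytes. */
typedef struct NarrowTag {
    uint32_t database;
    uint32_t relation;
    uint32_t fork;
    uint32_t block;
} NarrowTag;

/* Compile-time checks that the layouts have the sizes claimed above. */
static_assert(sizeof(WideTag) == 20, "wide tag should be 20 bytes");
static_assert(sizeof(NarrowTag) == 16, "narrow tag should be 16 bytes");

/* Generic comparison over the full 20-byte key. */
static inline bool wide_tag_equal(const WideTag *a, const WideTag *b)
{
    return memcmp(a, b, sizeof(*a)) == 0;
}

/*
 * With a 16-byte key the comparison reduces to two 64-bit loads per
 * side and a single branch. memcpy is used for the loads so the code
 * makes no alignment or aliasing assumptions.
 */
static inline bool narrow_tag_equal(const NarrowTag *a, const NarrowTag *b)
{
    uint64_t a0, a1, b0, b1;

    memcpy(&a0, a, sizeof(a0));
    memcpy(&a1, (const char *) a + sizeof(a0), sizeof(a1));
    memcpy(&b0, b, sizeof(b0));
    memcpy(&b1, (const char *) b + sizeof(b0), sizeof(b1));

    return ((a0 ^ b0) | (a1 ^ b1)) == 0;
}
```

Whether the narrower comparison is measurably faster depends on the compiler and the workload. The sketch only shows the mechanical difference between the two key sizes, not a benchmark result.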