On Fri, Apr 5, 2013 at 11:08 PM, Amit Kapila <amit.kap...@huawei.com> wrote:
> I still have one more doubt. Consider the scenario below for the two
> cases: invalidating buffers while moving them to the freelist vs. just
> moving them to the freelist.
>
> A backend gets a buffer from the freelist for a request of page-9 (number
> 9 is random, just to explain); the buffer still has an association with
> another page-10. The backend needs to add the buffer with the new tag
> (new page association) to the buffer hash table and remove the entry with
> the oldTag (old page association).
>
> The benefit of just moving the buffer to the freelist is that if we get a
> request for the same page before somebody else uses the buffer for
> another page, it saves a read I/O. On the other side, in many cases the
> backend will need an extra partition lock to remove the oldTag, which can
> become a bottleneck.
>
> I think saving read I/O is more beneficial, but I am just not sure it is
> the best option, since the cases where it helps might be rare.
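For readers without bufmgr.c open: the sequence described above corresponds
to the tag-swap path in BufferAlloc() (src/backend/storage/buffer/bufmgr.c).
The sketch below is a condensed 9.2/9.3-era rendition of that path, not the
real function: the helper name retag_victim_buffer is invented, and pinning,
usage counts, dirty-buffer writeback, buffer-header locking, and the case
where another backend inserts the same tag first are all omitted. What it
keeps is the point under discussion: when the victim buffer still carries a
valid old tag, the backend must take a second buffer-mapping partition lock
to delete the stale hash entry.

/*
 * Condensed 9.2/9.3-era sketch of the tag-swap path in BufferAlloc()
 * (src/backend/storage/buffer/bufmgr.c).  The helper name is invented;
 * pinning, usage counts, dirty-buffer writeback, buffer-header locking,
 * and error handling are omitted.
 */
#include "postgres.h"

#include "storage/buf_internals.h"
#include "storage/lwlock.h"

static void
retag_victim_buffer(volatile BufferDesc *buf, BufferTag newTag)
{
    uint32      newHash = BufTableHashCode(&newTag);
    LWLockId    newPartitionLock = BufMappingPartitionLock(newHash);
    BufferTag   oldTag;
    uint32      oldHash = 0;            /* keep compiler quiet */
    LWLockId    oldPartitionLock = 0;   /* keep compiler quiet */
    bool        oldTagValid = (buf->flags & BM_TAG_VALID) != 0;

    if (oldTagValid)
    {
        /*
         * The buffer came off the freelist still associated with its old
         * page, so its stale hash entry must be deleted under the partition
         * lock covering the old tag -- the "extra partition lock" above.
         */
        oldTag = buf->tag;
        oldHash = BufTableHashCode(&oldTag);
        oldPartitionLock = BufMappingPartitionLock(oldHash);
    }

    /* Acquire the mapping lock(s) in a fixed order to avoid deadlock. */
    if (!oldTagValid || oldPartitionLock == newPartitionLock)
        LWLockAcquire(newPartitionLock, LW_EXCLUSIVE);
    else if (oldPartitionLock < newPartitionLock)
    {
        LWLockAcquire(oldPartitionLock, LW_EXCLUSIVE);
        LWLockAcquire(newPartitionLock, LW_EXCLUSIVE);
    }
    else
    {
        LWLockAcquire(newPartitionLock, LW_EXCLUSIVE);
        LWLockAcquire(oldPartitionLock, LW_EXCLUSIVE);
    }

    /*
     * Publish the new association, then retire the old one.  (The real code
     * also handles BufTableInsert() finding an existing entry, i.e. another
     * backend read the same page first.)
     */
    (void) BufTableInsert(&newTag, newHash, buf->buf_id);
    buf->tag = newTag;
    if (oldTagValid)
        BufTableDelete(&oldTag, oldHash);

    if (oldTagValid && oldPartitionLock != newPartitionLock)
        LWLockRelease(oldPartitionLock);
    LWLockRelease(newPartitionLock);
}

Holding both partition locks at once is what makes the retagging atomic with
respect to other backends, and acquiring them in a fixed order avoids
deadlock when two backends swap tags across the same pair of partitions.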
I think saving read I/O is a lot more beneficial. I haven't seen evidence of
a severe bottleneck updating the buffer mapping tables. I have seen some
evidence of spinlock-level contention on read workloads that fit in shared
buffers, because in that case the system can run fast enough for the
spinlocks protecting the lwlocks to get pretty hot. But if you're doing
writes, or if the workload doesn't fit in shared buffers, other bottlenecks
slow you down enough that this doesn't really seem to become much of an
issue.

Also, even if you *can* find some scenario where pushing the buffer
invalidation into the background is a win, I'm not convinced that would
justify doing it, because the case where it's a huge loss - namely, a
working set just a tiny bit smaller than shared_buffers - is pretty obvious.
I don't think we dare fool around with that; the townspeople will arrive
with pitchforks.

I believe that the big win here is getting the clock sweep out of the
foreground so that BufFreelistLock doesn't catch fire. The buffer mapping
locks are partitioned and, while that doesn't completely get rid of the
contention, it sure does help a lot. So I would view that goal as primary,
at least for now. Getting a first round of optimization done in this area
doesn't preclude improving it further in the future.

> Last time, the following tests were executed to validate the results:
>
> Test suite - pgbench
> DB Size - 16 GB
> RAM - 24 GB
> Shared Buffers - 2G, 5G, 7G, 10G
> Concurrency - 8, 16, 32, 64 clients
> Buffers pre-warmed before the start of each test
>
> Shall we try any other scenarios, or are the above okay for an initial
> test of the patch?

Seems like a reasonable place to start.

...Robert
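For context on "getting the clock sweep out of the foreground": in the
9.2/9.3 sources the sweep runs inside StrategyGetBuffer()
(src/backend/storage/buffer/freelist.c) with BufFreelistLock held
exclusively, so every backend that needs a victim buffer serializes on that
one lock. The following is a trimmed-down sketch of that loop, not the real
function: the name clock_sweep_victim is invented, and the freelist check,
strategy buffer rings, bgwriter statistics, and most error handling are
stripped out. StrategyControl, BufferDescriptors, NBuffers, and
BufFreelistLock are the real (partly file-local) objects of that era.

/*
 * Trimmed-down 9.2/9.3-era sketch of the clock sweep in StrategyGetBuffer()
 * (src/backend/storage/buffer/freelist.c).  The function name is invented;
 * the freelist check, strategy buffer rings, bgwriter statistics, and most
 * error handling are omitted.
 */
#include "postgres.h"

#include "storage/buf_internals.h"
#include "storage/lwlock.h"

volatile BufferDesc *
clock_sweep_victim(void)
{
    volatile BufferDesc *buf;
    int         trycounter = NBuffers;

    /*
     * Every backend that needs a victim buffer takes this lock exclusively
     * and holds it for the whole sweep -- the contention point that the
     * proposal wants to move into a background process.
     */
    LWLockAcquire(BufFreelistLock, LW_EXCLUSIVE);

    for (;;)
    {
        buf = &BufferDescriptors[StrategyControl->nextVictimBuffer];
        if (++StrategyControl->nextVictimBuffer >= NBuffers)
            StrategyControl->nextVictimBuffer = 0;

        LockBufHdr(buf);
        if (buf->refcount == 0)
        {
            if (buf->usage_count > 0)
            {
                /* Recently used: decay the usage count and keep sweeping. */
                buf->usage_count--;
                trycounter = NBuffers;
            }
            else
            {
                /*
                 * Victim found.  As in the real code, return with the buffer
                 * header lock still held; the caller releases BufFreelistLock.
                 */
                return buf;
            }
        }
        else if (--trycounter == 0)
        {
            UnlockBufHdr(buf);
            elog(ERROR, "no unpinned buffers available");
        }
        UnlockBufHdr(buf);
    }
}

Moving this loop into a background process that keeps the freelist topped up
would let foreground backends simply pop a free buffer, leaving only the
partitioned buffer-mapping work in their path, which is the split being
argued for above.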