On Sat, Aug 6, 2011 at 3:30 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Jeff Janes <jeff.ja...@gmail.com> writes:
>> My experiments have shown that the freelist proper is not
>> substantially faster than the clock sweep--and that is even
>> under the assumption that putting things back into the freelist is
>> absolutely free.
>
> The freelist isn't there to make buffer allocation faster, though;
> it's there for allocation efficiency.  The point is that when some
> buffers have become completely useless (eg, because we dropped the table
> they were for), they'll be recycled in preference to reclaiming buffers
> that contain still-possibly-useful data.  It would certainly be simple
> to get rid of the freelist and only recycle dead buffers when the clock
> sweep reaches them, but I think we'd be paying for that in extra,
> unnecessary I/O.
This is an interesting point, because I think it really gets at the heart of the trade-off we're facing here: quality of buffer allocation vs. concurrency.

Assuming no free buffers are present, we'd ideally like to evict the buffer that won't be used until the latest possible future time. LRU is an approximation of this, and our clock-sweep algorithm is a less accurate approximation that pays for itself with less locking. If we really wanted true LRU, we'd have to move a buffer to the front of the list every time it was pinned, and that would be really expensive. The clock sweep only requires fighting over a piece of shared state when we want to kick a buffer out, rather than every time we want to do anything to a buffer, and it's probably only slightly worse than true LRU in terms of buffer allocation efficiency, so on balance it appears to be a good trade-off.

Even so, I think it's pretty trivial to construct a workload where performance is throttled by BufFreelistLock, and I seriously doubt we're going to solve that problem by optimizing the code that runs while holding that lock. We might buy ourselves a few more percentage points here or there, but if we really want to bust this bottleneck, we're going to need to move to some algorithm that is an even less perfect approximation of LRU but doesn't involve a single cluster-wide lock that throttles all buffer allocation. No matter how fast you make the critical section protected by that lock, given enough CPUs, it will eventually not be fast enough.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers