On Wed, Aug 3, 2011 at 6:21 PM, Jim Nasby <j...@nasby.net> wrote:
> On Aug 3, 2011, at 1:21 PM, Robert Haas wrote:
>> 1. "We configure PostgreSQL to use a 2 Gbyte application-level cache
>> because PostgreSQL protects its free-list with a single lock and thus
>> scales poorly with smaller caches."  This is a complaint about
>> BufFreeList lock which, in fact, I've seen as a huge point of
>> contention on some workloads.  In fact, on read-only workloads, with
>> my lazy vxid lock patch applied, this is, I believe, the only
>> remaining unpartitioned LWLock that is ever taken in exclusive mode;
>> or at least the only one that's taken anywhere near often enough to
>> matter.  I think we're going to do something about this, although I
>> don't have a specific idea in mind at the moment.
>
> This has been discussed before:
> http://archives.postgresql.org/pgsql-hackers/2011-03/msg01406.php (which
> itself references 2 other threads).
>
> The basic idea is: have a background process that proactively moves buffers
> onto the free list so that backends should normally never have to run the
> clock sweep (which is rather expensive). The challenge there is figuring out
> how to get stuff onto the free list with minimal locking impact. I think one
> possible option would be to put the freelist under it's own lock (IIRC we
> currently use it to protect the clock sweep as well). Of course, that still
> means the free list lock could be a point of contention, but presumably it's
> far faster to add or remove something from the list than it is to run the
> clock sweep.

Based on recent benchmarking, I'm going to say "no".  It doesn't seem
to matter how short you make the critical section: a single
program-wide mutex is a loser.  Furthermore, the "free list" is a
joke, because it's nearly always going to be completely empty.  We
could probably just rip that out and use the clock sweep and never
miss it, but I doubt it would improve performance much.

I think what we probably need to do is have multiple clock sweeps in
progress at the same time.  So, for example, if you have 8GB of
shared_buffers, you might have 8 mutexes, one for each GB.  When a
process wants a buffer, it locks one of the mutexes and sweeps through
that 1GB partition.  If it finds a buffer before returning to the
point at which it started the scan, it's done.  Otherwise, it releases
its mutex, grabs the next one, and continues on until it finds a free
buffer.

The catch with any modification in this area is that pretty much any
increase in parallelism is potentially going to reduce the quality of
buffer replacement to some degree.  So the trick will be to squeeze
out as much concurrency as possible while minimizing that degradation.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
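
For illustration only, the multiple-clock-sweep scheme described in the
reply above might be modelled roughly as follows.  This is a toy Python
simulation, not PostgreSQL code: the names Buffer, Partition,
PartitionedPool and get_victim are hypothetical, the usage-count cap of 5
is an assumption, and pinning the victim inside the partition lock is
just one way to stop two callers from picking the same buffer.

```python
import threading
from dataclasses import dataclass

MAX_USAGE = 5  # assumed cap on usage_count; bounds the retry loop below


@dataclass
class Buffer:
    usage_count: int = 0   # decremented by the sweep; 0 means evictable
    pinned: bool = False   # pinned buffers are never chosen as victims


class Partition:
    """One slice of the buffer pool with its own mutex and clock hand."""

    def __init__(self, buffers):
        self.buffers = buffers
        self.lock = threading.Lock()
        self.hand = 0

    def sweep_once(self):
        """Run the clock hand for at most one full revolution.

        Returns the index of a victim within this partition, or None if
        the hand came back to where it started without finding one.
        """
        with self.lock:
            for _ in range(len(self.buffers)):
                idx = self.hand
                self.hand = (self.hand + 1) % len(self.buffers)
                buf = self.buffers[idx]
                if buf.pinned:
                    continue
                if buf.usage_count == 0:
                    buf.pinned = True  # claim it before dropping the lock
                    return idx
                buf.usage_count -= 1
            return None


class PartitionedPool:
    """Buffer pool split into equally sized, independently locked partitions."""

    def __init__(self, n_buffers, n_partitions):
        per_partition = n_buffers // n_partitions  # remainder is ignored here
        self.partitions = [
            Partition([Buffer() for _ in range(per_partition)])
            for _ in range(n_partitions)
        ]

    def get_victim(self, start_partition):
        """Sweep one partition; on failure release it and move to the next.

        Returns (partition_number, index_in_partition), or None if every
        buffer stays pinned for the whole search.
        """
        n = len(self.partitions)
        # Each failed pass lowers usage counts, so MAX_USAGE + 1 passes over
        # all partitions are enough to find any unpinned buffer.
        for step in range(n * (MAX_USAGE + 1)):
            p = (start_partition + step) % n
            idx = self.partitions[p].sweep_once()
            if idx is not None:
                return p, idx
        return None
```

Each caller would presumably pick start_partition so that concurrent
callers spread across partitions (for instance by hashing a backend
identifier); how to choose it, and how many partitions to use, is exactly
the concurrency versus replacement-quality tradeoff raised above.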