On Sun, Jan 29, 2012 at 6:04 PM, Simon Riggs <si...@2ndquadrant.com> wrote:
> On Sun, Jan 29, 2012 at 9:41 PM, Jeff Janes <jeff.ja...@gmail.com> wrote:
>
>> If I cast to an int, then I see advancement:
>
> I'll initialise it as 0, rather than -1, and then we don't have a
> problem in any circumstance.
>
>>> I've specifically designed the pgbench changes required to simulate
>>> conditions of clog contention to help in the evaluation of this patch.
>>
>> Yep, I've used that one for the testing.
>
> Most of the current patch is just bookkeeping to keep track of the
> point when we can look at history in a read-only manner.
>
> I've isolated the code better to allow you to explore various
> implementation options. I don't see any performance difference between
> any of them really, but you're welcome to look.
>
> Please everybody note that the clog history doesn't even become active
> until the first checkpoint, so this is dead code until we've hit the
> first checkpoint cycle and completed a million transactions since
> startup. So it's designed to tune for real-world situations, and is not
> easy to benchmark. (Maybe we could start earlier, but having extra
> code just for the first few minutes seems a waste of energy, especially
> since we must hit a million xids also.)
I find that this version does not compile:

clog.c: In function ‘TransactionIdGetStatus’:
clog.c:431: error: ‘clog’ undeclared (first use in this function)
clog.c:431: error: (Each undeclared identifier is reported only once
clog.c:431: error: for each function it appears in.)

Given that, I obviously cannot test this at this point, but let me go ahead and theorize about how well it's likely to work.

What Tom suggested before (and after some reflection I think I believe it) is that the frequency of access will be highest for the newest CLOG page and then drop off for each page further back you go. Clearly, if that drop-off is fast - e.g. each buffer further backward is half as likely to be accessed as the next newer one - then the fraction of accesses that will hit pages far enough back to benefit from this optimization will be infinitesimal: 1023 out of every 1024 accesses will hit the first ten pages, and on a high-velocity system those all figure to have been populated since the last checkpoint.

The best case for this patch should be an access pattern that involves a very long tail; actually, pgbench is a pretty good fit for that, assuming the scale factor is large enough. For example, at scale factor 100, we've got 10,000,000 tuples: choosing one at random, we're almost exactly 90% likely to find one that hasn't been chosen in the last 1,048,576 transactions (i.e. 32 CLOG pages @ 32K txns/page). In terms of reducing contention on the main CLOG SLRU, that sounds pretty promising, but it depends somewhat on the rate at which transactions are processed relative to the frequency of checkpoints, since that will affect how many pages back you have to go to use the history path.

However, there is a potential fly in the ointment: in other cases in which we've reduced contention at the LWLock layer, we've ended up with very nasty contention at the spinlock layer that can sometimes eat more CPU time than the LWLock contention did.
In that light, it strikes me that it would be nice to be able to partition the contention N ways rather than just 2 ways. I think we could do that as follows. Instead of having one control lock per SLRU, have N locks, where N is probably a power of 2. Divide the buffer pool for the SLRU N ways, and decree that each slice of the buffer pool is controlled by one of the N locks. Route all requests for a page P to slice P mod N.

Unlike this approach, that wouldn't completely eliminate contention at the LWLock level, but it would reduce it in proportion to the number of partitions, and it would reduce spinlock contention according to the number of partitions as well. A downside is that you'd need more buffers to get the same hit rate, but this proposal has the same problem: it doubles the amount of memory allocated for CLOG.

Of course, this approach is all vaporware right now, so it's anybody's guess whether it would be better than this patch if we had code for it. I'm just throwing it out there.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company