On Thu, Aug 12, 2004 at 01:13:46PM -0400, Tom Lane wrote:
> Kenneth Marshall <[EMAIL PROTECTED]> writes:
> > On Thu, Aug 12, 2004 at 09:58:56AM -0400, Tom Lane wrote:
> >> How would a read-only action work to block out the checkpoint?
>
> > The latch+version number is use by the checkpoint process. The
> > other processes can do a read of the latch to determine if it has
> > been set. This does not cause a cache invalidation hit. If the
> > latch is set, the competing processes read until it has been
> > cleared and the version updated. This makes the general case of
> > no checkpoint not incur a write and the consequent cache-line
> > invalidation and reload by all processors on an SMP system.
>
> Except that reading the latch and finding it clear offers no guarantee
> that a checkpoint isn't about to start. The problem is that we are
> performing two separate actions (write a COMMIT xlog record and update
> transaction status in clog) and we have to prevent a checkpoint from
> starting in between those actions. I don't see that there's any way to
> do that with a read-only latch.
>
...just caught up on this.

ISTM that loading the checkpoint process more heavily IS possible if the
checkpoint uses a two-phase lock. That would replace 1 write lock with 2
lock reads, which is likely to be beneficial for SMP, given I have faith
that the other two problems you mention will succumb to some solution in
the mid-term.

The first lock is an "intent lock", followed by a second, heavyweight lock
just as you now have it.

Committer:
1. Prior to COMMIT: reads for an intent lock. If found, it attempts to take
   the heavyweight lock; if that is not possible, the commit waits until
   after the checkpoint, just as you currently suggest.
2. Prior to updating clog: reads for an intent lock. If found, it takes the
   heavyweight lock; if that is not possible, it reports a server error.

Checkpointer: (straight to step 4 for a shutdown checkpoint)
1. Writes an intent lock (it always can).
2. Waits for the group commit timeout.
3. Waits for 0.5 second more.
4. Begins to wait on an exclusive heavyweight lock, before starting the
   checkpoint proper.

This is not a provably correct state machine, but the error message should
not occur under current "normal" situations. (It is possible for the
Checkpointer to write the intent lock (its step 1) just after a Committer
has read for it (Committer step 1), and for a very long delay then to occur
before the Committer's step 2, such that Checkpointer step 4 begins before
Committer step 2.) In the unlikely event that this occurs, it is very
likely to be noticed and reported by Committer step 2.

Is a longer term solution for pg to use a background log writer? That would
make group commit much easier to perform automatically, without the
false-delay model currently available.

Best Regards, Simon Riggs
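
For illustration only, the Committer and Checkpointer steps above could be modelled roughly as below. This is a single-threaded toy in Python, not PostgreSQL code: the names `Gate`, `before_commit`, `before_clog_update`, `checkpointer_announce` and `checkpointer_try_start` are all made up for the sketch, and it assumes the heavyweight lock is taken shared by Committers and exclusive by the Checkpointer. The timed waits in Checkpointer steps 2 and 3 are left out; `delayed_committer_race` replays the long-delay interleaving described in the parenthesis.

```python
class Gate:
    """Toy state for the intent lock plus heavyweight lock (hypothetical names)."""

    def __init__(self):
        self.intent = False     # intent lock: written only by the Checkpointer
        self.sharers = 0        # Committers holding the heavyweight lock (shared)
        self.exclusive = False  # Checkpointer holds the heavyweight lock

    def try_shared(self):
        # Assumption: Committers take the heavyweight lock in shared mode.
        if self.exclusive:
            return False
        self.sharers += 1
        return True

    def release_shared(self):
        self.sharers -= 1

    def try_exclusive(self):
        # The Checkpointer can only proceed once no Committer holds the lock.
        if self.exclusive or self.sharers > 0:
            return False
        self.exclusive = True
        return True


def before_commit(gate):
    """Committer step 1: returns (outcome, holds_heavyweight_lock)."""
    if not gate.intent:
        return "proceed", False   # common case: a read only, no lock write
    if gate.try_shared():
        return "proceed", True
    return "wait", False          # commit waits until after the checkpoint


def before_clog_update(gate, holds_lock):
    """Committer step 2: returns (outcome, holds_heavyweight_lock)."""
    if holds_lock or not gate.intent:
        return "proceed", holds_lock
    if gate.try_shared():
        return "proceed", True
    return "error", False         # checkpoint already started: server error


def checkpointer_announce(gate):
    """Checkpointer step 1: write the intent lock (always possible)."""
    gate.intent = True


def checkpointer_try_start(gate):
    """Checkpointer step 4, modelled as a non-blocking attempt."""
    return gate.try_exclusive()


def delayed_committer_race():
    """Replay the long-delay interleaving; returns the Committer step 2 outcome."""
    gate = Gate()
    _, held = before_commit(gate)      # intent lock not yet written
    checkpointer_announce(gate)        # Checkpointer step 1
    checkpointer_try_start(gate)       # step 4, after the waits have expired
    outcome, _ = before_clog_update(gate, held)
    return outcome
```

In this model the delayed Committer finds the intent lock set at step 2, fails to get the heavyweight lock, and so reports the error rather than updating clog behind a checkpoint that has already started.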