On 1 May 2013 16:33, Robert Haas <robertmh...@gmail.com> wrote: > On Wed, May 1, 2013 at 11:29 AM, Robert Haas <robertmh...@gmail.com> wrote: >>> I was worried because SyncOneBuffer checks whether it needs writing >>> without taking a content lock, so the exclusive lock doesn't help. That >>> makes sense, because you don't want a checkpoint to have to get a >>> content lock on every buffer in the buffer pool. But it also means we >>> need to follow the rules laid out in transam/README and dirty the pages >>> before writing WAL. >> >> On further review, I think you're right about this. We'd have a >> problem if the following sequence of events were to occur: >> >> 1. vacuum writes the WAL record, but does not yet mark the buffer >> dirty, and then goes into the tank >> 2. checkpointer decides where the checkpoint will begin >> 3. buffer pool is scanned as part of the checkpoint process, observing >> target buffer not to be dirty >> 4. vacuum finally wakes up and marks the buffer dirty >> 5. crash, after WAL is flushed but before buffer is written >> >> However, on looking at this a little more, shouldn't we be doing >> log_heap_clean() *before* we do visibilitymap_set()? > > Hit send too soon. > > To finish that thought: right now the order of operations is... > > 1. Perform the cleanup operations on the buffer. > 2. Set the visibility map bit. > 3. Log the visibility map change. > 4. Log the cleanup operations. > > It seems to me that it would be better to do (4) as close to (1) as > possible. Isn't it bad if the operations are replayed in an order > that differs from the order in which they were performed - or if, say, > the first WAL record were replayed but the second were not, for > whatever reason?
I agree, but that was in the original coding wasn't it? Why aren't we writing just one WAL record for this action? We use a single WAL record in other places where we make changes to multiple blocks with multiple full page writes, e.g. index block split. That would make the action atomic and we'd just have this... 1. Perform the cleanup operations on the buffer. 2. Set the visibility map bit. 3. Log the cleanup operations and visibility map change. which can then be replayed with correct sequence, locking etc. and AFAICS would likely be faster also. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers