On 4 April 2013 02:39, Andres Freund <and...@2ndquadrant.com> wrote:
> Ok, I think I see the bug. And I think it's been introduced in the
> checkpoints patch.

Well spotted. (I think you mean the checksums patch.)

> If by now the first backend has proceeded to PageSetLSN() we are writing
> different data to disk than the one we computed the checksum of
> before. Boom.

Right, so nothing else we were doing was wrong; that's why we couldn't spot a bug. The problem is that we aren't replaying enough WAL because the checksum on the WAL record is broken.

> I think the whole locking interactions in MarkBufferDirtyHint() need to
> be thought over pretty carefully.

When we write out a buffer with checksums enabled, we take a copy of the buffer so that the checksum is consistent, even while other backends may be writing hints to the same buffer.

I missed out on doing that with XLOG_HINT records, so the WAL CRC can be incorrect: the data is scanned twice, once to compute the CRC and again to copy it into the WAL buffers, and a hint can be set in between. Normally that would be OK because we hold an exclusive lock on the block, but with hints we hold only a share lock. So what we need to do is take a copy of the buffer before we call XLogInsert().

A simple patch to do this is attached for discussion (not tested).

We might also do this by modifying the WAL record to take the whole block and bypass the BkpBlock mechanism entirely. But that's more work and doesn't seem like it would be any cleaner.

I figure let's solve the problem first, then discuss which approach is best.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
copy_before_XLOG_HINT.v1.patch
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers