On 03/03/2018 05:08 PM, Magnus Hagander wrote:
> On Sat, Mar 3, 2018 at 5:06 PM, Tomas Vondra
> <tomas.von...@2ndquadrant.com> wrote:
>
>> On 03/03/2018 01:38 PM, Robert Haas wrote:
>>> On Sat, Mar 3, 2018 at 7:32 AM, Robert Haas <robertmh...@gmail.com> wrote:
>>>> On Fri, Mar 2, 2018 at 6:26 PM, Tomas Vondra
>>>> <tomas.von...@2ndquadrant.com> wrote:
>>>>> Hmmm, OK. So we need to have a valid checksum on a page, disable
>>>>> checksums, set some hint bits on the page (which won't be
>>>>> WAL-logged), enable checksums again and still get a valid
>>>>> checksum even with the new hint bits? That's possible, albeit
>>>>> unlikely.
>>>>
>>>> No, the problem is if - as is much more likely - the checksum is
>>>> not still valid.
>>>
>>> Hmm, on second thought ... maybe I didn't think this through
>>> carefully enough. If the checksum matches on the master by chance,
>>> and the page is the same on the standby, then we're fine, right? It's
>>> a weird accident, but nothing is actually broken. The failure
>>> scenario is where the standby has a version of the page with a bad
>>> checksum, but the master has a good checksum. So for example:
>>> checksums disabled, master modifies the page (which is replicated),
>>> master sets some hint bits (coincidentally making the checksum
>>> match), now we try to turn checksums on and don't re-replicate the
>>> page because the checksum already looks correct.
>>
>> Yeah. Doesn't that pretty much mean we can't skip any pages that have
>> correct checksum, because we can't rely on standby having the same page
>> data? That is, this block in ProcessSingleRelationFork:
>>
>>     /*
>>      * If checksum was not set or was invalid, mark the buffer as dirty
>>      * and force a full page write. If the checksum was already valid, we
>>      * can leave it since we know that any other process writing the
>>      * buffer will update the checksum.
>>      */
>>     if (checksum != pagehdr->pd_checksum)
>>     {
>>         START_CRIT_SECTION();
>>         MarkBufferDirty(buf);
>>         log_newpage_buffer(buf, false);
>>         END_CRIT_SECTION();
>>     }
>>
>> That would mean this optimization - only doing the write when the
>> checksum does not match - is broken.
>
> Yes. I think that was the conclusion of this, as posted in
> https://www.postgresql.org/message-id/CABUevExDZu__5KweT8fr3Ox45YcuvTDEEu%3DaDpGBT8Sk0RQE_g%40mail.gmail.com
> :)

Oh, right. I did have a "deja vu" feeling, when writing that. Good that I
came to the same conclusion, though.
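
To make the consequence concrete, here is a rough sketch of that block with
the skip removed (just an illustration, assuming the surrounding buffer
handling in the patch stays as it is): every page gets dirtied and a full
page image, whether or not its checksum already matches.

    /*
     * Sketch only: unconditionally mark the buffer dirty and force a
     * full page write. A matching checksum no longer tells us the
     * standby has the same page contents, so the pd_checksum
     * comparison goes away entirely.
     */
    START_CRIT_SECTION();
    MarkBufferDirty(buf);
    log_newpage_buffer(buf, false);
    END_CRIT_SECTION();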
>> If that's the case, it probably makes restarts/resume more expensive,
>> because this optimization was why after restart the already processed
>> data was only read (and the checksums verified) but not written.
>
> Yes, it definitely does. It's not a dealbreaker, but it's certainly
> a bit painful not to be able to resume as cheaply.

Yeah. It probably makes the more elaborate resuming more valuable, but I
still think it's not a "must have" for PG11.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services