On Mon, Nov 16, 2020 at 1:23 AM Michael Paquier <mich...@paquier.xyz> wrote:
> On Sun, Nov 15, 2020 at 04:37:36PM +0100, Magnus Hagander wrote: > > On Tue, Nov 10, 2020 at 5:44 AM Michael Paquier <mich...@paquier.xyz> > wrote: > >> On Thu, Nov 05, 2020 at 10:57:16AM +0900, Michael Paquier wrote: > >>> I was referring to the patch I sent on this thread that fixes the > >>> detection of a corruption for the zero-only case and where pd_lsn > >>> and/or pg_upper are trashed by a corruption of the page header. Both > >>> cases allow a base backup to complete on HEAD, while sending pages > >>> that could be corrupted, which is wrong. Once you make the page > >>> verification rely only on pd_checksum, as the patch does because the > >>> checksum is the only source of truth in the page header, corrupted > >>> pages are correctly detected, causing pg_basebackup to complain as it > >>> should. However, it has also the risk to cause pg_basebackup to fail > >>> *and* to report as broken pages that are in the process of being > >>> written, depending on how slow a disk is able to finish a 8kB write. > >>> That's a different kind of wrongness, and users have two more reasons > >>> to be pissed. Note that if a page is found as torn we have a > >>> consistent page header, meaning that on HEAD the PageIsNew() and > >>> PageGetLSN() would pass, but the checksum verification would fail as > >>> the contents at the end of the page does not match the checksum. > >> > >> Magnus, as the original committer of 4eb77d5, do you have an opinion > >> to share? > >> > > > > I admit that I at some point lost track of the overlapping threads around > > this, and just figured there was enough different > checksum-involved-people > > on those threads to handle it :) Meaning the short answer is "no, I don't > > really have one at this point". > > > > Slightly longer comment is that it does seem reasonable, but I have not > > read in on all the different issues discussed over the whole thread, so > > take that as a weak-certainty comment. > > Which part are you considering as reasonable? The removal-feature > part on a stable branch or perhaps something else? > I was referring to the latest patch on the thread. But as I said, I have not read up on all the different issues raised in the thread, so take it with a big grain os salt. And I would also echo the previous comment that this code was adapted from what the pgbackrest folks do. As such, it would be good to get a comment from for example David on that -- I don't see any of them having commented after that was mentioned? -- Magnus Hagander Me: https://www.hagander.net/ <http://www.hagander.net/> Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>