On 20.10.2020 09:24, Michael Paquier wrote:
On Fri, Oct 16, 2020 at 06:02:52PM +0300, Anastasia Lubennikova wrote:
In the current patch, PageIsVerifed called from pgbasebackup simply doesn't
Should we change this too? I don't see any difference.
I considered that, but now that does not seem worth bothering here.

Done.
Thanks for the updated patch.  I have played with dd and some fake
files to check the validity of the zero-case (as long as pd_lsn does
not go wild).  Here are some comments, with an updated patch attached:
- Added a PageIsVerifiedExtended to make this patch back-patchable,
with the original routine keeping its original shape.
Thank you. I always forget about this. Do we have any checklist for such changes, that patch authors and reviewers can use?
- I did not like much to show the WARNING from PageIsVerified() for
the report, and we'd lose some context related to the base backups.
The important part is to know which blocks from which files are found
as problematic.
- Switched the checks to have PageIsVerified() placed first in the
hierarchy, so as we do all the basic validity checks before a look at
the LSN.  This is not really change things logically.
- The meaning of block_retry is also the opposite of what it should be
in the original code?  IMO, the flag should be set to true if we still
are fine to retry a block, and set it to false once the one-time retry
has failed.

Agree.

- The error strings are not really consistent with the project style
in this area.  These are usually not spawned across multiple lines to
ease searches with grep or such.
Same question as above. I don't see this info about formatting in the error message style guide in documentation. Is it mentioned somewhere else?
Anastasia, Michael B, does that look OK to you?
The final patch looks good to me.
NB: This patch fixes only one problem, the zero-page case, as it was
the intention of Michael B to split this part into its own thread.  We
still have, of course, a second problem when it comes to a random LSN
put into the page header which could mask an actual checksum failure
so this only takes care of half the issues.  Having a correct LSN
approximation is a tricky problem IMO, but we could improve things by
having some delta with an extra WAL page on top of GetInsertRecPtr().
And this function cannot be used for base backups taken from
standbys.

We can also read such pages via shared buffers to be 100% sure.

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Reply via email to