Hi Michael,

On 11/23/20 8:10 PM, Michael Paquier wrote:
> On Mon, Nov 23, 2020 at 10:35:54AM -0500, Stephen Frost wrote:

>> Also- what is the point of reading the page from shared buffers
>> anyway..?  All we need to do is prove that the page will be rewritten
>> during WAL replay.  If we can prove that, we don't actually care what
>> the contents of the page are.  We certainly can't calculate the
>> checksum on a page we plucked out of shared buffers since we only
>> calculate the checksum when we go to write the page out.

> An LSN-based check makes the thing tricky.  How do you make sure that
> pd_lsn is not itself broken?  It could be perfectly possible that a
> random on-disk corruption makes pd_lsn seem to have a correct value,
> still the rest of the page is borked.

We are not just looking at one LSN value. Here are the steps we are proposing (I'll skip the checks for zero pages here; a rough C sketch follows the list):

1) Test the page checksum. If it passes, the page is OK.
2) If the checksum does not pass, record the page offset and LSN and continue.
3) After the file is copied, reopen and reread it, seeking to the offsets where possibly invalid pages were recorded in the first pass.
   a) If the page is now valid, it is OK.
   b) If the page is still not valid but the LSN has increased from the LSN recorded in the first pass, it is OK. We can infer this because the LSN has been updated in a way that is not consistent with storage corruption.
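To make the flow concrete, here is a minimal sketch in C of how the two passes fit together. To be clear, this is not pgBackRest's actual code: page_checksum_ok() and page_lsn() are hypothetical stand-ins (a real build would call pg_checksum_page() from PostgreSQL's storage/checksum_impl.h and compare against pd_checksum in the page header), and error handling is omitted.

/* two_pass_check.c -- sketch only, not pgBackRest's implementation. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define PAGE_SIZE 8192

typedef struct SuspectPage
{
    uint64_t offset;                /* byte offset of the page in the file */
    uint64_t lsn;                   /* pd_lsn recorded on the first pass */
} SuspectPage;

/* pd_lsn is the first field of the page header, stored in the server's
 * native byte order as two 32-bit values (xlogid, xrecoff). */
static uint64_t
page_lsn(const unsigned char *page)
{
    uint32_t xlogid;
    uint32_t xrecoff;

    memcpy(&xlogid, page, sizeof(xlogid));
    memcpy(&xrecoff, page + sizeof(xlogid), sizeof(xrecoff));

    return ((uint64_t)xlogid << 32) | xrecoff;
}

/* Placeholder: a real build would call pg_checksum_page() here. */
static bool
page_checksum_ok(const unsigned char *page, uint64_t offset)
{
    (void)page;
    (void)offset;
    return true;
}

/* Steps 1-2: record offset and LSN of each page that fails the checksum. */
size_t
first_pass(FILE *file, SuspectPage *suspect, size_t suspectMax)
{
    unsigned char page[PAGE_SIZE];
    uint64_t offset = 0;
    size_t total = 0;

    while (fread(page, 1, PAGE_SIZE, file) == PAGE_SIZE)
    {
        if (!page_checksum_ok(page, offset) && total < suspectMax)
        {
            suspect[total].offset = offset;
            suspect[total].lsn = page_lsn(page);
            total++;
        }

        offset += PAGE_SIZE;
    }

    return total;
}

/* Step 3: after the copy completes, reread only the suspect pages. */
void
second_pass(FILE *file, const SuspectPage *suspect, size_t total)
{
    unsigned char page[PAGE_SIZE];

    for (size_t i = 0; i < total; i++)
    {
        if (fseek(file, (long)suspect[i].offset, SEEK_SET) != 0 ||
            fread(page, 1, PAGE_SIZE, file) != PAGE_SIZE)
            continue;               /* deleted/truncated -- see below */

        /* 3a: the first read may simply have caught a torn write. */
        if (page_checksum_ok(page, suspect[i].offset))
            continue;

        /* 3b: an advancing LSN means the page was rewritten after the
         * first pass, which is not consistent with storage corruption. */
        if (page_lsn(page) > suspect[i].lsn)
            continue;

        printf("invalid page at offset %llu\n",
               (unsigned long long)suspect[i].offset);
    }
}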

This is what we are planning for the first round of improving our page checksum validation. We believe that doing the retry in a second pass will be faster and more reliable, because some time will have passed since the first read without our having to build in a delay for each page error.

A further improvement is to check the ascending LSNs found in 3b against PostgreSQL to be completely sure they are valid. We are planning this for our second round of improvements.

Reopening the file for the second pass does require some additional logic (sketched after the list below):

1) The file may have been deleted by PG since the first pass and in that case we won't report any page errors.
2) The file may have been truncated by PG since the first pass so we won't report any errors past the point of truncation.
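A sketch of that reopen logic, with the same caveats as above (reopen_for_second_pass() is a hypothetical name, not our API):

#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Returns false when the file is gone (case 1); on success sets *size so
 * the caller can drop suspect offsets past the truncation point (case 2). */
bool
reopen_for_second_pass(const char *path, FILE **file, uint64_t *size)
{
    *file = fopen(path, "rb");

    if (*file == NULL)
    {
        /* Case 1: deleted since the first pass -- report nothing. */
        if (errno == ENOENT)
            return false;

        perror(path);               /* any other failure is a real error */
        return false;
    }

    /* Case 2: find the current length; suspect pages at or past *size
     * were truncated away and are skipped without being reported. */
    fseek(*file, 0, SEEK_END);
    *size = (uint64_t)ftell(*file);
    rewind(*file);

    return true;
}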

A malicious attacker could easily trick these checks, but as Stephen pointed out elsewhere, they would more likely just make the checksums valid, which would escape detection anyway.

We believe that the chances of random storage corruption passing all these checks are incredibly small, but eventually we'll also check against the WAL to be completely sure.

Regards,
--
-David
da...@pgmasters.net

