On Mon, Nov 23, 2020 at 10:35:54AM -0500, Stephen Frost wrote: > * Anastasia Lubennikova (a.lubennik...@postgrespro.ru) wrote: >> It seems reasonable to me to rely on checksums only. >> >> As for retry, I think that API for concurrent I/O will be complicated. >> Instead, we can introduce a function to read the page directly from shared >> buffers after PAGE_RETRY_THRESHOLD attempts. It looks like a bullet-proof >> solution to me. Do you see any possible problems with it?
It seems to me that you are missing the point here. It is not necessary to read a page from shared buffers. What is necessary is to make sure that there is zero concurrent I/O activity in shared buffers while a page is getting checked on disk, giving the insurance that there is zero risk of having a torn page for a check for anything working with shared buffers. You could do that only on a retry if we found a page where there was a checksum mismatch, meaning that the page we either torn or currupted, but need an extra verification anyway. > We might end up reading pages back in that have been evicted, for one > thing, which doesn't seem great, and this also seems likely to be > awkward for cases which aren't using the replication protocol, unless > every process maintains a connection to PG the entire time, which also > doesn't seem great. I don't quite see a problem in checking pages that have been just evicted if we are able to detect faster that a page is corrupted, because the initial check may fail because a page was torn, meaning that it was in the middle of an eviction, but the page could also be corrupted, meaning also that it was *not* torn, and would fail a retry where we should make sure that there is no s_b concurrent activity. So in the worst case of seeing you make the detection of a corrupted page faster. Please note that Andres also mentioned about the potential need to worry about table AMs that call directly smgrwrite(), bypassing shared buffers. The only cases in-core where it is used are related to init forks when an unlogged relation gets created, where it would not matter if you are doing a page check while holding a database transaction as the newly-created relation would not be visible yet, but it would matter in the case of base backups doing direct page lookups. Fun. > Also- what is the point of reading the page from shared buffers > anyway..? All we need to do is prove that the page will be rewritten > during WAL replay. If we can prove that, we don't actually care what > the contents of the page are. We certainly can't calculate the > checksum on a page we plucked out of shared buffers since we only > calculate the checksum when we go to write the page out. A LSN-based check makes the thing tricky. How do you make sure that pd_lsn is not itself broken? It could be perfectly possible that a random on-disk corruption makes pd_lsn seen as having a correct value, still the rest of the page is borked. -- Michael
signature.asc
Description: PGP signature