Greetings,

* Michael Paquier (mich...@paquier.xyz) wrote:
> On Fri, Nov 27, 2020 at 11:15:27AM -0500, Stephen Frost wrote:
> > * Magnus Hagander (mag...@hagander.net) wrote:
> >> On Thu, Nov 26, 2020 at 8:42 AM Michael Paquier <mich...@paquier.xyz>
> >> wrote:
> >>> But here the checksum is broken, so while the offset is something we
> >>> can rely on how do you make sure that the LSN is fine?  A broken
> >>> checksum could perfectly mean that the LSN is actually *not* fine if
> >>> the page header got corrupted.
> >
> > Of course that could be the case, but it gets to be a smaller and
> > smaller chance by checking that the LSN read falls within reasonable
> > bounds.
>
> FWIW, I find that scary.
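To be concrete about what "reasonable bounds" means here, the check
amounts to roughly the following (a minimal standalone sketch, not the
actual pg_basebackup code; in PG the page LSN does live in the first 8
bytes of the page, pd_lsn, but the rest of this, function names
included, is just mine for illustration):

/*
 * Minimal standalone sketch (not the actual pg_basebackup code) of the
 * "LSN within reasonable bounds" check applied to a page whose checksum
 * failed.  In PostgreSQL the page LSN lives in the first 8 bytes of the
 * page (pd_lsn: xlogid then xrecoff, in native byte order); the rest of
 * this, names included, is just for illustration.
 */
#include <stdint.h>
#include <stdbool.h>
#include <string.h>

typedef uint64_t XLogRecPtr;

/* Pull the LSN out of the first 8 bytes of a raw page image. */
XLogRecPtr
page_lsn(const char *page)
{
	uint32_t	xlogid;
	uint32_t	xrecoff;

	memcpy(&xlogid, page, sizeof(xlogid));
	memcpy(&xrecoff, page + sizeof(xlogid), sizeof(xrecoff));
	return ((XLogRecPtr) xlogid << 32) | xrecoff;
}

/*
 * Treat a checksum failure as a likely torn page (rather than reporting
 * corruption outright) only if the page's LSN falls between the LSN at
 * which the backup started and the server's current insert LSN.
 */
bool
lsn_in_reasonable_bounds(const char *page,
						 XLogRecPtr backup_start_lsn,
						 XLogRecPtr current_insert_lsn)
{
	XLogRecPtr	lsn = page_lsn(page);

	return lsn >= backup_start_lsn && lsn <= current_insert_lsn;
}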
There are ultimately different levels of 'scary', and the risk here
that something is actually wrong following these checks strikes me as
being on the same order as random bits being flipped in the page and
still getting a valid checksum (which is entirely possible with our
current checksum...), or maybe even less.  Both cases would result in
a false negative, which is surely bad, though that strikes me as
better than a false positive, where we say there's corruption when
there isn't.

> And this bit is interesting, because that would give the guarantees
> you are looking for with a page retry (just grep for BM_IO_IN_PROGRESS
> on the thread):
> https://www.postgresql.org/message-id/20201102193457.fc2hoen7ahth4...@alap3.anarazel.de

There's no guarantee that the page is still in shared buffers, or that
we still have a buffer descriptor for it, by the time we're doing this,
as I said up-thread.  This approach requires that we reach into PG,
acquire at least a buffer descriptor and set BM_IO_IN_PROGRESS on it,
and then read the page again and checksum it again before finally
looking at the now-'trusted' LSN (even though it might have had some
bits flipped in it and we wouldn't know...) and seeing if it's higher
than the start of the backup, and maybe less than the current LSN.
Maybe we can avoid actually pulling the page into shared buffers
(reading it into our own memory instead) and just have the buffer
descriptor, but none of this seems like it's going to be very
unobtrusive in either the code or the running system, and it isn't
going to give us an actual guarantee that there's been no corruption.
The amount by which it improves on the checks I outlined above seems
exceedingly small, and the question is whether it's worth it for, most
likely, exclusively pg_basebackup (unless we're going to figure out a
way to expose this via SQL, which seems unlikely).

> > One idea that has occurred to me which hasn't been discussed is
> > checking the file's mtime to see if it's changed since the backup
> > started.  In that case, I would think it'd be something like:
> >
> > - Checksum is invalid
> > - LSN is within range
> > - Close file
> > - Stat file
> > - If mtime is from before the backup then signal possible corruption
>
> I suspect that relying on mtime may cause problems.  One case coming
> to my mind is NFS.

I agree that it might not be perfect, but it also seems like something
which could be reasonably cheaply checked, and the window (between
when the backup started and the time we hit this torn page) is very
likely to be large enough that the mtime will have been updated and be
different from (and forward of, if it was modified) what it was at the
time the backup started; a rough sketch of that comparison is at the
bottom of this mail.  It's also something that incremental backups may
be looking at, so if there are serious problems with it then there's a
good chance you've got bigger issues.

> > In general, however, I don't like the idea of reaching into PG and
> > asking PG for this page.
>
> It seems to me that if we don't ask to PG what it thinks about a page,
> we will never have a fully bullet-proof design either.

None of this is bullet-proof, it's all trade-offs.

Thanks,

Stephen
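P.S.  Since the mtime idea keeps coming up, here is that comparison in
rough standalone form (the names are mine and how the backup-start
timestamp is obtained is hand-waved; this is only the shape of the
check, not backend code):

/*
 * Standalone sketch of the mtime comparison described above: after a
 * checksum failure where the page LSN was in range, re-stat the file
 * and only signal possible corruption if it does *not* appear to have
 * been modified since the backup started.
 */
#include <stdbool.h>
#include <sys/stat.h>
#include <time.h>

bool
file_unmodified_since_backup(const char *path, time_t backup_start_time)
{
	struct stat st;

	if (stat(path, &st) != 0)
		return false;			/* can't tell, so don't claim corruption */

	return st.st_mtime < backup_start_time;
}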