Greetings,

* Michael Paquier (mich...@paquier.xyz) wrote:
> On Fri, Nov 20, 2020 at 11:08:27AM -0500, Stephen Frost wrote:
> > David Steele (da...@pgmasters.net) wrote:
> >> Our current plan for pgBackRest:
> >>
> >> 1) Remove the LSN check as you have done in your patch and when
> >> rechecking see if the page has become valid *or* the LSN is ascending.
> >> 2) Check the LSN against the max LSN reported by PostgreSQL to make
> >> sure it is valid.
> >
> > Yup, that's my recollection also as to our plans for how to improve
> > things here.
> >
> >> These do completely rule out any type of corruption, but they certainly
> >> narrows the possibility by a lot.
> >
> > *don't :)
>
> Have you considered the possibility of only using pd_checksums for the
> validation?  This is the only source of truth in the page header we
> can rely on to validate the full contents of the page, so if the logic
> relies on anything but the checksum then you expose the logic to risks
> of reporting pages as corrupted while they were just torn, or just
> miss corrupted pages, which is what we should avoid for such things.
> Both are bad.
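As an aside, to make David's item 2 above a bit more concrete: a very
rough sketch (explicitly not the actual pgBackRest code; the helper
names are made up for illustration) of checking a page LSN against the
cluster's current insert LSN, as reported by pg_current_wal_insert_lsn(),
might look like the below.  On a standby, pg_last_wal_replay_lsn() would
be the analogous query.

/*
 * Rough sketch, not pgBackRest code: fetch an upper-bound LSN from the
 * server over libpq and use it to sanity-check a page's pd_lsn.
 * Assumes a primary.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <libpq-fe.h>

/* Ask the server for pg_current_wal_insert_lsn() and parse the X/X form. */
static bool
max_wal_lsn(PGconn *conn, uint64_t *lsn)
{
    PGresult    *res = PQexec(conn, "SELECT pg_current_wal_insert_lsn()");
    unsigned int hi;
    unsigned int lo;
    bool         ok = false;

    if (PQresultStatus(res) == PGRES_TUPLES_OK &&
        sscanf(PQgetvalue(res, 0, 0), "%X/%X", &hi, &lo) == 2)
    {
        *lsn = ((uint64_t) hi << 32) | lo;
        ok = true;
    }

    PQclear(res);
    return ok;
}

/*
 * A page LSN beyond anything the cluster has generated can't be valid,
 * so the header (and any LSN-based reasoning about that page) shouldn't
 * be trusted.
 */
static bool
page_lsn_is_sane(uint64_t pageLsn, uint64_t maxLsn)
{
    return pageLsn <= maxLsn;
}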
There's no doubt that you'll get checksum failures from time to time,
and that it's an entirely valid case if the page is being concurrently
written, so we have to decide if we should be reporting those failures,
retrying, or what.  It's not at all clear what you're suggesting here as
to how you can use 'only' the checksum.

> >> As for your patch, it mostly looks good but my objection is that a page may
> >> be reported as invalid after 5 retries when in fact it may just be very
> >> hot.
> >
> > Yeah.. while unlikely that it'd actually get written out that much, it
> > does seem at least possible.
> >
> >> Maybe checking for an ascending LSN is a good idea there as well? At least
> >> in that case we could issue a different warning, instead of "checksum
> >> verification failed" perhaps "checksum verification skipped due to
> >> concurrent modifications".
> >
> > +1.
>
> I don't quite understand how you can make sure that the page is not
> corrupted here?  It could be possible that the last 4kB of a 8kB page
> got corrupted, where the header had valid data but failing the
> checksum verification.

Not sure that the proposed approach was really understood here.
Specifically, what we're talking about is:

- read(), save the LSN seen
- calculate the checksum, get a failure
- re-read(), compare the LSN to the prior LSN, maybe also re-check the
  checksum

(There's a rough sketch of this in code at the end of this mail.)

If the checksum fails again AND the LSN has changed and increased (and
perhaps otherwise seems reasonable) then we have at least a bit more
confidence that the failing checksum is due to the page being rewritten
concurrently and not due to latent storage corruption, which is the
specific distinction that we're trying to discern here.

> So if you are not careful you could have at
> hand a corrupted page discarded because of it failed the retry
> multiple times in a row.

The point of checking for an ascending LSN is to see if the page is
being concurrently modified.  If it is, then we actually don't care if
the page is corrupted, because it's going to be rewritten during WAL
replay as part of the restore process.

> The only method I can think as being really
> reliable is based on two facts:
> - Do a check only on pd_checksums, as that validates the full contents
> of the page.
> - When doing a retry, make sure that there is no concurrent I/O
> activity in the shared buffers.  This requires an API we don't have
> yet.

I don't think we actually want the backup process to start locking
pages, which it seems like is what you're suggesting here..?  Trying to
do a check without a lock, and without having PG end up reading the page
back in if it had been evicted due to pressure, seems likely to be hard
to do reliably and without race conditions complicating things.

The other 100% reliable approach, as David discussed before, is to scan
the WAL at the same time and to ignore any checksum failures for pages
that we know are in the WAL with FPIs.  Unfortunately, reading WAL for
all the different versions of PG is a fair bit of work and we haven't
quite gotten to biting that off yet (though it's on the roadmap), and
the core code certainly doesn't help us in that regard since any given
version only supports the current major version's WAL (an issue
pg_basebackup would also have to deal with, were it to be modified to
use such an approach and to continue working with older versions of
PG..).
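To spell the retry logic out in code form, here's a simplified sketch.
This is not the actual pgBackRest implementation: pg_checksum_page() is
PostgreSQL's own page checksum routine (storage/checksum_impl.h), while
the rest (function names, the result enum, the single-retry structure)
is just a stand-in for illustration.

/*
 * Sketch of the retry approach described above -- not pgBackRest code.
 * pd_lsn is the first 8 bytes of the page header; pd_checksum follows it.
 */
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define BLCKSZ 8192

/* Provided by PostgreSQL's storage/checksum_impl.h when built with it. */
extern uint16_t pg_checksum_page(char *page, uint32_t blkno);

/* Read pd_lsn (two uint32s, stored in native byte order). */
static uint64_t
page_lsn(const char *page)
{
    uint32_t    xlogid;
    uint32_t    xrecoff;

    memcpy(&xlogid, page, sizeof(xlogid));
    memcpy(&xrecoff, page + sizeof(xlogid), sizeof(xrecoff));

    return ((uint64_t) xlogid << 32) | xrecoff;
}

/* Read pd_checksum, which immediately follows pd_lsn. */
static uint16_t
page_checksum(const char *page)
{
    uint16_t    checksum;

    memcpy(&checksum, page + sizeof(uint64_t), sizeof(checksum));
    return checksum;
}

typedef enum
{
    PAGE_OK,            /* checksum verified */
    PAGE_IN_FLIGHT,     /* checksum failing but LSN advancing: concurrent write */
    PAGE_CORRUPT        /* checksum failing and LSN not moving */
} PageVerifyResult;

static PageVerifyResult
verify_page(int fd, uint32_t blkno, char *buf)
{
    uint64_t    lsnFirst;

    /* read(), save the LSN seen, calculate the checksum */
    if (pread(fd, buf, BLCKSZ, (off_t) blkno * BLCKSZ) != BLCKSZ)
        return PAGE_CORRUPT;    /* treat short reads as errors, for brevity */

    lsnFirst = page_lsn(buf);

    if (page_checksum(buf) == pg_checksum_page(buf, blkno))
        return PAGE_OK;

    /* re-read(), re-check the checksum, compare the LSN to the prior LSN */
    if (pread(fd, buf, BLCKSZ, (off_t) blkno * BLCKSZ) != BLCKSZ)
        return PAGE_CORRUPT;

    if (page_checksum(buf) == pg_checksum_page(buf, blkno))
        return PAGE_OK;

    /*
     * Failed twice: if the LSN moved forward between the reads, the page
     * is being rewritten concurrently and will be replaced by WAL replay
     * during restore, so report it as in-flight rather than corrupt.
     */
    if (page_lsn(buf) > lsnFirst)
        return PAGE_IN_FLIGHT;

    return PAGE_CORRUPT;
}

The key point is the final comparison: a second checksum failure is only
reported as corruption if the LSN hasn't advanced between the two reads.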
In a similar vein to what we do (in pgBackRest) with pg_control, we
expect to develop our own library, basically vendoring the WAL-reading
code from all the major versions of PG which we support, in order to
track FPIs, restore points, all the kinds of potential recovery targets,
and other useful information.

Thanks,

Stephen