Thanks for the review. I have committed the patches. On Thu, Mar 2, 2023 at 2:59 AM Michael Paquier <mich...@paquier.xyz> wrote: > Seems right, I think that you should backpatch that as > VERIFY_CHECKSUMS is the default.
Done. > There is more to it: the page LSN is checked before its checksum. > Hence, if the page's LSN is corrupted in a way that it is higher than > sink->bbs_state->startptr, the checksum verification is just skipped > while the page is broken but not reported as such. (Spoiler: this has > been mentioned in the past, and maybe we'd better remove this stuff in > its entirety.) Yep. It's pretty embarrassing that we have a checksum verification feature that has both known ways of producing false positives and known ways of producing false negatives and we have no plan to ever fix that, we're just going to keep shipping what we've got. I think it's pretty clear that the feature shouldn't have been committed like this; valid criticisms of the design were offered and simply not addressed, not even by updating the comments or documentation with disclaimers. I find the code in sendFile() pretty ugly, too. For all of that, I'm a bit uncertain whether ripping it out is the right thing to do. It might be (I'm not sure) that it tends to work decently well in practice. Like, yes, it could produce false checksum warnings, but does that actually happen to people? It's probably not too likely that a read() or write() of 8kB gets updated after doing only part of the I/O, so the concurrent read or write is fairly likely to be on-CPU, I would guess, and therefore this algorithm might kind of work OK in practice despite begin garbage on a theoretical level. Similarly, the problems with how the LSN is vetted make it likely that a page replaced with random garbage will go undetected, but a page where a few bytes get flipped in a random position within the page is likely to get caught, and maybe the latter is actually a bigger risk than the former. I don't really know. I'd be interested to hear from anyone with a lot of practical experience using the feature. A few anecdotes of the form "this routine fails to tell us about problems" or "this often complains about problems that are not real" or "this has really helped us out on a number of occasions and we have had no problems with it" would be really helpful. On the other hand, we could just say that the code is nonsense and therefore, regardless of practical experience, it ought to be removed. I'm somewhat open to that idea, too. -- Robert Haas EDB: http://www.enterprisedb.com