Hi,

On 2019-03-22 17:32:10 +0100, Tomas Vondra wrote:
> On 3/22/19 5:10 PM, Andres Freund wrote:
> > IDK, being able to verify in some form that backups aren't corrupted on
> > an IO level is mighty nice. That often does allow to detect the issue
> > while one still has older backups around.
>
> Yeah, I agree that's a valuable capability. I think the question is how
> effective it actually is considering how much the storage changed over
> the past few years (which necessarily affects the type of failures
> people have to deal with).
I'm not sure I understand? How do the changes around storage meaningfully
affect the need to have some trust in backups and benefit from earlier
detection?

> It's not clear to me what can checksums do about zeroed pages (and/or
> truncated files) though.

Well, there's nothing fundamental about needing added pages to be zeroes.
We could expand them to be initialized with actual valid checksums instead
of

	/* new buffers are zero-filled */
	MemSet((char *) bufBlock, 0, BLCKSZ);
	/* don't set checksum for all-zero page */
	smgrextend(smgr, forkNum, blockNum, (char *) bufBlock, false);

The problem is that it's hard to do so safely without adding a lot of
additional WAL logging. A lot of filesystems will journal metadata changes
(like the size of the file), but not contents. So after a crash the tail
end might appear zeroed out, even if we never wrote zeroes. That's
obviously solvable by WAL logging, but that's not cheap.

It might still be a good idea to just write a page with an initialized
header / checksum at that point, as that ought to still detect a number of
problems we can't detect right now.

Greetings,

Andres Freund
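[Editor's note: a minimal sketch of what "write a page with an initialized
header / checksum" could look like in the extension path quoted above. The
helper name smgrextend_initialized is made up for illustration and does not
come from any real patch; it only wraps the existing backend routines
PageInit(), PageSetChecksumInplace() and smgrextend().]

	#include "postgres.h"
	#include "storage/bufpage.h"	/* PageInit(), PageSetChecksumInplace() */
	#include "storage/smgr.h"		/* smgrextend() */

	/*
	 * Hypothetical helper: extend a relation fork with a block that carries
	 * a valid (empty) page header and checksum instead of all zeroes.
	 */
	static void
	smgrextend_initialized(SMgrRelation smgr, ForkNumber forkNum,
						   BlockNumber blockNum, char *bufBlock)
	{
		Page		page = (Page) bufBlock;

		/* give the new block a valid, empty page header instead of zeroes */
		PageInit(page, BLCKSZ, 0);

		/*
		 * Stamp a checksum (a no-op when data checksums are disabled) so a
		 * tail zeroed out by a crash can be told apart from a page we
		 * actually wrote.
		 */
		PageSetChecksumInplace(page, blockNum);

		/* hand the initialized page to the storage manager */
		smgrextend(smgr, forkNum, blockNum, bufBlock, false);
	}

[As the email notes, this alone doesn't address the crash case: without
additional WAL logging the filesystem may still expose a zeroed tail after a
crash. The point of the sketch is only that a checksum mismatch on such a
page would then be detectable.]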