On Tue, Jun 14, 2022 at 10:30 PM Peter Geoghegan <p...@bowt.ie> wrote:
> Basically I think that this is giving up rather a lot. For example,
> isn't it possible that we'd have corruption that could be a bug in
> either the checksum code, or in recovery?
>
> I'd feel a lot better about it if there was some sense of both the
> costs and the benefits.

I think that, if and when we get TDE, debuggability is likely to be a huge issue. Something will go wrong for someone at some point, and when it does, what they'll have is a supposedly-encrypted page that cannot be decrypted, and it will be totally unclear what has gone wrong. Did the page get corrupted on disk by a random bit flip? Is there a bug in the algorithm? Was it a torn page?

As things stand today, when a page gets corrupted, a human being can look at the page and make an educated guess about what has gone wrong and whether PostgreSQL or some other system is to blame, and if it's PostgreSQL, perhaps have some ideas as to where to look for the bug. If the pages are encrypted, that's a lot harder. I think what will happen, depending on the encryption mode, is probably that either (a) the page will decrypt to complete garbage or (b) the page will fail some kind of verification and you won't be able to decrypt it at all. Either way, you won't be able to infer anything about what caused the problem. All you'll know is that something is wrong. That sucks - a lot - and I don't have a lot of good ideas as to what can be done about it. The idea that an encrypted page is unintelligible, and that small changes to either the encrypted or unencrypted data should result in large changes to the other, is intrinsic to the nature of encryption. It's more or less un-debuggable by design.

With extended checksums, I don't think the issues are anywhere near as bad. I'm not deeply opposed to setting a page-level flag, but I expect only nominal benefits. A human being looking at the page isn't going to have a ton of trouble figuring out whether or not the extended checksum is present unless the page is horribly, horribly garbled, and even if that happens, will debugging that problem really be any worse than debugging a horribly, horribly garbled page today? I don't think so.

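To make the "still an intelligible page" point concrete, here is a minimal sketch of the kind of header sanity check a person or a tool could apply to a raw page. It assumes the stock 24-byte page header layout, the default 8 kB block size, a little-endian machine, and 8-byte maximum alignment. The names header_looks_sane and make_empty_page are hypothetical and exist only for this illustration. Where an extended checksum would actually be stored is not settled in this thread, so the sketch does not try to locate it; it only shows that an unencrypted page header can be judged plausible or garbled. Note that an all-zeroes page, which PostgreSQL treats as a valid new page, would fail this simple check.

```python
import struct

BLCKSZ = 8192            # assumed default block size
LAYOUT_VERSION = 4       # current page layout version
VALID_FLAG_BITS = 0x0007 # PD_HAS_FREE_LINES | PD_PAGE_FULL | PD_ALL_VISIBLE
MAXALIGN = 8             # assumed maximum alignment

# pd_lsn (two uint32), pd_checksum, pd_flags, pd_lower, pd_upper,
# pd_special, pd_pagesize_version (uint16 each), pd_prune_xid (uint32)
HEADER_FMT = "<IIHHHHHHI"
HEADER_SIZE = struct.calcsize(HEADER_FMT)  # 24 bytes


def header_looks_sane(page: bytes) -> bool:
    """Return True if the page header fields are mutually consistent."""
    if len(page) != BLCKSZ:
        return False
    (_lsn_hi, _lsn_lo, _checksum, flags, lower, upper, special,
     pagesize_version, _prune_xid) = struct.unpack_from(HEADER_FMT, page, 0)
    if flags & ~VALID_FLAG_BITS:
        return False                      # unknown flag bits are set
    if (pagesize_version & 0xFF00) != BLCKSZ:
        return False                      # high byte encodes the page size
    if (pagesize_version & 0x00FF) != LAYOUT_VERSION:
        return False                      # low byte encodes the layout version
    if special % MAXALIGN != 0:
        return False                      # special space must be aligned
    # The three offsets must be ordered and stay inside the page.
    return HEADER_SIZE <= lower <= upper <= special <= BLCKSZ


def make_empty_page() -> bytes:
    """Build a hypothetical empty page: no line pointers, no special space."""
    header = struct.pack(HEADER_FMT, 0, 0, 0, 0,
                         HEADER_SIZE, BLCKSZ, BLCKSZ,
                         BLCKSZ | LAYOUT_VERSION, 0)
    return header + bytes(BLCKSZ - HEADER_SIZE)


if __name__ == "__main__":
    print(header_looks_sane(make_empty_page()))
    print(header_looks_sane(b"\xff" * BLCKSZ))
```

An encrypted page offers nothing comparable: every one of these fields would be unreadable until decryption succeeds.
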
I likewise expect that pg_filedump could use heuristics to figure out what's going on just by looking at the page, even if no external information is available. You are probably right when you say that there's no need to be so parsimonious with pd_flags space as all that, but I believe that if we did decide to set no bit in pd_flags, whoever maintains pg_filedump these days would not have huge difficulty inventing a suitable heuristic. A page with an extended checksum is basically still an intelligible page, and we shouldn't understate the value of that.

-- 
Robert Haas
EDB: http://www.enterprisedb.com