On Fri, Jun 10, 2022 at 9:36 AM Peter Eisentraut
<peter.eisentr...@enterprisedb.com> wrote:
> I think there ought to be a bit more principled analysis here than just
> "let's add a lot more bits". There is probably some kind of information
> to be had about how many CRC bits are useful for a given block size, say.
>
> And then there is the question of performance. When data checksums were
> first added, there was a lot of concern about that. CRC is usually
> baked directly into hardware, so it's about as cheap as we can hope for.
> SHA not so much.
That's all pretty fair. I have to admit that SHA checksums sound quite expensive, and also that I'm no expert on what kinds of checksums would be best for this sort of application. Based on the earlier discussions around TDE, I do think that people want tamper-resistant checksums here too -- like maybe something where you can't recompute the checksum without access to some secret. I could propose naive ways to do that, like prepending a fixed chunk of secret bytes to the beginning of every block and then running SHA512 or something over the result, but I'm sure that people with actual knowledge of cryptography have developed much better and more robust ways of doing this sort of thing.

I've really been devoting most of my mental energy here to understanding what problems there are at the PostgreSQL level -- i.e. when we carve out bytes for a wider checksum, what breaks? The only research that I did to try to understand what algorithms might make sense was a quick Google search, which led me to the list of algorithms that btrfs uses. I figured that was a good starting point because, like a filesystem, we're encrypting fixed-size blocks of data. However, I didn't intend to present the results of that quick look as the definitive answer to the question of what might make sense for PostgreSQL, and would be interested in hearing what you or anyone else thinks about that.
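Just to make the naive idea above concrete, here is roughly what I'm picturing, sketched against OpenSSL with the usual 8kB block size. The function and constant names are invented for illustration -- nothing like this exists in the tree -- and a real design would no doubt look different. The second function uses HMAC, which as far as I know is the standard construction for keyed hashing, rather than an ad-hoc secret-prefix hash:

#include <string.h>

#include <openssl/evp.h>
#include <openssl/hmac.h>
#include <openssl/sha.h>

#define BLCKSZ		8192
#define SECRET_LEN	32

/* Naive version: SHA-512 over (secret || block). */
static void
naive_keyed_checksum(const unsigned char *secret,
					 const unsigned char *block,
					 unsigned char digest[SHA512_DIGEST_LENGTH])
{
	unsigned char buf[SECRET_LEN + BLCKSZ];

	memcpy(buf, secret, SECRET_LEN);
	memcpy(buf + SECRET_LEN, block, BLCKSZ);
	SHA512(buf, sizeof(buf), digest);
}

/* The textbook alternative: HMAC-SHA-512 keyed with the secret. */
static void
hmac_keyed_checksum(const unsigned char *secret,
					const unsigned char *block,
					unsigned char digest[SHA512_DIGEST_LENGTH])
{
	unsigned int len = SHA512_DIGEST_LENGTH;

	HMAC(EVP_sha512(), secret, SECRET_LEN, block, BLCKSZ, digest, &len);
}

Either way, that's a lot more computation per block than CRC-32C, which circles back to your point about performance.

--
Robert Haas
EDB: http://www.enterprisedb.com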