On Thu, Aug 8, 2024 at 6:11 AM Peter Eisentraut <pe...@eisentraut.org> wrote:
> My understanding was that the reason for some hesitation about adopting
> data checksums was the performance impact.  Not the checksumming itself,
> but the overhead from hint bit logging.  The last time I looked into that,
> you could get performance impacts on the order of 5% tps.  Maybe that's
> acceptable, and you of course can turn it off if you want the extra
> performance.  But I think this should be discussed in this thread.

Fair enough. I think the performance impact is acceptable, as evidenced by
the large number of people that turn it on. And it is easy enough to turn it
off again, either via --no-data-checksums or pg_checksums --disable. I've
come across people who have regretted not throwing a -k into their initial
initdb, but have not yet come across someone who has the opposite regret.
When I did some measurements some time ago, I found numbers much less than
5%, but of course it depends on a lot of factors.

> About the claim that it's already the de-facto standard.  Maybe that is
> approximately true for "serious" installations.  But AFAICT, the popular
> packagings don't enable checksums by default, so there is likely a
> significant middle tier between "just trying it out" and serious
> production use that don't have it turned on.

I would push back on that "significant" a good bit. The number of Postgres
installations in the cloud is very likely to dwarf the total package
installations. Maybe not 10 years ago, but now? Maybe someone from Amazon
can share some numbers. Not that we have any way to compare against package
installs :) But anecdotally the number of people who mention RDS etc. on the
various fora has exploded.

> For those uses, this change would render pg_upgrade useless for upgrades
> from an old instance with default settings to a new instance with default
> settings.  And then users would either need to re-initdb with checksums
> turned back off, or I suppose run pg_checksums on the old instance before
> upgrading?  This is significant additional complication.

Meh, re-running initdb with --no-data-checksums seems a fairly low hurdle.

> And packagers who have built abstractions on top of pg_upgrade (such as
> Debian pg_upgradecluster) would also need to implement something to manage
> this somehow.

How does it deal with clusters with checksums enabled now?

> I'm thinking pg_upgrade could have a mode where it adds the checksum
> during the upgrade as it copies the files (essentially a subset
> of pg_checksums).  I think that would be useful for that middle tier of
> users who just want a good default experience.

Hm...might be a bad experience if it forces a switch out of --link mode.
Perhaps a warning at the end of pg_upgrade that suggests running
pg_checksums on your new cluster if you want to enable checksums?

Cheers,
Greg