Hi all,

On Tue, Aug 13, 2024 at 10:08 PM Robert Haas <robertmh...@gmail.com> wrote:
> And it's not like we have statistics anywhere that you can look at to
> see how much CPU time you spent computing checksums, so if a user DOES
> have a performance problem that would not have occurred if checksums
> had been disabled, they'll probably never know it.

In the worst case, per-second and per-PID CPU time consumption can be
quantified with eBPF, which is standard on distros now (it requires kernel
headers and bpfcc-tools installed). E.g. here 7918 was the PID of a backend
running a pgbench -c 4 workload with checksums on (sorry for the formatting,
but I don't want to use HTML here):

# funclatency-bpfcc --microseconds -i 1 -p 7918 /usr/lib/postgresql/16/bin/postgres:pg_checksum_page
Tracing 1 functions for "/usr/lib/postgresql/16/bin/postgres:pg_checksum_page"... Hit Ctrl-C to end.

     usecs               : count     distribution
         0 -> 1          : 0        |                                        |
         2 -> 3          : 238      |*************                           |
         4 -> 7          : 714      |****************************************|
         8 -> 15         : 2        |                                        |
        16 -> 31         : 5        |                                        |
        32 -> 63         : 0        |                                        |
        64 -> 127        : 1        |                                        |
       128 -> 255        : 0        |                                        |
       256 -> 511        : 1        |                                        |
       512 -> 1023       : 1        |                                        |

avg = 6 usecs, total: 6617 usecs, count: 962

     usecs               : count     distribution
         0 -> 1          : 0        |                                        |
         2 -> 3          : 241      |*************                           |
         4 -> 7          : 706      |****************************************|
         8 -> 15         : 11       |                                        |
        16 -> 31         : 10       |                                        |
        32 -> 63         : 1        |                                        |

avg = 5 usecs, total: 5639 usecs, count: 969

[..refreshes every 1s here..]

So the above tells us e.g. that pg_checksum_page() took 5639 us out of the
full 1 s sample, and with the core pegged at 100% CPU that works out to
~0.6% CPU util for this routine (I'm ignoring the WAL/log hint impact here).
One could also write a small script using bpftrace instead. Disassembly of
the Debian build and of stock PGDG tells me it's full SSE2 instruction set,
so that's nice and optimal too.

> >> For those uses, this change would render pg_upgrade useless for upgrades
> >> from an old instance with default settings to a new instance with default
> >> settings. And then users would either need to re-initdb with checksums
> >> turned back off, or I suppose run pg_checksums on the old instance before
> >> upgrading? This is significant additional complication.
> >
> > Meh, re-running initdb with --no-data-checksums seems a fairly low hurdle.
>
> I tend to agree with that, but I would also like to see the sort of
> improvements that Peter mentions.
[..]
> None of that is to say that I'm totally hostile to this change.
[..]
> Whether that's worth the overhead for everyone, I'm not quite sure.

Without data checksums there's a risk that someone gets silent bit
corruption and no one notices. Shouldn't data integrity stand above
performance by default in this case? (Performance could be opt-in, if
someone really wants it.)

-J.
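
PS. A minimal sketch of the arithmetic behind the per-second figure, assuming the 1 s interval from `-i 1` and a single backend keeping one core busy. The helper name `cpu_share_percent` is made up for illustration; the totals and counts are the two samples from the funclatency output:

```python
def cpu_share_percent(total_usecs: float, interval_secs: float = 1.0) -> float:
    """Percentage of one core spent inside the traced function per interval."""
    return total_usecs / (interval_secs * 1_000_000) * 100


# (total usecs, call count) taken from the two funclatency samples
samples = [(6617, 962), (5639, 969)]

for total_usecs, count in samples:
    share = cpu_share_percent(total_usecs)
    per_call = total_usecs / count  # funclatency prints this average truncated to an integer
    print(f"{count} calls, {total_usecs} us total: "
          f"{share:.2f}% of one core, {per_call:.1f} us/call")
```

This only covers time spent inside pg_checksum_page() itself for the traced PID; anything else that checksums trigger is not counted.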