On Mon, Oct 30, 2023 at 10:36:01PM -0500, Nathan Bossart wrote:
> I tested pg_waldump -z with 50M 65-byte records for the following
> implementations on an ARM system:
>
>  * slicing-by-8                                 : ~3.08s
>  * proposed patches applied (runtime check)     : ~2.44s
>  * only CRC intrinsics implementation compiled  : ~2.42s
>  * forced inlining                              : ~2.38s
>
> Avoiding the runtime check produced a 0.8% improvement, and forced inlining
> produced another 1.7% improvement.  In comparison, even the runtime check
> implementation produced a 20.8% improvement over the slicing-by-8 one.

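For anyone skimming, here is a rough sketch of what the "runtime check" variant in the quoted numbers amounts to: every call goes through a function pointer that is resolved on first use. All names below are hypothetical and are not the actual PostgreSQL symbols, the CPU probe is a stub, and the "accelerated" routine just falls back to a bitwise loop so the example stays portable.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; not the actual PostgreSQL symbols. */
typedef uint32_t (*crc32c_fn) (uint32_t crc, const void *data, size_t len);

/* Portable fallback: bitwise CRC-32C (reflected polynomial 0x82F63B78). */
static uint32_t
crc32c_portable(uint32_t crc, const void *data, size_t len)
{
    const unsigned char *p = data;

    while (len-- > 0)
    {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return crc;
}

/* Placeholder: a real build would use the CPU's CRC instructions here. */
static uint32_t
crc32c_accelerated(uint32_t crc, const void *data, size_t len)
{
    return crc32c_portable(crc, data, len);
}

/* Stub probe (assumption): a real one would query the hardware or OS. */
static int
cpu_has_crc_instructions(void)
{
    return 0;
}

static uint32_t crc32c_choose(uint32_t crc, const void *data, size_t len);

/* Callers always go through this pointer; that indirection is the overhead. */
static crc32c_fn crc32c_impl = crc32c_choose;

/* First call lands here, picks an implementation, then forwards the call. */
static uint32_t
crc32c_choose(uint32_t crc, const void *data, size_t len)
{
    if (cpu_has_crc_instructions())
        crc32c_impl = crc32c_accelerated;
    else
        crc32c_impl = crc32c_portable;
    return crc32c_impl(crc, data, len);
}

/* Convenience wrapper applying the usual initial value and final XOR. */
static uint32_t
crc32c(const void *data, size_t len)
{
    return crc32c_impl(0xFFFFFFFFu, data, len) ^ 0xFFFFFFFFu;
}
```

With this sketch, crc32c("123456789", 9) returns 0xE3069283, the standard CRC-32C check value, and after that first call the pointer no longer refers to the chooser. Compiling only one implementation or forcing inlining removes the indirect call.
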
After reflecting on these numbers for a bit, I think I'm still inclined to
do $SUBJECT.  I considered the following:

* While it would be nice to gain a couple of percentage points for existing
  hardware, I think we'll still end up doing runtime checks most of the
  time once we add support for newer instructions.

* The performance improvements that the new instructions provide seem
  likely to outweigh these small regressions, especially for workloads with
  larger WAL records [0].

* From my quick scan of a few dozen machines on the buildfarm, it looks
  like the runtime checks are already the norm, so the number of systems
  that would be subject to a regression from v16 to v17 should be pretty
  small, in theory.  And this regression seems to be on the order of 1%
  based on the numbers above.

Do folks think this is reasonable?  Or should we instead try to squeeze
every last drop out of the current implementations by avoiding function
pointers, forcing inlining, etc.?

[0] https://postgr.es/m/20231025014539.GA977906%40nathanxps13

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com