On Mon, Oct 30, 2023 at 01:48:29PM -0700, Jeff Davis wrote:
> I assume you are concerned about the call going through a function
> pointer? If so, is it possible that setting a flag and then branching
> would be better?
>
> Also, if it's a concern, should we also consider making an inlineable
> version of pg_comp_crc32c_sse42()?

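For anyone following along, the two dispatch styles in question can be sketched roughly as below. This is only an illustration under stated assumptions, not the code from the patches: crc32c_sw, crc32c_hw_standin, have_hw_crc, and the wrapper names are all hypothetical, and the "hardware" variant simply reuses the bitwise loop so that the example builds on any machine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Portable bitwise CRC-32C (reflected polynomial 0x82F63B78). */
static uint32_t
crc32c_sw(uint32_t crc, const void *data, size_t len)
{
	const unsigned char *p = data;

	while (len--)
	{
		crc ^= *p++;
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return crc;
}

/*
 * Hypothetical stand-in for a CRC-instruction implementation.  It reuses
 * the bitwise loop here; a real build would use CPU intrinsics instead.
 */
static uint32_t
crc32c_hw_standin(uint32_t crc, const void *data, size_t len)
{
	return crc32c_sw(crc, data, len);
}

/* Assumed to be set once by a runtime CPU feature check. */
static int	have_hw_crc = 1;

/* Style 1: resolve once, then every call goes through a function pointer. */
typedef uint32_t (*crc32c_fn) (uint32_t, const void *, size_t);
static crc32c_fn crc32c_ptr = crc32c_sw;

static void
choose_crc_impl(void)
{
	crc32c_ptr = have_hw_crc ? crc32c_hw_standin : crc32c_sw;
}

static uint32_t
crc32c_via_pointer(uint32_t crc, const void *data, size_t len)
{
	assert(crc32c_ptr != NULL);
	return crc32c_ptr(crc, data, len);	/* indirect call */
}

/* Style 2: test a flag and branch to a direct (inlineable) call. */
static inline uint32_t
crc32c_branch(uint32_t crc, const void *data, size_t len)
{
	if (have_hw_crc)
		return crc32c_hw_standin(crc, data, len);
	return crc32c_sw(crc, data, len);
}

/* CRC-32C starts from 0xFFFFFFFF and is inverted at the end. */
static uint32_t
crc32c_finish(uint32_t crc)
{
	return crc ^ 0xFFFFFFFFu;
}
```

Both wrappers return the same value for the same input; for the 9-byte string "123456789", starting from 0xFFFFFFFF and applying crc32c_finish() gives 0xE3069283, the standard CRC-32C check value. The only difference between the two styles is how the call is dispatched.
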
I tested pg_waldump -z with 50M 65-byte records for the following
implementations on an ARM system:

  * slicing-by-8                                : ~3.08s
  * proposed patches applied (runtime check)    : ~2.44s
  * only CRC intrinsics implementation compiled : ~2.42s
  * forced inlining                             : ~2.38s

Avoiding the runtime check produced a 0.8% improvement, and forced
inlining produced another 1.7% improvement. In comparison, even the
runtime check implementation produced a 20.8% improvement over the
slicing-by-8 one.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com