Re: always use runtime checks for CRC-32C instructions

Nathan Bossart Tue, 31 Oct 2023 08:55:36 -0700

On Mon, Oct 30, 2023 at 10:36:01PM -0500, Nathan Bossart wrote:
> I tested pg_waldump -z with 50M 65-byte records for the following
> implementations on an ARM system:
> 
>  * slicing-by-8                                : ~3.08s
>  * proposed patches applied (runtime check)    : ~2.44s
>  * only CRC intrinsics implementation compiled : ~2.42s
>  * forced inlining                             : ~2.38s
> 
> Avoiding the runtime check produced a 0.8% improvement, and forced inlining
> produced another 1.7% improvement.  In comparison, even the runtime check
> implementation produced a 20.8% improvement over the slicing-by-8 one.


After reflecting on these numbers for a bit, I think I'm still inclined to
do $SUBJECT.  I considered the following:

* While it would be nice to gain a couple of percentage points for existing
  hardware, I think we'll still end up doing runtime checks most of the
  time once we add support for newer instructions.

* The performance improvements that the new instructions provide seem
  likely to outweigh these small regressions, especially for workloads with
  larger WAL records [0].

* From my quick scan of a few dozen machines on the buildfarm, it looks
  like the runtime checks are already the norm, so the number of systems
  that would be subject to a regression from v16 to v17 should be pretty
  small, in theory.  And this regression seems to be on the order of 1%
  based on the numbers above.

Do folks think this is reasonable?  Or should we instead try to squeeze
every last drop out of the current implementations by avoiding function
pointers, forcing inlining, etc.?
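
For anyone who wants the trade-off in concrete terms, here is a minimal sketch of the function-pointer style of runtime check being discussed. It is not the actual PostgreSQL code: crc32c_update, crc32c_choose, crc32c_accelerated and cpu_has_crc32c are hypothetical names, the CPU probe is stubbed out to always report "no", and the "accelerated" routine simply calls the portable one so the sketch builds on any platform. The cost in question is the indirect call through crc32c_update on every invocation, which also prevents inlining.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t (*crc32c_fn) (uint32_t crc, const void *data, size_t len);

/* Portable bit-at-a-time fallback (reflected Castagnoli polynomial). */
static uint32_t
crc32c_portable(uint32_t crc, const void *data, size_t len)
{
    const unsigned char *p = data;

    while (len-- > 0)
    {
        crc ^= *p++;
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return crc;
}

/*
 * Stand-in for the hardware path.  A real build would use the CRC
 * instructions here; this sketch just reuses the portable routine.
 */
static uint32_t
crc32c_accelerated(uint32_t crc, const void *data, size_t len)
{
    return crc32c_portable(crc, data, len);
}

/* Hypothetical CPU probe; hard-wired to "not available" in this sketch. */
static int
cpu_has_crc32c(void)
{
    return 0;
}

static uint32_t crc32c_choose(uint32_t crc, const void *data, size_t len);

/* Every caller goes through this pointer: the indirect call is the cost. */
static crc32c_fn crc32c_update = crc32c_choose;

/* The first call lands here, picks an implementation, and rebinds the pointer. */
static uint32_t
crc32c_choose(uint32_t crc, const void *data, size_t len)
{
    crc32c_update = cpu_has_crc32c() ? crc32c_accelerated : crc32c_portable;
    return crc32c_update(crc, data, len);
}

/* Convenience wrapper applying the usual initial value and final XOR. */
static uint32_t
crc32c(const void *data, size_t len)
{
    return crc32c_update(0xFFFFFFFFu, data, len) ^ 0xFFFFFFFFu;
}
```

Compiling only one implementation, or forcing it inline, would replace the call through crc32c_update with a direct (or inlined) call.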

[0] https://postgr.es/m/20231025014539.GA977906%40nathanxps13

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
