Re: Optimize Arm64 crc32c implementation in Postgresql

Heikki Linnakangas Wed, 04 Apr 2018 02:24:44 -0700

On 03/04/18 19:43, Andres Freund wrote:

Architecture manual time?  They're available freely IIRC and should
answer this.

Yeah. The best reference I could find was "ARM Cortex-A Series
Programmer’s Guide for ARMv8-A"
(http://infocenter.arm.com/help/topic/com.arm.doc.den0024a/ch08s01.html).
In the "Porting to A64" section, it says:

Data and code must be aligned to appropriate boundaries. The
alignment of accesses can affect performance on ARM cores and can
represent a portability problem when moving code from an earlier
architecture to ARMv8-A. It is worth being aware of alignment issues
for performance reasons, or when porting code that makes assumptions
about pointers or 32-bit and 64-bit integer variables.

I was a bit surprised by the "must be aligned to appropriate boundaries"
statement. Googling around, the strict alignment requirement was removed
in ARMv7, and since then, unaligned access works similarly to Intel. I
think there are some special instructions, like atomic ops, that require
alignment though. Perhaps that's what that sentence refers to.


On 03/04/18 20:47, Tom Lane wrote:

I'm pretty sure that some ARM platforms emulate unaligned access through
kernel trap handlers, which would certainly make this a lot slower than
handling the unaligned bytes manually.  Maybe that doesn't apply to any
ARM CPU that has this instruction ... but as you said, it'd be better
to consider the presence of the instruction as orthogonal to other
CPU features.

I did some quick testing, and found that unaligned access is about 2x
slower than aligned. I don't think it's being trapped by the kernel, I
think that would be even slower, but clearly there is an effect there.
So I added code to process the first 1-7 bytes separately, so that the
main loop runs on 8-byte aligned addresses.
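
For illustration, the shape of that change is roughly as follows. This is
only a sketch, not the committed code: crc32c_byte(), crc32c_8bytes() and
crc32c_aligned() are hypothetical names, and the two helpers are portable
bitwise stand-ins so the example runs anywhere. On ARMv8 the ACLE
intrinsics from <arm_acle.h> (for example __crc32cb and __crc32cd) would
presumably take their place.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for a one-byte hardware CRC32C step.
 * Bitwise update using the reflected Castagnoli polynomial.
 */
static uint32_t
crc32c_byte(uint32_t crc, uint8_t b)
{
    crc ^= b;
    for (int i = 0; i < 8; i++)
        crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    return crc;
}

/*
 * Hypothetical stand-in for an 8-byte hardware CRC32C step.
 * The caller guarantees that p is 8-byte aligned.
 */
static uint32_t
crc32c_8bytes(uint32_t crc, const unsigned char *p)
{
    for (int i = 0; i < 8; i++)
        crc = crc32c_byte(crc, p[i]);
    return crc;
}

/* Extend crc over len bytes at data, keeping the main loop aligned. */
uint32_t
crc32c_aligned(uint32_t crc, const void *data, size_t len)
{
    const unsigned char *p = data;

    /* Leading 0-7 bytes, one at a time, until p is 8-byte aligned. */
    while (len > 0 && ((uintptr_t) p & 7) != 0)
    {
        crc = crc32c_byte(crc, *p++);
        len--;
    }

    /* Main loop: every access starts on an 8-byte boundary. */
    while (len >= 8)
    {
        crc = crc32c_8bytes(crc, p);
        p += 8;
        len -= 8;
    }

    /* Trailing 0-7 bytes. */
    while (len > 0)
    {
        crc = crc32c_byte(crc, *p++);
        len--;
    }

    return crc;
}
```

The result does not depend on where the buffer starts; only the split
between the byte-wise prologue and the 8-byte loop changes. The caller
supplies the initial value and applies the final inversion, e.g.
crc32c_aligned(0xFFFFFFFF, buf, len) ^ 0xFFFFFFFF.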


Pushed, thanks everyone!

- Heikki
