On 03/04/18 19:43, Andres Freund wrote:
> Architecture manual time?  They're available freely IIRC and should
> answer this.

Yeah. The best reference I could find was "ARM Cortex-A Series Programmer’s Guide for ARMv8-A" (http://infocenter.arm.com/help/topic/com.arm.doc.den0024a/ch08s01.html). In the "Porting to A64" section, it says:

Data and code must be aligned to appropriate boundaries. The
alignment of accesses can affect performance on ARM cores and can
represent a portability problem when moving code from an earlier
architecture to ARMv8-A. It is worth being aware of alignment issues
for performance reasons, or when porting code that makes assumptions
about pointers or 32-bit and 64-bit integer variables.

I was a bit surprised by the "must be aligned to appropriate boundaries" statement. From googling around, it seems the strict alignment requirement was removed in ARMv7, and since then unaligned access works much like it does on Intel. I think some special instructions, like atomic ops, still require alignment, though. Perhaps that's what that sentence refers to.

On 03/04/18 20:47, Tom Lane wrote:
> I'm pretty sure that some ARM platforms emulate unaligned access through
> kernel trap handlers, which would certainly make this a lot slower than
> handling the unaligned bytes manually.  Maybe that doesn't apply to any
> ARM CPU that has this instruction ... but as you said, it'd be better
> to consider the presence of the instruction as orthogonal to other
> CPU features.

I did some quick testing and found that unaligned access is about 2x slower than aligned access. I don't think it's being trapped by the kernel (I'd expect that to be even slower), but there is clearly an effect. So I added code to process the first 1-7 bytes separately, so that the main loop runs on 8-byte aligned addresses.
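A minimal sketch of that approach in C, for illustration only and not the committed PostgreSQL code: consume bytes one at a time until the pointer is 8-byte aligned, run the main loop on aligned 8-byte words, then finish the remaining tail bytes. The names crc32c_byte, crc32c_word and crc32c_aligned are hypothetical. Portable software CRC32C steps stand in for the ARMv8 instructions (exposed via the ACLE intrinsics `__crc32cb` and `__crc32cd`) so the sketch compiles on any host, and it assumes a little-endian machine.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical software stand-in for a one-byte CRC32C step (what the
 * ARMv8 CRC32CB instruction does in hardware). Reflected Castagnoli
 * polynomial 0x82F63B78. */
static uint32_t crc32c_byte(uint32_t crc, uint8_t b)
{
    crc ^= b;
    for (int k = 0; k < 8; k++)
        crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    return crc;
}

/* Hypothetical stand-in for an 8-byte step (CRC32CX in hardware).
 * Assumes a little-endian host: the lowest byte of w is the first byte
 * in memory, so bytes are fed low to high. */
static uint32_t crc32c_word(uint32_t crc, uint64_t w)
{
    for (int i = 0; i < 8; i++)
        crc = crc32c_byte(crc, (uint8_t) (w >> (8 * i)));
    return crc;
}

/* Update crc over len bytes, doing all 8-byte loads at aligned addresses. */
uint32_t crc32c_aligned(uint32_t crc, const void *data, size_t len)
{
    const unsigned char *p = data;
    const unsigned char *end = p + len;

    /* Head: handle 0-7 bytes one at a time until p is 8-byte aligned. */
    while (p < end && ((uintptr_t) p & 7) != 0)
        crc = crc32c_byte(crc, *p++);

    /* Main loop: every load here starts on an 8-byte boundary. */
    while ((size_t) (end - p) >= 8)
    {
        uint64_t w;

        memcpy(&w, p, sizeof w);
        crc = crc32c_word(crc, w);
        p += 8;
    }

    /* Tail: the remaining 0-7 bytes. */
    while (p < end)
        crc = crc32c_byte(crc, *p++);

    return crc;
}
```

Since CRC is defined byte-serially, splitting the input at the alignment boundary does not change the result; only the width and alignment of the loads in the main loop differ.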

Pushed, thanks everyone!

- Heikki
