On 08/24/2016 12:18 PM, Eric Blake wrote:
> On 08/24/2016 12:48 PM, Richard Henderson wrote:
>> Patches 1-4 remove the use of ifunc from the implementation.
>> Patch 6 adjusts the x86 implementation a bit more to take
>> advantage of ptest (in sse4.1) and unaligned accesses (in avx1).
> Do we really care about unaligned access? Or can we guarantee that all
> our calls to buffer_is_zero are already aligned, and make optimizations
> along those lines?
The old code asserted alignment of at least sizeof(long), although a survey of
call sites doesn't make it obvious that every caller satisfies that. I could
imagine that we get alignment consistent with that of malloc, but I can't
prove it.
However, we're certainly not going to be able to assert arbitrary alignment,
such as 32 bytes for AVX2, or 64 bytes for AVX512 (when that comes along).
Thankfully, at least AVX-capable CPUs are very efficient with unaligned
accesses.
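
To illustrate the point about not depending on caller alignment, here is a
rough portable sketch. It is not the code from this series: the names
sketch_buffer_is_zero and load_u64_unaligned are made up for the example, and
it uses plain 8-byte loads via memcpy rather than SSE/AVX. The idea is the
same, though: load from wherever the buffer starts, and cover the tail with
one final load that overlaps the previous one.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read 8 bytes from any address.
 * memcpy makes the unaligned access well-defined; compilers
 * typically turn it into a single load where the target allows. */
static uint64_t load_u64_unaligned(const unsigned char *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof(v));
    return v;
}

/* Hypothetical sketch, not QEMU's buffer_is_zero():
 * returns true if all len bytes at buf are zero, with no
 * alignment requirement on buf. */
static bool sketch_buffer_is_zero(const void *buf, size_t len)
{
    const unsigned char *p = buf;
    uint64_t acc = 0;
    size_t i;

    /* Too short for a word load: check byte by byte. */
    if (len < sizeof(uint64_t)) {
        for (i = 0; i < len; i++) {
            acc |= p[i];
        }
        return acc == 0;
    }

    /* Whole words, starting at buf regardless of its alignment. */
    for (i = 0; i + sizeof(uint64_t) <= len; i += sizeof(uint64_t)) {
        if (load_u64_unaligned(p + i) != 0) {
            return false;
        }
    }

    /* One load ending exactly at buf + len covers any leftover
     * bytes; it may overlap bytes that were already checked. */
    return load_u64_unaligned(p + len - sizeof(uint64_t)) == 0;
}
```

A vector version would follow the same shape with wider unaligned loads,
which is where the efficiency of unaligned accesses on AVX matters.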
r~