On 5 April 2016 at 16:21, Paolo Bonzini <pbonz...@redhat.com> wrote:
> But in theory it should be enough to add a new #elif branch like this:
>
> #include "arm_neon.h"
> #define VECTYPE        uint64x2_t
> #define VEC_OR(a, b)   ((a) | (b))
> #define ALL_EQ(a, b)   /* ??? :) */

#define ALL_EQ(a, b) (vgetq_lane_u64(a, 0) == vgetq_lane_u64(b, 0) && \
                      vgetq_lane_u64(a, 1) == vgetq_lane_u64(b, 1))

will do, I think (probably suboptimal for a true vector compare, but it
works OK here as we're actually only interested in comparing against
constant zero; the compiler generates "load 64bit value from vector
register to integer; cbnz" for each half of the vector).

Worth benchmarking that (and the variant where we use the C code but
move the loop unrolling out to 16) against the handwritten intrinsics
version.

thanks
-- PMM
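
For anyone wanting to try the macros outside the tree, here is a minimal sketch of how they could be exercised in a zero-check loop. It is not the actual QEMU code: the function name buffer_is_zero_sketch, the requirement that the length be a multiple of the vector size, and the non-NEON fallback branch (GCC/Clang vector extensions, there only so the sketch builds on other hosts) are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
/* The three macros from the discussion above. */
#define VECTYPE        uint64x2_t
#define VEC_OR(a, b)   ((a) | (b))
#define ALL_EQ(a, b)   (vgetq_lane_u64(a, 0) == vgetq_lane_u64(b, 0) && \
                        vgetq_lane_u64(a, 1) == vgetq_lane_u64(b, 1))
#else
/* Assumption: stand-in for non-NEON hosts using GCC/Clang vector
 * extensions, so the sketch can be compiled and tested anywhere. */
typedef uint64_t sketch_vec __attribute__((vector_size(16)));
#define VECTYPE        sketch_vec
#define VEC_OR(a, b)   ((a) | (b))
#define ALL_EQ(a, b)   ((a)[0] == (b)[0] && (a)[1] == (b)[1])
#endif

/* Hypothetical helper: returns true if every byte of buf is zero.
 * Assumes len is a multiple of sizeof(VECTYPE). */
bool buffer_is_zero_sketch(const void *buf, size_t len)
{
    const unsigned char *p = buf;
    VECTYPE zero, acc, v;

    assert(len % sizeof(VECTYPE) == 0);

    memset(&zero, 0, sizeof(zero));
    memset(&acc, 0, sizeof(acc));

    /* OR every 16-byte chunk into the accumulator; memcpy avoids
     * any alignment requirement on buf. */
    for (size_t i = 0; i < len; i += sizeof(VECTYPE)) {
        memcpy(&v, p + i, sizeof(v));
        acc = VEC_OR(acc, v);
    }

    /* A single compare against constant zero at the end. */
    return ALL_EQ(acc, zero);
}
```

Calling buffer_is_zero_sketch(buf, 64) on a zero-filled 64-byte array should return true, and setting any one byte should flip it to false. This version does a single compare at the end rather than checking inside an unrolled loop, so it says nothing about the relative performance of the variants mentioned above.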