On 5 April 2016 at 16:21, Paolo Bonzini <pbonz...@redhat.com> wrote:
> But in theory it should be enough to add a new #elif branch like this:
>
> #include "arm_neon.h"
> #define VECTYPE   uint64x2_t
> #define VEC_OR(a, b) ((a) | (b))
> #define ALL_EQ(a, b) /* ??? :) */

#define ALL_EQ(a, b) (vgetq_lane_u64(a, 0) == vgetq_lane_u64(b, 0) && \
                      vgetq_lane_u64(a, 1) == vgetq_lane_u64(b, 1))

will do, I think. It's probably suboptimal for a true vector compare,
but it works OK here because we're only ever comparing against
constant zero: the compiler generates "load 64-bit value from vector
register to integer; cbnz" for each half of the vector.

Worth benchmarking that (and the variant where we use the C code
but move the loop unrolling out to 16) against the handwritten
intrinsics version.

thanks
-- PMM
