Re: Vector unaligned load/store x86 intrinsics

Richard Biener Fri, 26 Aug 2016 01:52:08 -0700

On Thu, Aug 25, 2016 at 9:40 PM, Marc Glisse <marc.gli...@inria.fr> wrote:
> Hello,
>
> I was considering changing the implementation of _mm_loadu_pd in x86's
> emmintrin.h to avoid a builtin. Here are 3 versions:
>
> typedef double __m128d __attribute__ ((__vector_size__ (16),
> __may_alias__));
> typedef double __m128d_u __attribute__ ((__vector_size__ (16),
> __may_alias__, aligned(1)));
>
> __m128d f (double const *__P)
> {
>   return __builtin_ia32_loadupd (__P);
> }
>
> __m128d g (double const *__P)
> {
>   return *(__m128d_u*)(__P);
> }
>
> __m128d h (double const *__P)
> {
>   __m128d __r;
>   __builtin_memcpy (&__r, __P, 16);
>   return __r;
> }
>
>
> f is what we have currently. f and g generate the same code. h also
> generates the same code except at -O0 where it is slightly longer.
>
> (note that I haven't regtested either version yet)
>
> 1) I don't have any strong preference between g and h, is there a reason to
> pick one over the other? I may have a slight preference for g, which expands
> to
>
>   __m128d _3;
>   _3 = MEM[(__m128d_u * {ref-all})__P_2(D)];
>
> while h yields
>
>   __int128 unsigned _3;
>   _3 = MEM[(char * {ref-all})__P_2(D)];
>   _4 = VIEW_CONVERT_EXPR<vector(2) double>(_3);


I prefer 'g', which is just more natural.  Note that the C language
requires that __P be aligned to alignof (double) (not sure what the
Intel intrinsic specs say here), and thus it doesn't allow arbitrary
misalignment.  This means that you could use a slightly better aligned
type with aligned(alignof(double)).  Or, to be conforming, the parameter
should not be double const * but a double type variant with alignment 1 ...

Maybe Intel folks can clarify things here.

Not that it would make much of a difference, I guess.

> 2) Reading Intel's doc for movupd, it says: "If alignment checking is
> enabled (CR0.AM = 1, RFLAGS.AC = 1, and CPL = 3), an alignment-check
> exception (#AC) may or may not be generated (depending on processor
> implementation) when the operand is not aligned on an 8-byte boundary."
> Since we generate movupd for memcpy even when the alignment is presumably
> only 1 byte, I assume that this alignment-check stuff is not supported by
> gcc?

Huh, never heard of this.  Does this mean that mov_u_XX can raise alignment-check
exceptions?  I believe this would break almost all code (glibc memcpy, GCC
generated code, etc.).  Thus it would require kernel support, emulating
the unaligned ops so that they still work (but recording them somehow).

Richard.

> --
> Marc Glisse
