On Thu, Aug 25, 2016 at 9:40 PM, Marc Glisse <marc.gli...@inria.fr> wrote:
> Hello,
> I was considering changing the implementation of _mm_loadu_pd in x86's
> emmintrin.h to avoid a builtin. Here are 3 versions:
> typedef double __m128d __attribute__ ((__vector_size__ (16),
> __may_alias__));
> typedef double __m128d_u __attribute__ ((__vector_size__ (16),
> __may_alias__, aligned(1)));
> __m128d f (double const *__P)
> {
>   return __builtin_ia32_loadupd (__P);
> }
> __m128d g (double const *__P)
> {
>   return *(__m128d_u*)(__P);
> }
> __m128d h (double const *__P)
> {
>   __m128d __r;
>   __builtin_memcpy (&__r, __P, 16);
>   return __r;
> }
> f is what we have currently. f and g generate the same code. h also
> generates the same code except at -O0 where it is slightly longer.
> (note that I haven't regtested either version yet)
> 1) I don't have any strong preference between g and h, is there a reason to
> pick one over the other? I may have a slight preference for g, which expands
> to
>   __m128d _3;
>   _3 = MEM[(__m128d_u * {ref-all})__P_2(D)];
> while h yields
>   __int128 unsigned _3;
>   _3 = MEM[(char * {ref-all})__P_2(D)];
>   _4 = VIEW_CONVERT_EXPR<vector(2) double>(_3);

I prefer 'g', which is just more natural.  Note that the C language
requires that __P be aligned to alignof (double) (not sure what the
Intel intrinsic specs say here), so it doesn't allow arbitrary
misalignment.  This means that you could use a slightly better aligned
type, with aligned(alignof(double)) instead of aligned(1).  Or, to be
conforming, the parameter should not be double const * but a double
type variant with alignment 1 ...
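
To make the two options concrete, here is a minimal sketch.  The names
v2df_a8, v2df_u, double_u and both load functions are made up for
illustration and are not in emmintrin.h; I am also assuming _Alignof
(double) is the alignment we would want to promise, which may not match
what the Intel specs intend.

```c
/* 16-byte vector of two doubles, same shape as __m128d.  */
typedef double v2df __attribute__ ((__vector_size__ (16), __may_alias__));

/* Hypothetical: vector type that only promises the alignment of a
   double, i.e. what C already guarantees for a double const *.  */
typedef double v2df_a8 __attribute__ ((__vector_size__ (16), __may_alias__,
                                       __aligned__ (_Alignof (double))));

/* Hypothetical: vector and scalar types that promise no alignment.  */
typedef double v2df_u __attribute__ ((__vector_size__ (16), __may_alias__,
                                      __aligned__ (1)));
typedef double double_u __attribute__ ((__may_alias__, __aligned__ (1)));

/* Option 1: keep double const * and rely on the alignment that C
   requires of the pointer.  */
v2df
load_double_aligned (double const *p)
{
  return *(v2df_a8 const *) p;
}

/* Option 2: the parameter type itself allows byte alignment, so a
   misaligned address can be passed without breaking the C rules.  */
v2df
load_byte_aligned (double_u const *p)
{
  return *(v2df_u const *) p;
}
```

With option 2 a caller can pass something like (double_u const *)
(buf + 1) for a char buffer, whereas with option 1 forming a misaligned
double const * is already undefined before the load happens.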

Maybe Intel folks can clarify things here.

Not that it would make very much of a difference I guess.

> 2) Reading Intel's doc for movupd, it says: "If alignment checking is
> enabled (CR0.AM = 1, RFLAGS.AC = 1, and CPL = 3), an alignment-check
> exception (#AC) may or may not be generated (depending on processor
> implementation) when the operand is not aligned on an 8-byte boundary."
> Since we generate movupd for memcpy even when the alignment is presumably
> only 1 byte, I assume that this alignment-check stuff is not supported by
> gcc?

Huh, never heard of this.  Does this mean that the mov_u_XX instructions
can raise alignment-check exceptions?  I believe this would break almost
all code (glibc memcpy, GCC generated code, etc.).  Thus it would require
kernel support that emulates the unaligned ops so they still work (but
records them somehow).


> --
> Marc Glisse
