Re: Vector unaligned load/store x86 intrinsics

Marc Glisse Fri, 26 Aug 2016 02:40:11 -0700

On Fri, 26 Aug 2016, Richard Biener wrote:

On Thu, Aug 25, 2016 at 9:40 PM, Marc Glisse <marc.gli...@inria.fr> wrote:

Hello,


I was considering changing the implementation of _mm_loadu_pd in x86's
emmintrin.h to avoid a builtin. Here are 3 versions:

typedef double __m128d __attribute__ ((__vector_size__ (16),
__may_alias__));
typedef double __m128d_u __attribute__ ((__vector_size__ (16),
__may_alias__, aligned(1)));

__m128d f (double const *__P)
{
  return __builtin_ia32_loadupd (__P);
}

__m128d g (double const *__P)
{
  return *(__m128d_u*)(__P);
}

__m128d h (double const *__P)
{
  __m128d __r;
  __builtin_memcpy (&__r, __P, 16);
  return __r;
}


f is what we have currently. f and g generate the same code. h also
generates the same code except at -O0 where it is slightly longer.

(note that I haven't regtested either version yet)
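
As a rough illustration of why g and h are interchangeable, here is a minimal sketch using GCC's generic vector extensions rather than the real emmintrin.h types. The names v2df, v2df_u, load_g, load_h and loads_agree are hypothetical and not part of the proposed patch; the pointer is taken as const void * so the example does not itself depend on the alignment of double.

```c
#include <string.h>

/* Hypothetical stand-ins for __m128d and __m128d_u. */
typedef double v2df __attribute__ ((__vector_size__ (16), __may_alias__));
typedef double v2df_u __attribute__ ((__vector_size__ (16), __may_alias__,
                                      aligned (1)));

/* Variant g: dereference through the alignment-1 vector type. */
static v2df load_g (const void *p)
{
  return *(const v2df_u *) p;
}

/* Variant h: copy 16 bytes into a vector object. */
static v2df load_h (const void *p)
{
  v2df r;
  memcpy (&r, p, sizeof r);
  return r;
}

/* Returns 1 if both variants read the same two doubles from an
   address that is one byte past the start of a byte buffer.  */
static int loads_agree (void)
{
  unsigned char buf[1 + 2 * sizeof (double)];
  double src[2] = { 1.5, -2.25 };
  memcpy (buf + 1, src, sizeof src);

  v2df a = load_g (buf + 1);
  v2df b = load_h (buf + 1);
  return a[0] == 1.5 && a[1] == -2.25 && b[0] == a[0] && b[1] == a[1];
}
```

Calling loads_agree () only checks the values read; it says nothing about which instructions the compiler picks at a given optimization level.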

1) I don't have any strong preference between g and h, is there a reason to
pick one over the other? I may have a slight preference for g, which expands
to

  __m128d _3;
  _3 = MEM[(__m128d_u * {ref-all})__P_2(D)];

while h yields

  __int128 unsigned _3;
  _3 = MEM[(char * {ref-all})__P_2(D)];
  _4 = VIEW_CONVERT_EXPR<vector(2) double>(_3);


I prefer 'g' which is just more natural.


Ok, thanks.

Note that the C language requires that __P be aligned to alignof(double)
(not sure what the Intel intrinsic specs say here), and thus it doesn't
allow arbitrary misalignment. This means that you could use a slightly
better aligned type with aligned(alignof(double)).

I had thought about it, but since we already generate movupd with
aligned(1), it didn't really seem worth the trouble for this prototype.
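
For reference, the "slightly better aligned type" could look roughly like the sketch below. It assumes GCC's aligned attribute accepts _Alignof (double) as its argument; the names v2df, v2df_d, load_double_aligned and sum_pair_at are hypothetical and only serve the illustration.

```c
/* Hypothetical stand-in for __m128d. */
typedef double v2df __attribute__ ((__vector_size__ (16), __may_alias__));

/* Same vector type, but only promising the alignment of a double
   instead of alignment 1.  */
typedef double v2df_d __attribute__ ((__vector_size__ (16), __may_alias__,
                                      aligned (_Alignof (double))));

/* Load two doubles from a pointer that is aligned for double but
   not necessarily for a 16-byte vector.  */
static v2df load_double_aligned (double const *p)
{
  return *(const v2df_d *) p;
}

/* Sum the pair starting at base[i]; odd values of i give an address
   that is generally not 16-byte aligned.  */
static double sum_pair_at (double const *base, int i)
{
  v2df v = load_double_aligned (base + i);
  return v[0] + v[1];
}
```

With double a[4] = { 1, 2, 3, 4 }, sum_pair_at (a, 1) reads a[1] and a[2] through the double-aligned vector type.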

Or to be conforming the parameter should not be double const * but a
double type variant with alignment 1 ...


Yeah, those intrinsics have issues:

__m128i _mm_loadu_si128 (__m128i const* mem_addr)
"mem_addr does not need to be aligned on any particular boundary."

that doesn't really make sense.

I may try to experiment with your suggestion, see if it breaks anything.
Gcc seems happy to ignore those alignment differences when casting
function pointers, so it should be fine.
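
A sketch of what the suggested parameter type might look like, assuming the aligned attribute on a typedef can lower the alignment of double to 1. The names double_u, v2df_u, loadu_conforming and first_lane_at_offset1 are hypothetical; this is not the actual emmintrin.h change.

```c
#include <string.h>

/* Hypothetical stand-ins for __m128d and __m128d_u. */
typedef double v2df __attribute__ ((__vector_size__ (16), __may_alias__));
typedef double v2df_u __attribute__ ((__vector_size__ (16), __may_alias__,
                                      aligned (1)));

/* A double variant that may alias anything and needs no alignment,
   so a pointer to it can legitimately hold any address.  */
typedef double double_u __attribute__ ((__may_alias__, aligned (1)));

/* The parameter type no longer promises alignof(double).  */
static v2df loadu_conforming (double_u const *p)
{
  return *(const v2df_u *) p;
}

/* Store two doubles one byte into a buffer and read the first lane
   back through the alignment-1 parameter type.  */
static double first_lane_at_offset1 (void)
{
  unsigned char buf[1 + 2 * sizeof (double)];
  double src[2] = { 3.0, 4.0 };
  memcpy (buf + 1, src, sizeof src);

  v2df v = loadu_conforming ((double_u const *) (buf + 1));
  return v[0];
}
```

Existing callers passing a plain double const * would still convert to double_u const *, since that only weakens the alignment promise.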

2) Reading Intel's doc for movupd, it says: "If alignment checking is
enabled (CR0.AM = 1, RFLAGS.AC = 1, and CPL = 3), an alignment-check
exception (#AC) may or may not be generated (depending on processor
implementation) when the operand is not aligned on an 8-byte boundary."
Since we generate movupd for memcpy even when the alignment is presumably
only 1 byte, I assume that this alignment-check stuff is not supported by
gcc?


Huh, never heard of this.  Does this mean that mov_u_XX do alignment-check
exceptions?  I believe this would break almost all code (glibc memcpy, GCC
generated code, etc....).  Thus it would require kernel support, emulating
the unaligned ops to still work (but record them somehow).

Elsewhere
(https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_loadu_pd&expand=3106,3115,3106,3124,3106&techs=SSE2)
Intel doesn't mention this at all, it just says: "mem_addr does not need
to be aligned on any particular boundary." So it might be a provision in
the spec that was added just in case, but never implemented...


--
Marc Glisse
