http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59539

--- Comment #2 from Thiago Macieira <thiago at kde dot org> ---
I have to use _mm_loadu_si128 because non-VEX SSE requires explicit unaligned
loads.

Here's more food for thought:

    __m128i result = _mm_cmpeq_epi16(*(__m128i*)p1, *(__m128i*)p2);

For non-VEX code, the compiler has so far emitted one MOVDQA and one PCMPEQW
where it could, which requires both sources to be aligned. With VEX, VPCMPEQW
accepts an unaligned memory operand, so should the other load also be changed
from VMOVDQA to VMOVDQU?
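
To make the two source forms concrete, here is a minimal sketch. It assumes p1
and p2 each point to at least 16 readable bytes; the helper names are
hypothetical and only for illustration. Which instructions are actually emitted
depends on the compiler version and on flags such as -msse2 versus -mavx, so
the sketch shows only the source-level difference, not a guaranteed encoding.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: dereferencing a __m128i pointer tells the compiler
 * that the address is 16-byte aligned, so it may use an aligned load or a
 * memory operand. */
static int cmp_mask_deref(const void *p1, const void *p2)
{
    __m128i r = _mm_cmpeq_epi16(*(const __m128i *)p1, *(const __m128i *)p2);
    /* One bit per byte: a matching 16-bit lane sets two adjacent bits. */
    return _mm_movemask_epi8(r);
}

/* Hypothetical helper: explicit unaligned loads, valid for any address. */
static int cmp_mask_loadu(const void *p1, const void *p2)
{
    __m128i a = _mm_loadu_si128((const __m128i *)p1);
    __m128i b = _mm_loadu_si128((const __m128i *)p2);
    return _mm_movemask_epi8(_mm_cmpeq_epi16(a, b));
}
```

Both helpers return the same mask for aligned inputs; only cmp_mask_loadu is
valid when either pointer is misaligned.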

Similarly, if I use _mm_load_si128 (not loadu), can the compiler fold one of
the loads into the next instruction? Performance-wise, execution will be the
same, with one fewer instruction to retire (so, better), but the folded load
will no longer raise an alignment fault if the pointer isn't aligned.
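
A minimal sketch of that aligned-load form follows, with a hypothetical helper
name. The assert is my own addition, not something the compiler provides: it
is one way code could keep a debug-time alignment check if it had been relying
on the hardware fault and a load ends up folded under VEX.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: both pointers must be 16-byte aligned. */
static int cmp_mask_load(const void *p1, const void *p2)
{
    /* Debug-only check (an assumption of this sketch): keeps the
     * precondition visible whether or not a load is folded. */
    assert(((uintptr_t)p1 & 15) == 0 && ((uintptr_t)p2 & 15) == 0);

    __m128i a = _mm_load_si128((const __m128i *)p1);
    __m128i b = _mm_load_si128((const __m128i *)p2);
    /* One bit per byte: a matching 16-bit lane sets two adjacent bits. */
    return _mm_movemask_epi8(_mm_cmpeq_epi16(a, b));
}
```

Compiling with NDEBUG removes the check, leaving only whatever alignment
behaviour the emitted instructions have.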
