https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80286
Bug ID: 80286
Summary: [4.9/5/6/7 regressions] AVX2 _mm_cvtsi128_si32 doesn't
return a proper 32bits int
Product: gcc
Version: 4.9.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: gregory.hainaut at gmail dot com
Target Milestone: ---
Dear GCC developers,
It seems that G++ 4.9 introduced an optimization to reduce stack/memory accesses
that broke the behavior of _mm_cvtsi128_si32.
Note: I tested various GCC versions on godbolt.org; I don't know whether the
GCC 7 snapshot there is recent.
Note 2: the issue may belong to RTL/tree optimization, but I am not sure.
Here is a small test case:
---------------------8<-----------------------------------
#include <immintrin.h>

__m256i m;

__m128i extract(__m128i minmax)
{
    int shift = _mm_cvtsi128_si32(_mm256_castsi256_si128(m));
    return _mm_srli_epi16(minmax, shift);
}
--------------------->8-----------------------------------
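On recent GCC it is compiled into the following two asm instructions:
    vmovdqa m(%rip), %ymm1
    vpsrlw %xmm1, %xmm0, %xmm0
The issue is that the shift count operand of vpsrlw is 64 bits wide, so "shift"
(a 32-bit int) must be zero-extended to 64 bits before it is used as the count.
Here the upper 32 bits of the count come straight from m instead.
Clang typically uses vpmovzxdq for the zero extension.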
Best Regards,
Gregory