https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80286
            Bug ID: 80286
           Summary: [4.9/5/6/7 regressions] AVX2 _mm_cvtsi128_si32 doesn't return a proper 32bits int
           Product: gcc
           Version: 4.9.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: gregory.hainaut at gmail dot com
  Target Milestone: ---

Dear GCC developers,

It seems that G++ 4.9 introduced an optimization to reduce stack/memory access
that broke the behavior of _mm_cvtsi128_si32.

Note: I tested the various GCC versions with godbolt.org; I don't know whether
its GCC 7 snapshot is recent or not.
Note 2: maybe the issue belongs to RTL/tree optimization, but I have no clue.

Here is a small test case:
---------------------8<-----------------------------------
#include <immintrin.h>

__m256i m;

__m128i extract(__m128i minmax)
{
    int shift = _mm_cvtsi128_si32(_mm256_castsi256_si128(m));
    return _mm_srli_epi16(minmax, shift);
}
--------------------->8-----------------------------------

On recent GCC it is compiled to the two asm instructions shown below. The issue
is that the shift-count operand of vpsrlw is 64 bits wide, so "shift" must be
zero-extended to 64 bits before it is used as the count. Here the whole
register loaded from m is used as the count instead, so bits 32..63 of m end up
in the shift amount. Clang typically uses vpmovzxdq to do the zero extension.

vmovdqa m(%rip), %ymm1
vpsrlw  %xmm1, %xmm0, %xmm0

Best Regards,
Gregory
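
To make the reported failure mode concrete, here is a minimal scalar sketch in portable C++ of how a single 16-bit lane behaves under vpsrlw with a register count. It is a model, not GCC's code and not the intrinsics themselves. It assumes the documented x86 behaviour that the instruction reads the low 64 bits of the count register and clears the lane when that count exceeds 15. The names srlw_lane, count_from_raw_register and count_from_int are hypothetical helpers introduced only for this illustration.

```cpp
#include <cstdint>

// Hypothetical scalar model of one 16-bit lane of PSRLW/VPSRLW with a
// register count: the low 64 bits of the count register are used, and any
// count above 15 clears the lane (assumed from the x86 instruction reference).
std::uint16_t srlw_lane(std::uint16_t lane, std::uint64_t count64) {
    return count64 > 15 ? 0 : static_cast<std::uint16_t>(lane >> count64);
}

// Count seen by the instruction when the register holding `m` is used
// directly: 32-bit element 1 of `m` leaks into bits 32..63 of the count.
std::uint64_t count_from_raw_register(std::uint32_t elem0, std::uint32_t elem1) {
    return (static_cast<std::uint64_t>(elem1) << 32) | elem0;
}

// Count the C++ source asks for: the 32-bit value from _mm_cvtsi128_si32,
// zero-extended to 64 bits before it reaches the shift.
std::uint64_t count_from_int(std::uint32_t elem0) {
    return elem0;
}
```

In this model, with element 0 of m equal to 4 and element 1 equal to 1, the raw-register count is 0x100000004, so the lane is cleared, while the zero-extended count gives the expected lane >> 4. The two counts agree only when element 1 of m happens to be zero, which may be why the problem is easy to miss in testing.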