https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71903
Bug ID: 71903
Summary: Wrong opcode using x86 SSE _mm_cmpge_ps intrinsics
Product: gcc
Version: 4.8.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: carlosrafael.prog at gmail dot com
Target Milestone: ---

I have the following code:

    float *previousM = ...;
    float *fft = ...;
    for (int32_t i = 0; i < 256; i += 8) {
        __m128 m0 = _mm_load_ps(previousM);
        __m128 m1 = _mm_load_ps(previousM + 4);
        previousM += 8;
        __m128 old0 = _mm_load_ps(fft);
        __m128 old1 = _mm_load_ps(fft + 4);
        __m128 geq0 = _mm_cmpge_ps(m0, old0);
        __m128 geq1 = _mm_cmpge_ps(m1, old1);
        ...
    }

Since the code was behaving rather strangely, I decided to generate and read its disassembly. Below is the snippet that drew my attention:

    extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
    _mm_cmpge_ps (__m128 __A, __m128 __B)
    {
      return (__m128) __builtin_ia32_cmpgeps ((__v4sf)__A, (__v4sf)__B);
      9f:   0f c2 dd 02             cmpleps %xmm5,%xmm3

Please note that this is not a bug in the disassembler, because the Intel docs state that

    CMPLEPS xmm1, xmm2

becomes

    CMPPS xmm1, xmm2, 2

Also, this is not some weird optimization or anything else, because even if the compiler had decided to switch m0 with old0, the opposite of >= (ge) is < (lt) and not <= (le), as the disassembly shows.
In order to make the code work properly, I manually replaced these two lines in my code

    __m128 geq0 = _mm_cmpge_ps(m0, old0);
    __m128 geq1 = _mm_cmpge_ps(m1, old1);

with these two lines

    __m128 geq0 = _mm_cmplt_ps(old0, m0);
    __m128 geq1 = _mm_cmplt_ps(old1, m1);

After that change, the disassembly became

    extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
    _mm_cmplt_ps (__m128 __A, __m128 __B)
    {
      return (__m128) __builtin_ia32_cmpltps ((__v4sf)__A, (__v4sf)__B);
      8d:   0f c2 e3 01             cmpltps %xmm3,%xmm4

Some extra information:

- I am using the gcc bundled with the Android build tools, and since there are two executable files, I do not know for sure whether the gcc version being used is "4.8" or "4.9 20140827"
- I am compiling on 64-bit Windows 10, targeting a 32-bit x86 Android app
- Both gcc versions (4.8 and 4.9) are inside the folder windows-x86_64 (which makes me believe I am using a 64-bit build of gcc)
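
A minimal sketch for checking the comparison semantics discussed above, assuming an x86 target with SSE enabled. The helper names (`mask_ge`, `mask_le_swapped`, `mask_lt_swapped`, `check_compare_masks`) and the lane values are made up for illustration and are not part of the original code. For context: the `CMPPS` immediate predicates 0 to 7 are EQ, LT, LE, UNORD, NEQ, NLT, NLE and ORD, so there is no direct GE predicate, and `a >= b` can be emitted as `b <= a` with the operands exchanged. The sketch compares the lane masks of the three forms that appear in this report.

```c
#include <assert.h>
#include <xmmintrin.h>  /* SSE intrinsics */

/* Bitmask of lanes where a[i] >= b[i] (bit i is set for lane i). */
static int mask_ge(const float *a, const float *b)
{
    return _mm_movemask_ps(_mm_cmpge_ps(_mm_loadu_ps(a), _mm_loadu_ps(b)));
}

/* Same predicate written with exchanged operands: b[i] <= a[i]. */
static int mask_le_swapped(const float *a, const float *b)
{
    return _mm_movemask_ps(_mm_cmple_ps(_mm_loadu_ps(b), _mm_loadu_ps(a)));
}

/* Strict variant with exchanged operands: b[i] < a[i], i.e. a[i] > b[i]. */
static int mask_lt_swapped(const float *a, const float *b)
{
    return _mm_movemask_ps(_mm_cmplt_ps(_mm_loadu_ps(b), _mm_loadu_ps(a)));
}

/* Hypothetical self-check; the lane values are illustrative only. */
static void check_compare_masks(void)
{
    const float a[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    const float b[4] = {2.0f, 2.0f, 1.0f, 5.0f};

    /* Lane 0: 1 >= 2 false; lane 1: 2 >= 2 true;
       lane 2: 3 >= 1 true;  lane 3: 4 >= 5 false. */
    assert(mask_ge(a, b) == 0x6);
    assert(mask_le_swapped(a, b) == 0x6);

    /* The strict form drops lane 1, where the values are equal. */
    assert(mask_lt_swapped(a, b) == 0x4);
}
```

If these assertions hold on a given toolchain, `cmpleps` with exchanged operands would be a valid way to compute `_mm_cmpge_ps`, whereas `_mm_cmplt_ps(old0, m0)` computes `m0 > old0` and differs from `>=` exactly in the lanes where both values are equal.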