https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107836
Bug ID: 107836 Summary: x86_64 inline functions -O2/-O3 optimization error Product: gcc Version: 11.2.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c Assignee: unassigned at gcc dot gnu.org Reporter: czx211355007 at gmail dot com Target Milestone: --- Target: x86_64-linux-gnu Created attachment 53952 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53952&action=edit full assembly for function "matrix_mul" When compiling the following two functions with -O2 or -O3 options, the assembly code generated is wrong. int dot_product(short* a, short* b, int len){ int result; asm("pandn %%mm5,%%mm5;"::); for(int i=0; i < len; i += 4){ asm( "movq %0,%%mm0;" "movq %1,%%mm1;" "pmaddwd %%mm1,%%mm0;" "paddd %%mm0,%%mm5;" : : "m" (a[i]), "m" (b[i]) ); } asm("movq %%mm5, %%mm0;" "psrlq $32,%%mm5;" "paddd %%mm0, %%mm5;" "movd %%mm5,%0;" "emms" :"=r" (result) :); return result; } } void matrix_mul(int d, short a[d][d], short b[d][d], int c[d][d]){ for(int i=0;i<d;i++){ for(int j=0;j<d;j++){ c[i][j] = dot_product(a[i], b[j], d); } } return; } The part of the assembly code for "matrix_mul" where I see an error: 14b5: 0f 6f c5 movq %mm5,%mm0 14b8: 0f 73 d5 20 psrlq $0x20,%mm5 14bc: 0f fe e8 paddd %mm0,%mm5 14bf: 0f 7e eb movd %mm5,%ebx 14c2: 0f 77 emms 14c4: 0f 1f 40 00 nopl 0x0(%rax) 14c8: 4b 8d 34 0e lea (%r14,%r9,1),%rsi 14cc: 4b 8d 4c 05 00 lea 0x0(%r13,%r8,1),%rcx 14d1: 31 ff xor %edi,%edi 14d3: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1) 14d8: 0f df ed pandn %mm5,%mm5 14db: 49 8d 14 3b lea (%r11,%rdi,1),%rdx 14df: 4c 89 c0 mov %r8,%rax 14e2: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1) 14e8: 0f 6f 00 movq (%rax),%mm0 14eb: 0f 6f 0a movq (%rdx),%mm1 Here mm0 and mm5 are used before values are assigned to mm0 and mm1, which leads to a calculation error when using "matrix_mul" to do matrix multiplication. In addition, when using a low optimization level to compile, there is no error and it's able to get correct results.