https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92651
Bug ID:           92651
Summary:          [10 Regression] Unnecessary stv transform in some x86 backend
Product:          gcc
Version:          10.0
Status:           UNCONFIRMED
Severity:         normal
Priority:         P3
Component:        target
Assignee:         unassigned at gcc dot gnu.org
Reporter:         wwwhhhyyy333 at gmail dot com
CC:               rguenther at suse dot de
Target Milestone: ---

For the test case

    #include <math.h>

    int foo(unsigned char a, unsigned char b)
    {
        int isum = abs(a - b);
        return isum;
    }

with -O2 -march=corei7, GCC generates:

    movzx   edi, dil
    movzx   esi, sil
    movd    xmm1, edi
    movd    xmm0, esi
    movdqa  xmm3, xmm1
    psubd   xmm3, xmm0
    psubd   xmm0, xmm1
    pmaxsd  xmm0, xmm3
    movd    eax, xmm0
    ret

while with -O2 -march=x86-64 it generates:

    movzx   eax, dil
    movzx   esi, sil
    sub     eax, esi
    cdq
    xor     eax, edx
    sub     eax, edx
    ret

In another case, with -O2 -march=corei7 -mtune=generic:

    movzx   edi, dil
    movzx   esi, sil
    mov     eax, edi
    sub     eax, esi
    sub     esi, edi
    cmp     eax, esi
    cmovl   eax, esi
    ret

This started with r277481 (see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91154). In the second STV pass the transform fires because the sse_to_integer RTL cost for corei7 is 2, which makes the conversion look profitable for some cmove instructions. I think this affects most IA processors with a similar cost setting. The STV conversion causes a regression of about 7% on 525.x264_r.

Is the conversion intended to handle cmove on purpose? If not, I think it would be better to raise the sse_to_integer RTL cost to avoid this issue. According to my experiments, 6 would be a proper value.