On Thu, Oct 6, 2011 at 11:33 PM, H.J. Lu <hjl.to...@gmail.com> wrote:
>>>>>>> OTOH, x86_64 and i686 targets can also benefit from this change. If
>>>>>>> combine can't create more complex address (covered by lea), then it
>>>>>>> will simply propagate memory operand back into the add insn. It looks
>>>>>>> to me that we can't loose here, so:
>>>>>>>
>>>>>>> /* Improve address combine. */
>>>>>>> if (code == PLUS && MEM_P (src2))
>>>>>>>   src2 = force_reg (mode, src2);
>>>>>>>
>>>>>>> Any opinions?
>>>>>>>
>>>>>>
>>>>>> It doesn't work with 64bit libstdc++:
>>>>>
>>>>> Yeah, yeah. ix86_output_mi_thunk has some ... issues.
>>>>>
>>>>> Please try attached patch that introduces ix86_emit_binop and uses it
>>>>> in a bunch of places.
>>>
>>>> I tried it on GCC. There are no regressions. The bugs are fixed for x32.
>>>> Here are size comparison with GCC runtime libraries on ia32, x32 and
>>>> x86-64:
>>>
>>>> 884093 18600 27064 929757 e2fdd old libstdc++.so
>>>> 884189 18600 27064 929853 e303d new libs/libstdc++.so
>>>>
>>>> The new code is
>>>>
>>>> mov 0xc(%edi),%eax
>>>> mov %eax,0x8(%esi)
>>>> mov -0xc(%eax),%eax
>>>> mov 0x10(%edi),%edx
>>>> lea 0x8(%esi,%eax,1),%eax
>>>>
>>>> The old one is
>>>>
>>>> mov 0xc(%edi),%edx
>>>> lea 0x8(%esi),%eax
>>>> mov %edx,0x8(%esi)
>>>> add -0xc(%edx),%eax
>>>> mov 0x10(%edi),%edx
>>>
>>> The new code merged lea+add into one lea, so it looks quite OK to me.
>>>
>>> Do you have some performance numbers?
>>>
>>
>> I will report performance numbers in a few days.
>
> The differences in SPEC CPU 2006 on ia32, x86-64 and
> x32 are within noise range.

Great.

Attached is a slightly updated patch, where we consider only
integer-mode PLUS RTXes.

2011-10-07  Uros Bizjak  <ubiz...@gmail.com>
	    H.J. Lu  <hongjiu...@intel.com>

	PR target/50603
	* config/i386/i386.c (ix86_fixup_binary_operands): Force src2 of
	integer PLUS RTX to a register to improve address combine.

testsuite/ChangeLog:

2011-10-07  Uros Bizjak  <ubiz...@gmail.com>
	    H.J. Lu  <hongjiu...@intel.com>

	PR target/50603
	* gcc.target/i386/pr50603.c: New test.

Tested on x86_64-pc-linux-gnu {,-m32}, committed to mainline SVN.

Uros.

Index: config/i386/i386.c
===================================================================
--- config/i386/i386.c	(revision 179645)
+++ config/i386/i386.c	(working copy)
@@ -15798,6 +15798,12 @@ ix86_fixup_binary_operands (enum rtx_code code, en
   if (MEM_P (src1) && !rtx_equal_p (dst, src1))
     src1 = force_reg (mode, src1);
 
+  /* Improve address combine. */
+  if (code == PLUS
+      && GET_MODE_CLASS (mode) == MODE_INT
+      && MEM_P (src2))
+    src2 = force_reg (mode, src2);
+
   operands[1] = src1;
   operands[2] = src2;
   return dst;
Index: testsuite/gcc.target/i386/pr50603.c
===================================================================
--- testsuite/gcc.target/i386/pr50603.c	(revision 0)
+++ testsuite/gcc.target/i386/pr50603.c	(revision 0)
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+extern int *foo;
+
+int
+bar (int x)
+{
+  return foo[x];
+}
+/* { dg-final { scan-assembler-not "lea\[lq\]" } } */
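
For readers without a GCC tree at hand, the condition added to ix86_fixup_binary_operands can be sketched as a standalone predicate. This is only an illustration under stated assumptions: the toy_* enums and functions below are hypothetical stand-ins for GCC's rtx code, mode class, and MEM_P check, not the real internals. It shows the one difference from the earlier proposal quoted above, namely that src2 is forced into a register only for an integer-mode PLUS.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins, NOT GCC's real rtx_code or mode-class enums.  */
enum toy_code { TOY_PLUS, TOY_MINUS };
enum toy_mode_class { TOY_MODE_INT, TOY_MODE_FLOAT };

/* Mirrors the shape of the condition in the patch: true when the second
   source operand would be loaded into a register first.  The src2_is_mem
   flag plays the role of MEM_P (src2).  */
static bool
toy_should_force_src2 (enum toy_code code, enum toy_mode_class mclass,
                       bool src2_is_mem)
{
  return code == TOY_PLUS && mclass == TOY_MODE_INT && src2_is_mem;
}

/* Small self-check covering each term of the condition.  */
static void
toy_self_check (void)
{
  assert (toy_should_force_src2 (TOY_PLUS, TOY_MODE_INT, true));
  assert (!toy_should_force_src2 (TOY_PLUS, TOY_MODE_INT, false));
  assert (!toy_should_force_src2 (TOY_PLUS, TOY_MODE_FLOAT, true));
  assert (!toy_should_force_src2 (TOY_MINUS, TOY_MODE_INT, true));
}
```

Calling toy_self_check () from a small main is enough to exercise it. The sketch models only the new guard; it says nothing about what combine does with the operands afterwards.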