On Tue, Oct 4, 2011 at 11:58 AM, H.J. Lu <hjl.to...@gmail.com> wrote:
> On Tue, Oct 4, 2011 at 11:51 AM, Uros Bizjak <ubiz...@gmail.com> wrote:
>> On Tue, Oct 4, 2011 at 8:37 PM, H.J. Lu <hjl.to...@gmail.com> wrote:
>>
>>>>>> OTOH, x86_64 and i686 targets can also benefit from this change. If
>>>>>> combine can't create more complex address (covered by lea), then it
>>>>>> will simply propagate memory operand back into the add insn. It looks
>>>>>> to me that we can't loose here, so:
>>>>>>
>>>>>>   /* Improve address combine.  */
>>>>>>   if (code == PLUS && MEM_P (src2))
>>>>>>     src2 = force_reg (mode, src2);
>>>>>>
>>>>>> Any opinions?
>>>>>>
>>>>>
>>>>> It doesn't work with 64bit libstdc++:
>>>>
>>>> Yeah, yeah. ix86_output_mi_thunk has some ... issues.
>>>>
>>>> Please try attached patch that introduces ix86_emit_binop and uses it
>>>> in a bunch of places.
>>
>>> I tried it on GCC. There are no regressions. The bugs are fixed for x32.
>>> Here are size comparison with GCC runtime libraries on ia32, x32 and
>>> x86-64:
>>
>>>  884093   18600   27064  929757   e2fdd old libstdc++.so
>>>  884189   18600   27064  929853   e303d new libs/libstdc++.so
>>>
>>> The new code is
>>>
>>>     mov    0xc(%edi),%eax
>>>     mov    %eax,0x8(%esi)
>>>     mov    -0xc(%eax),%eax
>>>     mov    0x10(%edi),%edx
>>>     lea    0x8(%esi,%eax,1),%eax
>>>
>>> The old one is
>>>
>>>     mov    0xc(%edi),%edx
>>>     lea    0x8(%esi),%eax
>>>     mov    %edx,0x8(%esi)
>>>     add    -0xc(%edx),%eax
>>>     mov    0x10(%edi),%edx
>>
>> The new code merged lea+add into one lea, so it looks quite OK to me.
>>
>> Do you have some performance numbers?
>>
>
> I will report performance numbers in a few days.

The differences in SPEC CPU 2006 on ia32, x86-64 and x32 are within
noise range.

-- 
H.J.