On Fri, Mar 6, 2015 at 11:31 AM, Thomas Preud'homme <thomas.preudho...@arm.com> wrote: > Hi, > > Improved canonization after r216728 causes GCC to more often generate poor > code due to suboptimal ordering of operand of commutative libcalls. Indeed, > if one of the operands of a commutative operation is the result of a previous > operation, both being implemented by libcall, the wrong ordering of the > operands in the second operation can lead to extra mov. Consider the > following case on softfloat targets: > > double > test1 (double x, double y) > { > return x * (x + y); > } > > If x + y is put in the operand using the same register as the result of the > libcall for x + y then no mov is generated, otherwise mov is needed. The > following happens on arm softfloat with the right ordering: > > bl __aeabi_dadd > ldr r2, [sp] > ldr r3, [sp, #4] > /* r0, r1 are reused from the return values of the __aeabi_dadd libcall. */ > bl __aeabi_dmul > > With the wrong ordering one gets: > > bl __aeabi_dadd > mov r2, r0 > mov r3, r1 > ldr r0, [sp] > ldr r1, [sp, #4] > bl __aeabi_dmul > > This patch extend the patch written by Yuri Rumyantsev in r219646 to also > deal with the case of only one of the operand being the result of an > operation. ChangeLog entries are as follows: > > *** gcc/ChangeLog *** > > 2015-03-05 Thomas Preud'homme <thomas.preudho...@arm.com> > > PR tree-optimization/63743 > * cfgexpand.c (reorder_operands): Also reorder if only second operand > had its definition forwarded by TER. > > *** gcc/testsuite/ChangeLog *** > > 2015-03-05 Thomas Preud'homme <thomas.preudho...@arm.com> > > PR tree-optimization/63743 > * gcc.dg/pr63743.c: New test. > > > diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c > index 7dfe1f6..4fbc037 100644 > --- a/gcc/cfgexpand.c > +++ b/gcc/cfgexpand.c > @@ -5117,13 +5117,11 @@ reorder_operands (basic_block bb) > continue; > /* Swap operands if the second one is more expensive. */ > def0 = get_gimple_for_ssa_name (op0); > - if (!def0) > - continue; > def1 = get_gimple_for_ssa_name (op1); > if (!def1) > continue; > swap = false; > - if (lattice[gimple_uid (def1)] > lattice[gimple_uid (def0)]) > + if (!def0 || lattice[gimple_uid (def1)] > lattice[gimple_uid (def0)]) > swap = true; > if (swap) > { > @@ -5132,7 +5130,7 @@ reorder_operands (basic_block bb) > fprintf (dump_file, "Swap operands in stmt:\n"); > print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM); > fprintf (dump_file, "Cost left opnd=%d, right opnd=%d\n", > - lattice[gimple_uid (def0)], > + def0 ? lattice[gimple_uid (def0)] : 0, > lattice[gimple_uid (def1)]); > } > swap_ssa_operands (stmt, gimple_assign_rhs1_ptr (stmt), > diff --git a/gcc/testsuite/gcc.dg/pr63743.c b/gcc/testsuite/gcc.dg/pr63743.c > new file mode 100644 > index 0000000..87254ed > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/pr63743.c > @@ -0,0 +1,11 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O1 -fdump-rtl-expand-details" } */ > + > +double > +libcall_dep (double x, double y) > +{ > + return x * (x + y); > +} > + > +/* { dg-final { scan-rtl-dump-times "Swap operands" 1 "expand" } } */ > +/* { dg-final { cleanup-rtl-dump "expand" } } */ > > > Testsuite was run in QEMU when compiled by an arm-none-eabi GCC > cross-compiler targeting Cortex-M3 and a bootstrapped x86_64 native GCC > without any regression. > CSiBE sees a -0.5034% code size decrease on arm-none-eabi and a 0.0058% code > size increase on x86_64-linux-gnu. > > Is it ok for trunk (since it fixes a code size regression in 5.0)?
Ok. Thanks, Richard. > Best regards, > > Thomas > > >