On Fri, Mar 6, 2015 at 11:31 AM, Thomas Preud'homme
<thomas.preudho...@arm.com> wrote:
> Hi,
>
> Improved canonization after r216728 causes GCC to more often generate poor 
> code due to suboptimal ordering of operand of commutative libcalls. Indeed, 
> if one of the operands of a commutative operation is the result of a previous 
> operation, both being implemented by libcall, the wrong ordering of the 
> operands in the second operation can lead to extra mov. Consider the 
> following case on softfloat targets:
>
> double
> test1 (double x, double y)
> {
>   return x * (x + y);
> }
>
> If x + y is put in the operand using the same register as the result of the 
> libcall for x + y then no mov is generated, otherwise mov is needed. The 
> following happens on arm softfloat with the right ordering:
>
> bl      __aeabi_dadd
> ldr     r2, [sp]
> ldr     r3, [sp, #4]
> /* r0, r1 are reused from the return values of the __aeabi_dadd libcall. */
> bl      __aeabi_dmul
>
> With the wrong ordering one gets:
>
> bl      __aeabi_dadd
> mov     r2, r0
> mov     r3, r1
> ldr     r0, [sp]
> ldr     r1, [sp, #4]
> bl      __aeabi_dmul
>
> This patch extend the patch written by Yuri Rumyantsev in r219646 to also 
> deal with the case of only one of the operand being the result of an 
> operation. ChangeLog entries are as follows:
>
> *** gcc/ChangeLog ***
>
> 2015-03-05  Thomas Preud'homme  <thomas.preudho...@arm.com>
>
>         PR tree-optimization/63743
>         * cfgexpand.c (reorder_operands): Also reorder if only second operand
>         had its definition forwarded by TER.
>
> *** gcc/testsuite/ChangeLog ***
>
> 2015-03-05  Thomas Preud'homme  <thomas.preudho...@arm.com>
>
>         PR tree-optimization/63743
>         * gcc.dg/pr63743.c: New test.
>
>
> diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
> index 7dfe1f6..4fbc037 100644
> --- a/gcc/cfgexpand.c
> +++ b/gcc/cfgexpand.c
> @@ -5117,13 +5117,11 @@ reorder_operands (basic_block bb)
>         continue;
>        /* Swap operands if the second one is more expensive.  */
>        def0 = get_gimple_for_ssa_name (op0);
> -      if (!def0)
> -       continue;
>        def1 = get_gimple_for_ssa_name (op1);
>        if (!def1)
>         continue;
>        swap = false;
> -      if (lattice[gimple_uid (def1)] > lattice[gimple_uid (def0)])
> +      if (!def0 || lattice[gimple_uid (def1)] > lattice[gimple_uid (def0)])
>         swap = true;
>        if (swap)
>         {
> @@ -5132,7 +5130,7 @@ reorder_operands (basic_block bb)
>               fprintf (dump_file, "Swap operands in stmt:\n");
>               print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
>               fprintf (dump_file, "Cost left opnd=%d, right opnd=%d\n",
> -                      lattice[gimple_uid (def0)],
> +                      def0 ? lattice[gimple_uid (def0)] : 0,
>                        lattice[gimple_uid (def1)]);
>             }
>           swap_ssa_operands (stmt, gimple_assign_rhs1_ptr (stmt),
> diff --git a/gcc/testsuite/gcc.dg/pr63743.c b/gcc/testsuite/gcc.dg/pr63743.c
> new file mode 100644
> index 0000000..87254ed
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr63743.c
> @@ -0,0 +1,11 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O1 -fdump-rtl-expand-details" } */
> +
> +double
> +libcall_dep (double x, double y)
> +{
> +  return x * (x + y);
> +}
> +
> +/* { dg-final { scan-rtl-dump-times "Swap operands" 1 "expand" } } */
> +/* { dg-final { cleanup-rtl-dump "expand" } } */
>
>
> Testsuite was run in QEMU when compiled by an arm-none-eabi GCC 
> cross-compiler targeting Cortex-M3 and a bootstrapped x86_64 native GCC 
> without any regression.
> CSiBE sees a -0.5034% code size decrease on arm-none-eabi and a 0.0058% code 
> size increase on x86_64-linux-gnu.
>
> Is it ok for trunk (since it fixes a code size regression in 5.0)?

Ok.

Thanks,
Richard.

> Best regards,
>
> Thomas
>
>
>

Reply via email to