Hi,
There is a performance regression caused by my previous change to 
aarch64_legitimize_address, in which I forced the constant offset out of the 
memory reference whenever the address expression has the form 
"reg1 + reg2 << scale + const".  The intention was to expose as many 
loop-invariant opportunities as possible, while relying on the GIMPLE 
optimizers to pick up CSE opportunities of "reg << scale" among different 
memory references.
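
For concreteness, here is a minimal C example (made up for illustration, not 
taken from the benchmarks) of the kind of access pattern in question: both 
a[i] and a[i + 4] compute an address of the form "Ra + Ri<<2 + const" and 
share the "Ri<<2" scaling expression:

    /* Hypothetical example: the two loads share "i << 2" but differ
       only in the constant offset (0 vs. 16 bytes).  */
    int
    sum_pairs (int *a, int n)
    {
      int s = 0;
      for (int i = 0; i < n; i++)
        s += a[i] + a[i + 4];
      return s;
    }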

Though the assumption still holds, the GIMPLE optimizers are not currently 
powerful enough to pick up CSE opportunities between register scaling 
expressions.  Hence this workaround: the patch forces the register scaling 
expression out of the memory reference instead, so that the RTL CSE pass can 
handle common register scaling expressions, at the cost, of course, of 
possibly missed loop invariants.  The two strategies are contrasted in the 
sketch below.
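
Schematically, in the notation of the comment in aarch64_legitimize_address, 
for an address of the form "Ra + Rb<<SCALE + CONST" the previous change 
emitted:

        Rt = Ra + CONST;
        addr = Rt + Rb<<SCALE;    /* Rt is loop invariant if Ra is, but
                                     Rb<<SCALE stays in the mem ref.  */

whereas with this patch we emit:

        Rt = Ra + Rb<<SCALE;
        addr = Rt + CONST;        /* Rb<<SCALE is now exposed to RTL CSE;
                                     CONST folds into the mem ref.  */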

James and I collected perf data; fortunately this change improves performance 
for several cases in various benchmarks, while not causing any big 
regression.  It also recovers the big regression we observed before with the 
previous change.

I also added a comment explaining why the workaround is necessary, and filed 
PR69653 as an example showing where the tree optimizers should be improved.

Bootstrapped and tested on AArch64.  Is it OK?

Thanks,
bin


2016-02-04  Bin Cheng  <bin.ch...@arm.com>

        * config/aarch64/aarch64.c (aarch64_legitimize_address): Force
        register scaling out of memory reference and comment why.
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 03bc1b9..24a7d5b 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -4926,13 +4926,18 @@ aarch64_legitimize_address (rtx x, rtx /* orig_x  */, machine_mode mode)
               Rt = Ra + Rc;
               addr = Rt + Rb<<SCALE.
 
-            Here we split CONST out of memory referece because:
+            TODO: We really should split CONST out of the memory reference
+            because:
               a) We depend on GIMPLE optimizers to pick up common sub
                  expression involving the scaling operation.
               b) The index Rb is likely a loop iv, it's better to split
                  the CONST so that computation of new base Rt is a loop
                  invariant and can be moved out of loop.  This is more
-                 important when the original base Ra is sfp related.  */
+                 important when the original base Ra is sfp related.
+
+            Unfortunately, GIMPLE optimizers (e.g., SLSR) cannot handle
+            this kind of CSE opportunity at the time of this change, so we
+            have to force the register scaling expr out of memory ref now.  */
          else if (REG_P (op0) || REG_P (op1))
            {
              machine_mode addr_mode = GET_MODE (x);
@@ -4942,14 +4947,13 @@ aarch64_legitimize_address (rtx x, rtx /* orig_x  */, machine_mode mode)
              if (REG_P (op1))
                std::swap (op0, op1);
 
-             rtx addr = gen_rtx_PLUS (addr_mode, op1, base);
+             rtx addr = plus_constant (addr_mode, base, offset);
 
              if (aarch64_legitimate_address_hook_p (mode, addr, false))
                {
-                 base = force_operand (plus_constant (addr_mode,
-                                                      op0, offset),
+                 base = force_operand (gen_rtx_PLUS (addr_mode, op1, op0),
                                        NULL_RTX);
-                 return gen_rtx_PLUS (addr_mode, op1, base);
+                 return plus_constant (addr_mode, base, offset);
                }
            }
        }
