On 11/29/13 10:44, Richard Biener wrote:
On Fri, Nov 29, 2013 at 8:52 AM, Bin.Cheng<amker.ch...@gmail.com>  wrote:
On Thu, Nov 28, 2013 at 8:06 PM, Bin.Cheng<amker.ch...@gmail.com>  wrote:
On Thu, Nov 28, 2013 at 6:48 PM, Richard Earnshaw<rearn...@arm.com>  wrote:
On 18/09/13 10:15, bin.cheng wrote:


-----Original Message-----
From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
ow...@gcc.gnu.org] On Behalf Of bin.cheng
Sent: Monday, September 02, 2013 3:09 PM
To: Richard Earnshaw
Cc: gcc-patches@gcc.gnu.org
Subject: RE: [PATCH ARM]Refine scaled address expression on ARM



-----Original Message-----
From: Richard Earnshaw
Sent: Thursday, August 29, 2013 9:06 PM
To: Bin Cheng
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [PATCH ARM]Refine scaled address expression on ARM

On 28/08/13 08:00, bin.cheng wrote:
Hi,

This patch refines scaled address expression on ARM.  It supports
"base+index*scale" in arm_legitimate_address_outer_p.  It also tries
to legitimize "base + index * scale + offset" as "reg <- base + offset;
reg + index * scale" by introducing thumb2_legitimize_address.  For now
thumb2_legitimize_address is a kind of placeholder and just does the
mentioned transformation by calling try_multiplier_address.  Hopefully
we can improve it in the future.

With this patch:
1) "base+index*scale" is recognized.

That's because (PLUS (REG) (MULT (REG) (CONST))) is not canonical form.
  So this shouldn't be necessary.  Can you identify where this
non-canonical form is being generated?


Oh, for now ivopt constructs "index*scale" to test whether the backend
supports scaled addressing modes, which is not valid on ARM, so I was
going to construct "base + index*scale" instead.  Since "base + index *
scale" is not canonical form, I will construct the canonical form and
drop this part of the patch.

Is the rest of this patch OK?

Hi Richard, I removed the part you were concerned about and created
this updated patch.

Is it OK?

Thanks.
bin

2013-09-18  Bin Cheng<bin.ch...@arm.com>

       * config/arm/arm.c (try_multiplier_address): New function.
       (thumb2_legitimize_address): New function.
       (arm_legitimize_address): Call try_multiplier_address and
       thumb2_legitimize_address.


6-arm-scaled_address-20130918.txt


Index: gcc/config/arm/arm.c
===================================================================
--- gcc/config/arm/arm.c      (revision 200774)
+++ gcc/config/arm/arm.c      (working copy)
@@ -6652,6 +6654,106 @@ legitimize_tls_address (rtx x, rtx reg)
      }
  }

+/* Try to find address expression like base + index * scale + offset
+   in X.  If we find one, force base + offset into register and
+   construct new expression reg + index * scale; return the new
+   address expression if it's valid.  Otherwise return X.  */
+static rtx
+try_multiplier_address (rtx x, enum machine_mode mode ATTRIBUTE_UNUSED)
+{
+  rtx tmp, base_reg, new_rtx;
+  rtx base = NULL_RTX, index = NULL_RTX, scale = NULL_RTX, offset = NULL_RTX;
+
+  gcc_assert (GET_CODE (x) == PLUS);
+
+  /* Try to find and record base/index/scale/offset in X. */
+  if (GET_CODE (XEXP (x, 1)) == MULT)
+    {
+      tmp = XEXP (x, 0);
+      index = XEXP (XEXP (x, 1), 0);
+      scale = XEXP (XEXP (x, 1), 1);
+      if (GET_CODE (tmp) != PLUS)
+     return x;
+
+      base = XEXP (tmp, 0);
+      offset = XEXP (tmp, 1);
+    }
+  else
+    {
+      tmp = XEXP (x, 0);
+      offset = XEXP (x, 1);
+      if (GET_CODE (tmp) != PLUS)
+     return x;
+
+      base = XEXP (tmp, 0);
+      scale = XEXP (tmp, 1);
+      if (GET_CODE (base) == MULT)
+     {
+       tmp = base;
+       base = scale;
+       scale = tmp;
+     }
+      if (GET_CODE (scale) != MULT)
+     return x;
+
+      index = XEXP (scale, 0);
+      scale = XEXP (scale, 1);
+    }
+
+  if (CONST_INT_P (base))
+    {
+      tmp = base;
+      base = offset;
+      offset = tmp;
+    }
+
+  if (CONST_INT_P (index))
+    {
+      tmp = index;
+      index = scale;
+      scale = tmp;
+    }
+
+  /* ARM only supports constant scale in address.  */
+  if (!CONST_INT_P (scale))
+    return x;
+
+  if (GET_MODE (base) != SImode || GET_MODE (index) != SImode)
+    return x;
+
+  /* Only register/constant are allowed in each part.  */
+  if (!symbol_mentioned_p (base)
+      && !symbol_mentioned_p (offset)
+      && !symbol_mentioned_p (index)
+      && !symbol_mentioned_p (scale))
+    {

It would be easier to do this at the top of the function --
   if (symbol_mentioned_p (x))
     return x;


+      /* Force "base+offset" into register and construct
+      "register+index*scale".  Return the new expression
+      only if it's valid.  */
+      tmp = gen_rtx_PLUS (SImode, base, offset);
+      base_reg = force_reg (SImode, tmp);
+      tmp = gen_rtx_fmt_ee (MULT, SImode, index, scale);
+      new_rtx = gen_rtx_PLUS (SImode, base_reg, tmp);
+      return new_rtx;

I can't help thinking that this is backwards.  That is, you want to
split out the mult expression and use offset addressing in the addresses
itself.  That's likely to lead to either better CSE, or more induction
vars.  Furthermore, scaled addressing modes are likely to be more
expensive than simple offset addressing modes on at least some cores.

Thanks for your review.

Actually base+offset is more likely loop invariant than the scaled
expression, just as reported in PR57540.  The bug is observed in
SPEC2000 bzip2/gzip, and hurts ARM in hot loops.  The loop induction
variable doesn't matter that much compared to the invariant because we
are in RTL now.

The purpose is to CSE (as you pointed out) or hoist the loop invariant.
As far as the addressing mode is concerned, though we may guide the
transformation once we have a reliable address cost model, we don't
know in the backend whether base+offset is invariant, so it's difficult
to handle there, right?

What do you think about this?


Additionally, there is no induction variable issue here.  The memory
references we are facing during expand were not lowered by gimple IVOPT,
which means either they are outside a loop, or they don't have induction
variable addresses.

Generally, there are three passes which lower memory references:
gimple strength reduction picks up opportunities outside loops; gimple
IVOPT handles references with induction variable addresses; references
not handled by SLSR/IVOPT are handled by RTL expand, which makes bad
choices in address re-association.  I think Yufeng also noticed the
problem and is trying patches like:
http://gcc.gnu.org/ml/gcc-patches/2013-11/msg02878.html
http://gcc.gnu.org/ml/gcc-patches/2013-11/msg03339.html

After thinking twice, I am somewhat inclined to think we should not
re-associate addresses during expand, for lack of context information.
Take base + scaled_index + offset from PR57540 as an example: we just
don't know whether "base+offset" is loop invariant from either the
backend or the RTL expander.

Any comments?

The issue is that re-association and address lowering is not integrated
with optimizers such as CSE and invariant motion.  This means it's
hard to arrive at optimal results and all passes are just "guessing"
and applying heuristics that the programmer thought are appropriate
for his machine.

I have no easy way out but think that we lower addresses too late
(at least fully).  Loop optimizers would like to see the complex
memory reference forms but starting with IVOPTs lowering all
addresses would likely be a win in the end (there are CSE, LIM
and re-association passes around after/at this point as well).

Back in the 4.3 days, when I attempted for the first time to introduce
MEM_REF, I forcefully lowered all memory accesses and addresses very
early (during gimplification, basically).  That didn't work out well.

So my suggestion to anyone who wants to try something is
to apply an additional lowering to the whole GIMPLE, lowering
things like handled-component references and addresses
(and also things like bitfield accesses).

This idea of additional lowering is interesting. I found this wiki page describing the memory reference flattening in GIMPLE:

  http://gcc.gnu.org/wiki/MemRef

I wonder whether there is more information or notes you can share,
things like common pitfalls, implications for alias analysis, etc.
I'd like to do some experiments with it when I have time.

Regards,
Yufeng
