On 11/29/13 12:02, Richard Biener wrote:
On Fri, Nov 29, 2013 at 12:46 PM, Yufeng Zhang <yufeng.zh...@arm.com> wrote:
On 11/29/13 07:52, Bin.Cheng wrote:

On Thu, Nov 28, 2013 at 8:06 PM, Bin.Cheng <amker.ch...@gmail.com> wrote:

On Thu, Nov 28, 2013 at 6:48 PM, Richard Earnshaw <rearn...@arm.com> wrote:

On 18/09/13 10:15, bin.cheng wrote:



-----Original Message-----
From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
ow...@gcc.gnu.org] On Behalf Of bin.cheng
Sent: Monday, September 02, 2013 3:09 PM
To: Richard Earnshaw
Cc: gcc-patches@gcc.gnu.org
Subject: RE: [PATCH ARM]Refine scaled address expression on ARM



-----Original Message-----
From: Richard Earnshaw
Sent: Thursday, August 29, 2013 9:06 PM
To: Bin Cheng
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [PATCH ARM]Refine scaled address expression on ARM

On 28/08/13 08:00, bin.cheng wrote:

Hi,

This patch refines scaled address expressions on ARM.  It supports
"base + index*scale" in arm_legitimate_address_outer_p.  It also tries
to legitimize "base + index*scale + offset" as "reg <- base + offset;
reg + index*scale" by introducing thumb2_legitimize_address.  For now
thumb2_legitimize_address is a kind of placeholder and just does the
mentioned transformation by calling try_multiplier_address.  Hopefully
we can improve it in the future.
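As a hedged illustration (an invented example, not from the patch
itself), the kind of access this targets looks like:

/* The address computed is base + i*4 + 40.  Instead of materializing
   base + i*4 first, the transformation forces base + 40 into a
   register and accesses the element as reg + i*4, i.e. one
   scaled-register load.  */
int
load_elem (int *base, int i)
{
  return base[i + 10];
}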

With this patch:
1) "base+index*scale" is recognized.


That's because (PLUS (REG) (MULT (REG) (CONST))) is not canonical
form, so this shouldn't be necessary.  Can you identify where this
non-canonical form is being generated?
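(For illustration only, assuming pre-built index_reg and base_reg
rtxes, the canonical shape would be constructed roughly as:

  rtx addr = gen_rtx_PLUS (Pmode,
                           gen_rtx_MULT (Pmode, index_reg, GEN_INT (4)),
                           base_reg);

i.e. with the MULT as the first operand of the PLUS.)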



Oh, for now ivopts constructs "index*scale" to test whether the
backend supports scaled addressing modes, which is not valid on ARM,
so I was going to construct "base + index*scale" instead.  Since
"base + index*scale" is not canonical form, I will construct the
canonical form and drop this part of the patch.

Is the rest of this patch OK?

Hi Richard, I removed the part you were concerned about and created
this updated patch.

Is it OK?

Thanks.
bin

2013-09-18  Bin Cheng <bin.ch...@arm.com>

        * config/arm/arm.c (try_multiplier_address): New function.
        (thumb2_legitimize_address): New function.
        (arm_legitimize_address): Call try_multiplier_address and
        thumb2_legitimize_address.


6-arm-scaled_address-20130918.txt


Index: gcc/config/arm/arm.c
===================================================================
--- gcc/config/arm/arm.c      (revision 200774)
+++ gcc/config/arm/arm.c      (working copy)
@@ -6652,6 +6654,106 @@ legitimize_tls_address (rtx x, rtx reg)
       }
   }

+/* Try to find address expression like base + index * scale + offset
+   in X.  If we find one, force base + offset into register and
+   construct new expression reg + index * scale; return the new
+   address expression if it's valid.  Otherwise return X.  */
+static rtx
+try_multiplier_address (rtx x, enum machine_mode mode ATTRIBUTE_UNUSED)
+{
+  rtx tmp, base_reg, new_rtx;
+  rtx base = NULL_RTX, index = NULL_RTX, scale = NULL_RTX, offset = NULL_RTX;
+
+  gcc_assert (GET_CODE (x) == PLUS);
+
+  /* Try to find and record base/index/scale/offset in X. */
+  if (GET_CODE (XEXP (x, 1)) == MULT)
+    {
+      tmp = XEXP (x, 0);
+      index = XEXP (XEXP (x, 1), 0);
+      scale = XEXP (XEXP (x, 1), 1);
+      if (GET_CODE (tmp) != PLUS)
+     return x;
+
+      base = XEXP (tmp, 0);
+      offset = XEXP (tmp, 1);
+    }
+  else
+    {
+      tmp = XEXP (x, 0);
+      offset = XEXP (x, 1);
+      if (GET_CODE (tmp) != PLUS)
+     return x;
+
+      base = XEXP (tmp, 0);
+      scale = XEXP (tmp, 1);
+      if (GET_CODE (base) == MULT)
+     {
+       tmp = base;
+       base = scale;
+       scale = tmp;
+     }
+      if (GET_CODE (scale) != MULT)
+     return x;
+
+      index = XEXP (scale, 0);
+      scale = XEXP (scale, 1);
+    }
+
+  if (CONST_INT_P (base))
+    {
+      tmp = base;
+      base = offset;
+      offset = tmp;
+    }
+
+  if (CONST_INT_P (index))
+    {
+      tmp = index;
+      index = scale;
+      scale = tmp;
+    }
+
+  /* ARM only supports constant scale in address.  */
+  if (!CONST_INT_P (scale))
+    return x;
+
+  if (GET_MODE (base) != SImode || GET_MODE (index) != SImode)
+    return x;
+
+  /* Only register/constant are allowed in each part.  */
+  if (!symbol_mentioned_p (base)
+      && !symbol_mentioned_p (offset)
+      && !symbol_mentioned_p (index)
+      && !symbol_mentioned_p (scale))
+    {


It would be easier to do this at the top of the function --
    if (symbol_mentioned_p (x))
      return x;


+      /* Force "base+offset" into register and construct
+         "register+index*scale".  Return the new expression
+         only if it's valid.  */
+      tmp = gen_rtx_PLUS (SImode, base, offset);
+      base_reg = force_reg (SImode, tmp);
+      tmp = gen_rtx_fmt_ee (MULT, SImode, index, scale);
+      new_rtx = gen_rtx_PLUS (SImode, base_reg, tmp);
+      return new_rtx;


I can't help thinking that this is backwards.  That is, you want to
split out the mult expression and use offset addressing in the address
itself.  That's likely to lead to either better CSE, or more induction

Thanks for your review.

Actually, base+offset is more likely to be loop invariant than the
scaled expression, as reported in PR57540.  The bug is observed in
SPEC2000 bzip2/gzip, and hurts ARM in hot loops.  The loop induction
variable doesn't matter that much compared to the invariant because we
are in RTL now.
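As a hedged, hypothetical reconstruction of that kind of hot loop
(illustrative only, not the actual PR57540 test case; names invented):

/* "buf + 2064" is loop-invariant, while "i*4" changes every
   iteration, so forcing base+offset into a register lets RTL
   invariant motion hoist it out of the loop.  */
void
scan (int *buf, int i)
{
  do
    i = buf[i + 516];   /* address: buf + i*4 + 2064 */
  while (i >= 0);
}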

vars.  Furthermore, scaled addressing modes are likely to be more
expensive than simple offset addressing modes on at least some cores.
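A rough sketch of that alternative (illustrative only, not from any
posted patch):

static rtx
split_out_mult (rtx base, rtx index, rtx scale, rtx offset)
{
  /* reg <- base + index*scale; the access then uses simple
     immediate-offset addressing, i.e. reg + offset.  */
  rtx scaled = gen_rtx_PLUS (SImode, base,
                             gen_rtx_MULT (SImode, index, scale));
  rtx reg = force_reg (SImode, scaled);
  return gen_rtx_PLUS (SImode, reg, offset);
}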

The purpose is to enable CSE (as you pointed out) or hoisting of the
loop invariant.  As far as the addressing mode is concerned, we could
guide the transformation once we have a reliable address cost model,
but the backend has no information about whether base+offset is loop
invariant, so it's difficult to handle there, right?

What do you think about this?


Additionally, there is no induction variable issue here.  The memory
references we face during expand have not been lowered by GIMPLE
IVOPTs, which means each one is either outside a loop or does not have
an induction variable in its address.

Generally, there are three passes that lower memory references:
GIMPLE strength reduction (SLSR) picks up opportunities outside loops;
GIMPLE IVOPTs handles references with induction-variable addresses;
references not handled by SLSR/IVOPTs are left to RTL expand, which
makes poor choices in address re-association.  I think Yufeng also
noticed the problem and is trying to address it with patches like:
http://gcc.gnu.org/ml/gcc-patches/2013-11/msg02878.html
http://gcc.gnu.org/ml/gcc-patches/2013-11/msg03339.html
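A small, hedged illustration of that division of labor (assumed
typical behavior, not verified against a particular GCC revision):

void
lowering_cases (int *a, int i, int n)
{
  a[i + 10] = 1;        /* no loop, no IV: left to RTL expand */
  for (int j = 0; j < n; j++)
    a[j + 10] = 2;      /* IV address: handled by GIMPLE IVOPTs */
}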


Yes, when it comes to addressing expressions, the re-association in
RTL expand is quite sensitive to the addressing modes available on the
target and its address legitimization routines.  Both patches I
proposed try to make the RTL expansion more canonical on addressing
expressions, especially on ARM.  This is done essentially by enabling
simplify_gen_binary () to work on a bigger RTX node.
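A rough, hedged sketch of the idea (an illustration, not the actual
patch; build_addr and its operands are assumed):

/* Build the whole address through simplify_gen_binary so the
   simplifier sees the complete expression and can keep the
   CONST_INT offset as the outermost, last addend.  */
static rtx
build_addr (rtx base, rtx index, rtx scale, rtx offset)
{
  rtx scaled = simplify_gen_binary (MULT, Pmode, index, scale);
  rtx inner = simplify_gen_binary (PLUS, Pmode, base, scaled);
  return simplify_gen_binary (PLUS, Pmode, inner, offset);
}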


After thinking twice, I'm somewhat inclined to think we should not
re-associate addresses during expand, because of the lack of context
information.  Take base + scaled_index + offset from PR57540 as an
example: we just don't know whether "base+offset" is loop invariant
from either the backend or the RTL expander.


I'm getting less convinced by re-associating base with offset
unconditionally.  One counter-example is:

typedef int arr_1[20];
void foo (arr_1 a1, int i)
{
   a1[i+10] = 1;
}

I'm experimenting with a patch to make the immediate offset in the
above example the last addend in the address computation (as mentioned
in http://gcc.gnu.org/ml/gcc/2013-11/msg00581.html), aiming for the
following code-gen:

         add     r1, r0, r1, asl #2
         mov     r3, #1
         str     r3, [r1, #40]

With your patch applied, that effort would be reverted to:

         add     r0, r0, #40
         mov     r3, #1
         str     r3, [r0, r1, asl #2]


I briefly looked into PR57540.  I noticed that you are trying to tackle a
loop invariant in a hot loop:

.L5:
         add     lr, sp, #2064            ////loop invariant
         add     r2, r2, #1
         add     r3, lr, r3, asl #2
         ldr     r3, [r3, #-2064]
         cmp     r3, #0
         bge     .L5
         uxtb    r2, r2

Why does RTL invariant motion not move it?

I haven't looked at the issue in detail. I think Bin will be able to answer it.

The invariant motion alone won't solve the entire problem, as ideally we'd like to see the '-2064' folded earlier, so that we'd get:

.L5:
          add     r2, r2, #1
          ldr     r3, [sp, r3, asl #2]
          cmp     r3, #0
          bge     .L5
          uxtb    r2, r2

Thanks,
Yufeng
