Re: [PATCH ARM]Refine scaled address expression on ARM

Yufeng Zhang Fri, 29 Nov 2013 03:48:04 -0800

On 11/29/13 07:52, Bin.Cheng wrote:

On Thu, Nov 28, 2013 at 8:06 PM, Bin.Cheng<amker.ch...@gmail.com>  wrote:

On Thu, Nov 28, 2013 at 6:48 PM, Richard Earnshaw<rearn...@arm.com>  wrote:

On 18/09/13 10:15, bin.cheng wrote:

-----Original Message-----
From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
ow...@gcc.gnu.org] On Behalf Of bin.cheng
Sent: Monday, September 02, 2013 3:09 PM
To: Richard Earnshaw
Cc: gcc-patches@gcc.gnu.org
Subject: RE: [PATCH ARM]Refine scaled address expression on ARM

-----Original Message-----
From: Richard Earnshaw
Sent: Thursday, August 29, 2013 9:06 PM
To: Bin Cheng
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [PATCH ARM]Refine scaled address expression on ARM

On 28/08/13 08:00, bin.cheng wrote:

Hi,

This patch refines scaled address expression on ARM.  It supports
"base+index*scale" in arm_legitimate_address_outer_p.  It also tries
to legitimize "base + index * scale + offset" with "reg<- base +
offset;  reg
+ index * scale" by introducing thumb2_legitimize_address.  For now
+ function
thumb2_legitimize_address is a kind of placeholder and just does the
mentioned transformation by calling to try_multiplier_address.
Hoping we can improve it in the future.

With this patch:
1) "base+index*scale" is recognized.


That's because (PLUS (REG) (MULT (REG) (CONST))) is not canonical form.
  So this shouldn't be necessary.  Can you identify where this

non-canoncial form is being generated?


Oh, for now ivopt constructs "index*scale" to test whether backend
supports scaled addressing mode, which is not valid on ARM, so I was going
to construct "base + index*scale" instead.  Since "base + index * scale"

is not

canonical form, I will construct the canonical form and drop this part of

the

patch.

Is rest of this patch OK?

Hi Richard, I removed the part over which you concerned and created this
updated patch.

Is it OK?

Thanks.
bin

2013-09-18  Bin Cheng<bin.ch...@arm.com>

       * config/arm/arm.c (try_multiplier_address): New function.
       (thumb2_legitimize_address): New function.
       (arm_legitimize_address): Call try_multiplier_address and
       thumb2_legitimize_address.


6-arm-scaled_address-20130918.txt


Index: gcc/config/arm/arm.c
===================================================================
--- gcc/config/arm/arm.c      (revision 200774)
+++ gcc/config/arm/arm.c      (working copy)
@@ -6652,6 +6654,106 @@ legitimize_tls_address (rtx x, rtx reg)
      }
  }

+/* Try to find address expression like base + index * scale + offset
+   in X.  If we find one, force base + offset into register and
+   construct new expression reg + index * scale; return the new
+   address expression if it's valid.  Otherwise return X.  */
+static rtx
+try_multiplier_address (rtx x, enum machine_mode mode ATTRIBUTE_UNUSED)
+{
+  rtx tmp, base_reg, new_rtx;
+  rtx base = NULL_RTX, index = NULL_RTX, scale = NULL_RTX, offset = NULL_RTX;
+
+  gcc_assert (GET_CODE (x) == PLUS);
+
+  /* Try to find and record base/index/scale/offset in X. */
+  if (GET_CODE (XEXP (x, 1)) == MULT)
+    {
+      tmp = XEXP (x, 0);
+      index = XEXP (XEXP (x, 1), 0);
+      scale = XEXP (XEXP (x, 1), 1);
+      if (GET_CODE (tmp) != PLUS)
+     return x;
+
+      base = XEXP (tmp, 0);
+      offset = XEXP (tmp, 1);
+    }
+  else
+    {
+      tmp = XEXP (x, 0);
+      offset = XEXP (x, 1);
+      if (GET_CODE (tmp) != PLUS)
+     return x;
+
+      base = XEXP (tmp, 0);
+      scale = XEXP (tmp, 1);
+      if (GET_CODE (base) == MULT)
+     {
+       tmp = base;
+       base = scale;
+       scale = tmp;
+     }
+      if (GET_CODE (scale) != MULT)
+     return x;
+
+      index = XEXP (scale, 0);
+      scale = XEXP (scale, 1);
+    }
+
+  if (CONST_INT_P (base))
+    {
+      tmp = base;
+      base = offset;
+      offset = tmp;
+    }
+
+  if (CONST_INT_P (index))
+    {
+      tmp = index;
+      index = scale;
+      scale = tmp;
+    }
+
+  /* ARM only supports constant scale in address.  */
+  if (!CONST_INT_P (scale))
+    return x;
+
+  if (GET_MODE (base) != SImode || GET_MODE (index) != SImode)
+    return x;
+
+  /* Only register/constant are allowed in each part.  */
+  if (!symbol_mentioned_p (base)
+&&  !symbol_mentioned_p (offset)
+&&  !symbol_mentioned_p (index)
+&&  !symbol_mentioned_p (scale))
+    {


It would be easier to do this at the top of the function --
   if (symbol_mentioned_p (x))
     return x;

+      /* Force "base+offset" into register and construct
+      "register+index*scale".  Return the new expression
+      only if it's valid.  */
+      tmp = gen_rtx_PLUS (SImode, base, offset);
+      base_reg = force_reg (SImode, tmp);
+      tmp = gen_rtx_fmt_ee (MULT, SImode, index, scale);
+      new_rtx = gen_rtx_PLUS (SImode, base_reg, tmp);
+      return new_rtx;


I can't help thinking that this is backwards.  That is, you want to
split out the mult expression and use offset addressing in the addresses
itself.  That's likely to lead to either better CSE, or more induction

Thanks to your review.

Actually base+offset is more likely loop invariant than scaled
expression, just as reported in pr57540.  The bug is observed in
spec2k bzip/gzip, and hurts arm in hot loops.  The loop induction
variable doesn't matter that much comparing to invariant because we
are in RTL now.

vars.  Furthermore, scaled addressing modes are likely to be more
expensive than simple offset addressing modes on at least some cores.

The purpose is to CSE (as you pointed out) or hoist loop invariant.
As for addressing mode is concerned, Though we may guide the
transformation once we have reliable address cost mode, we don't have
the information if base+offset is invariant, so it's difficult to
handle in backend, right?

What do you think about this?


Additionally, there is no induction variable issue here.  The memory
reference we are facing during expand are not lowered by gimple IVOPT,
which means either it's outside loop, or doesn't have induction
variable address.

Generally, there are three passes which lowers memory reference:
gimple strength reduction picks opportunity outside loop; gimple IVOPT
handles references with induction variable addresses; references not
handled by SLSR/IVOPT are handled by RTL expand, which makes bad
choice of address re-association.  I think Yufeng also noticed the
problem and are trying with patch like:
http://gcc.gnu.org/ml/gcc-patches/2013-11/msg02878.html
http://gcc.gnu.org/ml/gcc-patches/2013-11/msg03339.html

Yes, when it comes to addressing expression, the re-association in RTLexpand is quite sensitive to the available address modes on the targetand its address legitimization routines. Both patches I proposed try tomake the RTL expansion more canonicalized on addressing expressions,especially on ARM. It is done by essentially enablingsimplify_gen_binary () to work on a bigger RTX node.

After thinking twice, I some kind of think we should not re-associate
addresses during expanding, because of lacking of context information.
  Take base + scaled_index + offset as an example in PR57540, we just
don't know if "base+offset" is loop invariant from either backend or
RTL expander.

I'm getting less convinced by re-associating base with offsetunconditionally. One counter example is


typedef int arr_1[20];
void foo (arr_1 a1, int i)
{
  a1[i+10] = 1;
}

I'm experimenting a patch to get the immediate offset in the aboveexample to be the last addend in the address computing (as mentioned inhttp://gcc.gnu.org/ml/gcc/2013-11/msg00581.html), aiming to get thefollowing code-gen:


        add     r1, r0, r1, asl #2
        mov     r3, #1
        str     r3, [r1, #40]

With your patch applied, the effort will be reverted to

        add     r0, r0, #40
        mov     r3, #1
        str     r3, [r0, r1, asl #2]

I briefly looked into PR57540. I noticed that you are trying to tacklea loop invariant in a hot loop:


.L5:
        add     lr, sp, #2064            ////loop invariant
        add     r2, r2, #1
        add     r3, lr, r3, asl #2
        ldr     r3, [r3, #-2064]
        cmp     r3, #0
        bge     .L5
        uxtb    r2, r2

Looking into the example code:

void
foo ()
{
  int parent [ 258 * 2 ];
  for (i = 1; i <= alphaSize; i++)
    {
      while (parent[k] >= 0)
        {
          k = parent[k];
          ...
        }
      ...

The loop invariant is part of address computing for a stack variable.The immediate 2064 is the offset of the variable on the stack framerather than what we normally expect, e.g. part of indexing; it is aback-end specific issue and there is nothing the mid-end can do (themem_ref lowering cannot help either). One possible solution may be toforce the base address of an array on stack to a REG, before the RTLexpand does anything 'smart'. Is it something you think worth giving a try?


Just my two cents.

Thanks,
Yufeng

Re: [PATCH ARM]Refine scaled address expression on ARM

Reply via email to