Re: [PATCH ARM]Refine scaled address expression on ARM

Richard Biener Fri, 29 Nov 2013 04:03:42 -0800

On Fri, Nov 29, 2013 at 12:46 PM, Yufeng Zhang <yufeng.zh...@arm.com> wrote:
> On 11/29/13 07:52, Bin.Cheng wrote:
>>
>> On Thu, Nov 28, 2013 at 8:06 PM, Bin.Cheng<amker.ch...@gmail.com>  wrote:
>>>
>>> On Thu, Nov 28, 2013 at 6:48 PM, Richard Earnshaw<rearn...@arm.com>
>>> wrote:
>>>>
>>>> On 18/09/13 10:15, bin.cheng wrote:
>>>>>
>>>>>
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
>>>>>> ow...@gcc.gnu.org] On Behalf Of bin.cheng
>>>>>> Sent: Monday, September 02, 2013 3:09 PM
>>>>>> To: Richard Earnshaw
>>>>>> Cc: gcc-patches@gcc.gnu.org
>>>>>> Subject: RE: [PATCH ARM]Refine scaled address expression on ARM
>>>>>>
>>>>>>
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Richard Earnshaw
>>>>>>> Sent: Thursday, August 29, 2013 9:06 PM
>>>>>>> To: Bin Cheng
>>>>>>> Cc: gcc-patches@gcc.gnu.org
>>>>>>> Subject: Re: [PATCH ARM]Refine scaled address expression on ARM
>>>>>>>
>>>>>>> On 28/08/13 08:00, bin.cheng wrote:
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> This patch refines scaled address expression on ARM.  It supports
>>>>>>>> "base+index*scale" in arm_legitimate_address_outer_p.  It also tries
>>>>>>>> to legitimize "base + index * scale + offset" with "reg<- base +
>>>>>>>> offset;  reg
>>>>>>>> + index * scale" by introducing thumb2_legitimize_address.  For now
>>>>>>>> + function
>>>>>>>> thumb2_legitimize_address is a kind of placeholder and just does the
>>>>>>>> mentioned transformation by calling to try_multiplier_address.
>>>>>>>> Hoping we can improve it in the future.
>>>>>>>>
>>>>>>>> With this patch:
>>>>>>>> 1) "base+index*scale" is recognized.
>>>>>>>
>>>>>>>
>>>>>>> That's because (PLUS (REG) (MULT (REG) (CONST))) is not canonical
>>>>>>> form.
>>>>>>>   So this shouldn't be necessary.  Can you identify where this
>>>>>>
>>>>>> non-canoncial form is being generated?
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> Oh, for now ivopt constructs "index*scale" to test whether backend
>>>>>> supports scaled addressing mode, which is not valid on ARM, so I was
>>>>>> going
>>>>>> to construct "base + index*scale" instead.  Since "base + index *
>>>>>> scale"
>>>>>
>>>>> is not
>>>>>>
>>>>>> canonical form, I will construct the canonical form and drop this part
>>>>>> of
>>>>>
>>>>> the
>>>>>>
>>>>>> patch.
>>>>>>
>>>>>> Is rest of this patch OK?
>>>>>>
>>>>> Hi Richard, I removed the part over which you concerned and created
>>>>> this
>>>>> updated patch.
>>>>>
>>>>> Is it OK?
>>>>>
>>>>> Thanks.
>>>>> bin
>>>>>
>>>>> 2013-09-18  Bin Cheng<bin.ch...@arm.com>
>>>>>
>>>>>        * config/arm/arm.c (try_multiplier_address): New function.
>>>>>        (thumb2_legitimize_address): New function.
>>>>>        (arm_legitimize_address): Call try_multiplier_address and
>>>>>        thumb2_legitimize_address.
>>>>>
>>>>>
>>>>> 6-arm-scaled_address-20130918.txt
>>>>>
>>>>>
>>>>> Index: gcc/config/arm/arm.c
>>>>> ===================================================================
>>>>> --- gcc/config/arm/arm.c      (revision 200774)
>>>>> +++ gcc/config/arm/arm.c      (working copy)
>>>>> @@ -6652,6 +6654,106 @@ legitimize_tls_address (rtx x, rtx reg)
>>>>>       }
>>>>>   }
>>>>>
>>>>> +/* Try to find address expression like base + index * scale + offset
>>>>> +   in X.  If we find one, force base + offset into register and
>>>>> +   construct new expression reg + index * scale; return the new
>>>>> +   address expression if it's valid.  Otherwise return X.  */
>>>>> +static rtx
>>>>> +try_multiplier_address (rtx x, enum machine_mode mode
>>>>> ATTRIBUTE_UNUSED)
>>>>> +{
>>>>> +  rtx tmp, base_reg, new_rtx;
>>>>> +  rtx base = NULL_RTX, index = NULL_RTX, scale = NULL_RTX, offset =
>>>>> NULL_RTX;
>>>>> +
>>>>> +  gcc_assert (GET_CODE (x) == PLUS);
>>>>> +
>>>>> +  /* Try to find and record base/index/scale/offset in X. */
>>>>> +  if (GET_CODE (XEXP (x, 1)) == MULT)
>>>>> +    {
>>>>> +      tmp = XEXP (x, 0);
>>>>> +      index = XEXP (XEXP (x, 1), 0);
>>>>> +      scale = XEXP (XEXP (x, 1), 1);
>>>>> +      if (GET_CODE (tmp) != PLUS)
>>>>> +     return x;
>>>>> +
>>>>> +      base = XEXP (tmp, 0);
>>>>> +      offset = XEXP (tmp, 1);
>>>>> +    }
>>>>> +  else
>>>>> +    {
>>>>> +      tmp = XEXP (x, 0);
>>>>> +      offset = XEXP (x, 1);
>>>>> +      if (GET_CODE (tmp) != PLUS)
>>>>> +     return x;
>>>>> +
>>>>> +      base = XEXP (tmp, 0);
>>>>> +      scale = XEXP (tmp, 1);
>>>>> +      if (GET_CODE (base) == MULT)
>>>>> +     {
>>>>> +       tmp = base;
>>>>> +       base = scale;
>>>>> +       scale = tmp;
>>>>> +     }
>>>>> +      if (GET_CODE (scale) != MULT)
>>>>> +     return x;
>>>>> +
>>>>> +      index = XEXP (scale, 0);
>>>>> +      scale = XEXP (scale, 1);
>>>>> +    }
>>>>> +
>>>>> +  if (CONST_INT_P (base))
>>>>> +    {
>>>>> +      tmp = base;
>>>>> +      base = offset;
>>>>> +      offset = tmp;
>>>>> +    }
>>>>> +
>>>>> +  if (CONST_INT_P (index))
>>>>> +    {
>>>>> +      tmp = index;
>>>>> +      index = scale;
>>>>> +      scale = tmp;
>>>>> +    }
>>>>> +
>>>>> +  /* ARM only supports constant scale in address.  */
>>>>> +  if (!CONST_INT_P (scale))
>>>>> +    return x;
>>>>> +
>>>>> +  if (GET_MODE (base) != SImode || GET_MODE (index) != SImode)
>>>>> +    return x;
>>>>> +
>>>>> +  /* Only register/constant are allowed in each part.  */
>>>>> +  if (!symbol_mentioned_p (base)
>>>>> +&&  !symbol_mentioned_p (offset)
>>>>> +&&  !symbol_mentioned_p (index)
>>>>> +&&  !symbol_mentioned_p (scale))
>>>>> +    {
>>>>
>>>>
>>>> It would be easier to do this at the top of the function --
>>>>    if (symbol_mentioned_p (x))
>>>>      return x;
>>>>
>>>>
>>>>> +      /* Force "base+offset" into register and construct
>>>>> +      "register+index*scale".  Return the new expression
>>>>> +      only if it's valid.  */
>>>>> +      tmp = gen_rtx_PLUS (SImode, base, offset);
>>>>> +      base_reg = force_reg (SImode, tmp);
>>>>> +      tmp = gen_rtx_fmt_ee (MULT, SImode, index, scale);
>>>>> +      new_rtx = gen_rtx_PLUS (SImode, base_reg, tmp);
>>>>> +      return new_rtx;
>>>>
>>>>
>>>> I can't help thinking that this is backwards.  That is, you want to
>>>> split out the mult expression and use offset addressing in the addresses
>>>> itself.  That's likely to lead to either better CSE, or more induction
>>>
>>> Thanks to your review.
>>>
>>> Actually base+offset is more likely loop invariant than scaled
>>> expression, just as reported in pr57540.  The bug is observed in
>>> spec2k bzip/gzip, and hurts arm in hot loops.  The loop induction
>>> variable doesn't matter that much comparing to invariant because we
>>> are in RTL now.
>>>
>>>> vars.  Furthermore, scaled addressing modes are likely to be more
>>>> expensive than simple offset addressing modes on at least some cores.
>>>
>>> The purpose is to CSE (as you pointed out) or hoist loop invariant.
>>> As for addressing mode is concerned, Though we may guide the
>>> transformation once we have reliable address cost mode, we don't have
>>> the information if base+offset is invariant, so it's difficult to
>>> handle in backend, right?
>>>
>>> What do you think about this?
>>>
>>
>> Additionally, there is no induction variable issue here.  The memory
>> reference we are facing during expand are not lowered by gimple IVOPT,
>> which means either it's outside loop, or doesn't have induction
>> variable address.
>>
>> Generally, there are three passes which lowers memory reference:
>> gimple strength reduction picks opportunity outside loop; gimple IVOPT
>> handles references with induction variable addresses; references not
>> handled by SLSR/IVOPT are handled by RTL expand, which makes bad
>> choice of address re-association.  I think Yufeng also noticed the
>> problem and are trying with patch like:
>> http://gcc.gnu.org/ml/gcc-patches/2013-11/msg02878.html
>> http://gcc.gnu.org/ml/gcc-patches/2013-11/msg03339.html
>
>
> Yes, when it comes to addressing expression, the re-association in RTL
> expand is quite sensitive to the available address modes on the target and
> its address legitimization routines.  Both patches I proposed try to make
> the RTL expansion more canonicalized on addressing expressions, especially
> on ARM.  It is done by essentially enabling simplify_gen_binary () to work
> on a bigger RTX node.
>
>
>> After thinking twice, I some kind of think we should not re-associate
>> addresses during expanding, because of lacking of context information.
>>   Take base + scaled_index + offset as an example in PR57540, we just
>> don't know if "base+offset" is loop invariant from either backend or
>> RTL expander.
>
>
> I'm getting less convinced by re-associating base with offset
> unconditionally.  One counter example is
>
> typedef int arr_1[20];
> void foo (arr_1 a1, int i)
> {
>   a1[i+10] = 1;
> }
>
> I'm experimenting a patch to get the immediate offset in the above example
> to be the last addend in the address computing (as mentioned in
> http://gcc.gnu.org/ml/gcc/2013-11/msg00581.html), aiming to get the
> following code-gen:
>
>         add     r1, r0, r1, asl #2
>         mov     r3, #1
>         str     r3, [r1, #40]
>
> With your patch applied, the effort will be reverted to
>
>         add     r0, r0, #40
>         mov     r3, #1
>         str     r3, [r0, r1, asl #2]
>
>
> I briefly looked into PR57540.  I noticed that you are trying to tackle a
> loop invariant in a hot loop:
>
> .L5:
>         add     lr, sp, #2064            ////loop invariant
>         add     r2, r2, #1
>         add     r3, lr, r3, asl #2
>         ldr     r3, [r3, #-2064]
>         cmp     r3, #0
>         bge     .L5
>         uxtb    r2, r2


Why does RTL invariant motion not move it?

Richard.

> Looking into the example code:
>
> void
> foo ()
> {
>   int parent [ 258 * 2 ];
>   for (i = 1; i <= alphaSize; i++)
>     {
>       while (parent[k] >= 0)
>         {
>           k = parent[k];
>           ...
>         }
>       ...
>
> The loop invariant is part of address computing for a stack variable. The
> immediate 2064 is the offset of the variable on the stack frame rather than
> what we normally expect, e.g. part of indexing; it is a back-end specific
> issue and there is nothing the mid-end can do (the mem_ref lowering cannot
> help either).  One possible solution may be to force the base address of an
> array on stack to a REG, before the RTL expand does anything 'smart'.  Is it
> something you think worth giving a try?
>
> Just my two cents.
>
> Thanks,
> Yufeng
>

Re: [PATCH ARM]Refine scaled address expression on ARM

Reply via email to