On 20/11/15 08:31, Bin.Cheng wrote:
> On Thu, Nov 19, 2015 at 10:32 AM, Bin.Cheng <amker.ch...@gmail.com> wrote:
>> On Tue, Nov 17, 2015 at 6:08 PM, James Greenhalgh
>> <james.greenha...@arm.com> wrote:
>>> On Tue, Nov 17, 2015 at 05:21:01PM +0800, Bin Cheng wrote:
>>>> Hi,
>>>> GIMPLE IVO needs to call backend interface to calculate costs for addr
>>>> expressions like below:
>>>>    FORM1: "r73 + r74 + 16380"
>>>>    FORM2: "r73 << 2 + r74 + 16380"
>>>>
>>>> They are invalid address expression on AArch64, so will be legitimized by
>>>> aarch64_legitimize_address.  Below are what we got from that function:
>>>>
>>>> For FORM1, the address expression is legitimized into below insn sequence
>>>> and rtx:
>>>>    r84:DI=r73:DI+r74:DI
>>>>    r85:DI=r84:DI+0x3000
>>>>    r83:DI=r85:DI
>>>>    "r83 + 4092"
>>>>
>>>> For FORM2, the address expression is legitimized into below insn sequence
>>>> and rtx:
>>>>    r108:DI=r73:DI<<0x2
>>>>    r109:DI=r108:DI+r74:DI
>>>>    r110:DI=r109:DI+0x3000
>>>>    r107:DI=r110:DI
>>>>    "r107 + 4092"
>>>>
>>>> So the costs computed are 12/16 respectively.  The high cost prevents IVO
>>>> from choosing right candidates.  Besides cost computation, I also think the
>>>> legitmization is bad in terms of code generation.
>>>> The root cause in aarch64_legitimize_address can be described by it's
>>>> comment:
>>>>    /* Try to split X+CONST into Y=X+(CONST & ~mask), Y+(CONST&mask),
>>>>       where mask is selected by alignment and size of the offset.
>>>>       We try to pick as large a range for the offset as possible to
>>>>       maximize the chance of a CSE.  However, for aligned addresses
>>>>       we limit the range to 4k so that structures with different sized
>>>>       elements are likely to use the same base.  */
>>>> I think the split of CONST is intended for REG+CONST where the const offset
>>>> is not in the range of AArch64's addressing modes.  Unfortunately, it
>>>> doesn't explicitly handle/reject "REG+REG+CONST" and "REG+REG<<SCALE+CONST"
>>>> when the CONST are in the range of addressing modes.  As a result, these 
>>>> two
>>>> cases fallthrough this logic, resulting in sub-optimal results.
>>>>
>>>> It's obvious we can do below legitimization:
>>>> FORM1:
>>>>    r83:DI=r73:DI+r74:DI
>>>>    "r83 + 16380"
>>>> FORM2:
>>>>    r107:DI=0x3ffc
>>>>    r106:DI=r74:DI+r107:DI
>>>>       REG_EQUAL r74:DI+0x3ffc
>>>>    "r106 + r73 << 2"
>>>>
>>>> This patch handles these two cases as described.
>>>
>>> Thanks for the description, it made the patch very easy to review. I only
>>> have a style comment.
>>>
>>>> Bootstrap & test on AArch64 along with other patch.  Is it OK?
>>>>
>>>> 2015-11-04  Bin Cheng  <bin.ch...@arm.com>
>>>>           Jiong Wang  <jiong.w...@arm.com>
>>>>
>>>>       * config/aarch64/aarch64.c (aarch64_legitimize_address): Handle
>>>>       address expressions like REG+REG+CONST and REG+NON_REG+CONST.
>>>
>>>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
>>>> index 5c8604f..47875ac 100644
>>>> --- a/gcc/config/aarch64/aarch64.c
>>>> +++ b/gcc/config/aarch64/aarch64.c
>>>> @@ -4710,6 +4710,51 @@ aarch64_legitimize_address (rtx x, rtx /* orig_x  
>>>> */, machine_mode mode)
>>>>      {
>>>>        HOST_WIDE_INT offset = INTVAL (XEXP (x, 1));
>>>>        HOST_WIDE_INT base_offset;
>>>> +      rtx op0 = XEXP (x,0);
>>>> +
>>>> +      if (GET_CODE (op0) == PLUS)
>>>> +     {
>>>> +       rtx op0_ = XEXP (op0, 0);
>>>> +       rtx op1_ = XEXP (op0, 1);
>>>
>>> I don't see this trailing _ on a variable name in many places in the source
>>> tree (mostly in the Go frontend), and certainly not in the aarch64 backend.
>>> Can we pick a different name for op0_ and op1_?
>>>
>>>> +
>>>> +       /* RTX pattern in the form of (PLUS (PLUS REG, REG), CONST) will
>>>> +          reach here, the 'CONST' may be valid in which case we should
>>>> +          not split.  */
>>>> +       if (REG_P (op0_) && REG_P (op1_))
>>>> +         {
>>>> +           machine_mode addr_mode = GET_MODE (op0);
>>>> +           rtx addr = gen_reg_rtx (addr_mode);
>>>> +
>>>> +           rtx ret = plus_constant (addr_mode, addr, offset);
>>>> +           if (aarch64_legitimate_address_hook_p (mode, ret, false))
>>>> +             {
>>>> +               emit_insn (gen_adddi3 (addr, op0_, op1_));
>>>> +               return ret;
>>>> +             }
>>>> +         }
>>>> +       /* RTX pattern in the form of (PLUS (PLUS REG, NON_REG), CONST)
>>>> +          will reach here.  If (PLUS REG, NON_REG) is valid addr expr,
>>>> +          we split it into Y=REG+CONST, Y+NON_REG.  */
>>>> +       else if (REG_P (op0_) || REG_P (op1_))
>>>> +         {
>>>> +           machine_mode addr_mode = GET_MODE (op0);
>>>> +           rtx addr = gen_reg_rtx (addr_mode);
>>>> +
>>>> +           /* Switch to make sure that register is in op0_.  */
>>>> +           if (REG_P (op1_))
>>>> +             std::swap (op0_, op1_);
>>>> +
>>>> +           rtx ret = gen_rtx_fmt_ee (PLUS, addr_mode, addr, op1_);
>>>> +           if (aarch64_legitimate_address_hook_p (mode, ret, false))
>>>> +             {
>>>> +               addr = force_operand (plus_constant (addr_mode,
>>>> +                                                    op0_, offset),
>>>> +                                     NULL_RTX);
>>>> +               ret = gen_rtx_fmt_ee (PLUS, addr_mode, addr, op1_);
>>>> +               return ret;
>>>> +             }
>>>
>>> The logic here is a bit hairy to follow, you construct a PLUS RTX to check
>>> aarch64_legitimate_address_hook_p, then construct a different PLUS RTX
>>> to use as the return value. This can probably be clarified by choosing a
>>> name other than ret for the temporary address expression you construct.
>>>
>>> It would also be good to take some of your detailed description and write
>>> that here. Certainly I found the explicit examples in the cover letter
>>> easier to follow than:
>>>
>>>> +       /* RTX pattern in the form of (PLUS (PLUS REG, NON_REG), CONST)
>>>> +          will reach here.  If (PLUS REG, NON_REG) is valid addr expr,
>>>> +          we split it into Y=REG+CONST, Y+NON_REG.  */
>>>
>>> Otherwise this patch is OK.
>> Thanks for reviewing, here is the updated patch.
> 
> Hmm, I retested the patch on aarch64 and found it caused two
> additional failures.
> 
> FAIL: gcc.target/aarch64/ldp_vec_64_1.c scan-assembler ldp\td[0-9]+, d[0-9]
> This is caused by different ivopt decision because of this patch's
> cost change.  As far as IVO can tell, the new decision is better than
> the old one.  So is the IVOPTed dump.  I can fix this by changing how
> this patch legitimize address "r1 + r2 + offset".  In this patch, it's
> legitimized into "r3 = r1 + r2; [r3 + offset]"; we could change it
> into "r3 = offset; r4 = r1 + r3; [r4 + r2]".  This new form is better
> because possibly r4 is a loop invariant, but the cost is higher.  I
> tend to keep this patch the way it is since I don't know how the
> changed cost affects performance data.  We may need to live with this
> failure for a while.
> 
> FAIL: gcc.dg/atomic/stdatomic-vm.c   -O1  (internal compiler error)
> This I think is a potential bug in aarch64 backend.  GCC could
> generate "[r1 + r2 << 3] = unspec..." with this patch, for this test,
> LRA needs to make a reload for the address expression by doing
> "r1+r2<<3" outside of memory reference.  In function emit_add3_insn,
> it firstly checks have_addptr3_insn/gen_addptr3_insn, then the add3
> pattern.  The code is as below:
> 
>   if (have_addptr3_insn (x, y, z))
>     {
>       rtx_insn *insn = gen_addptr3_insn (x, y, z);
> 
>       /* If the target provides an "addptr" pattern it hopefully does
>      for a reason.  So falling back to the normal add would be
>      a bug.  */
>       lra_assert (insn != NULL_RTX);
>       emit_insn (insn);
>       return insn;
>     }
> 
>   rtx_insn* insn = emit_insn (gen_rtx_SET (x, gen_rtx_PLUS (GET_MODE
> (y), y, z)));
>   if (recog_memoized (insn) < 0)
>     {
>       delete_insns_since (last);
>       insn = NULL;
>     }
>   return insn;
> 
> The aarch64's problem is we don't define addptr3 pattern, and we don't
> have direct insn pattern describing the "x + y << z".  According to
> gcc internal:
> 
> ‘addptrm3’
> Like addm3 but is guaranteed to only be used for address calculations.
> The expanded code is not allowed to clobber the condition code. It
> only needs to be defined if addm3 sets the condition code.

addm3 on aarch64 does not set the condition codes, so by this rule we
shouldn't need to define this pattern.

R.

> If adds
> used for address calculations and normal adds are not compatible it is
> required to expand a distinct pattern (e.g. using an unspec). The
> pattern is used by LRA to emit address calculations. addm3 is used if
> addptrm3 is not defined.
> 
> Seems to me aarch64 needs to support "add(shifted register)" in add3
> pattern, or needs to support addptr3 pattern.  I tend to apply this
> patch to expose the bug, but I will let aarch64 experts give some
> comments before that.
> 
> Thanks,
> bin
> 

Reply via email to