On Tue, Nov 17, 2015 at 6:08 PM, James Greenhalgh
<[email protected]> wrote:
> On Tue, Nov 17, 2015 at 05:21:01PM +0800, Bin Cheng wrote:
>> Hi,
>> GIMPLE IVO needs to call the backend interface to calculate costs for
>> address expressions like the ones below:
>> FORM1: "r73 + r74 + 16380"
>> FORM2: "r73 << 2 + r74 + 16380"
>>
>> They are invalid address expressions on AArch64, so they will be
>> legitimized by aarch64_legitimize_address. Below is what we get from that
>> function:
>>
>> For FORM1, the address expression is legitimized into the insn sequence
>> and rtx below:
>> r84:DI=r73:DI+r74:DI
>> r85:DI=r84:DI+0x3000
>> r83:DI=r85:DI
>> "r83 + 4092"
>>
>> For FORM2, the address expression is legitimized into the insn sequence
>> and rtx below:
>> r108:DI=r73:DI<<0x2
>> r109:DI=r108:DI+r74:DI
>> r110:DI=r109:DI+0x3000
>> r107:DI=r110:DI
>> "r107 + 4092"
>>
>> So the costs computed are 12 and 16 respectively. The high cost prevents IVO
>> from choosing the right candidates. Besides cost computation, I also think
>> the legitimization is bad in terms of code generation.
>> The root cause in aarch64_legitimize_address can be described by its
>> comment:
>> /* Try to split X+CONST into Y=X+(CONST & ~mask), Y+(CONST&mask),
>> where mask is selected by alignment and size of the offset.
>> We try to pick as large a range for the offset as possible to
>> maximize the chance of a CSE. However, for aligned addresses
>> we limit the range to 4k so that structures with different sized
>> elements are likely to use the same base. */
>> I think the split of CONST is intended for REG+CONST where the const offset
>> is not in the range of AArch64's addressing modes. Unfortunately, it
>> doesn't explicitly handle/reject "REG+REG+CONST" and "REG+REG<<SCALE+CONST"
>> when the CONST is in the range of the addressing modes. As a result, these
>> two cases fall through to this logic, producing sub-optimal results.
>>
>> Clearly we can do the legitimization below instead:
>> FORM1:
>> r83:DI=r73:DI+r74:DI
>> "r83 + 16380"
>> FORM2:
>> r107:DI=0x3ffc
>> r106:DI=r74:DI+r107:DI
>> REG_EQUAL r74:DI+0x3ffc
>> "r106 + r73 << 2"
>>
>> This patch handles these two cases as described.
>
> Thanks for the description, it made the patch very easy to review. I only
> have a style comment.
>
>> Bootstrapped and tested on AArch64 along with the other patch. Is it OK?
>>
>> 2015-11-04 Bin Cheng <[email protected]>
>> Jiong Wang <[email protected]>
>>
>> * config/aarch64/aarch64.c (aarch64_legitimize_address): Handle
>> address expressions like REG+REG+CONST and REG+NON_REG+CONST.
>
>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
>> index 5c8604f..47875ac 100644
>> --- a/gcc/config/aarch64/aarch64.c
>> +++ b/gcc/config/aarch64/aarch64.c
>> @@ -4710,6 +4710,51 @@ aarch64_legitimize_address (rtx x, rtx /* orig_x */,
>> machine_mode mode)
>> {
>> HOST_WIDE_INT offset = INTVAL (XEXP (x, 1));
>> HOST_WIDE_INT base_offset;
>> + rtx op0 = XEXP (x,0);
>> +
>> + if (GET_CODE (op0) == PLUS)
>> + {
>> + rtx op0_ = XEXP (op0, 0);
>> + rtx op1_ = XEXP (op0, 1);
>
> I don't see this trailing _ on a variable name in many places in the source
> tree (mostly in the Go frontend), and certainly not in the aarch64 backend.
> Can we pick a different name for op0_ and op1_?
>
>> +
>> + /* RTX pattern in the form of (PLUS (PLUS REG, REG), CONST) will
>> + reach here, the 'CONST' may be valid in which case we should
>> + not split. */
>> + if (REG_P (op0_) && REG_P (op1_))
>> + {
>> + machine_mode addr_mode = GET_MODE (op0);
>> + rtx addr = gen_reg_rtx (addr_mode);
>> +
>> + rtx ret = plus_constant (addr_mode, addr, offset);
>> + if (aarch64_legitimate_address_hook_p (mode, ret, false))
>> + {
>> + emit_insn (gen_adddi3 (addr, op0_, op1_));
>> + return ret;
>> + }
>> + }
>> + /* RTX pattern in the form of (PLUS (PLUS REG, NON_REG), CONST)
>> + will reach here. If (PLUS REG, NON_REG) is valid addr expr,
>> + we split it into Y=REG+CONST, Y+NON_REG. */
>> + else if (REG_P (op0_) || REG_P (op1_))
>> + {
>> + machine_mode addr_mode = GET_MODE (op0);
>> + rtx addr = gen_reg_rtx (addr_mode);
>> +
>> + /* Switch to make sure that register is in op0_. */
>> + if (REG_P (op1_))
>> + std::swap (op0_, op1_);
>> +
>> + rtx ret = gen_rtx_fmt_ee (PLUS, addr_mode, addr, op1_);
>> + if (aarch64_legitimate_address_hook_p (mode, ret, false))
>> + {
>> + addr = force_operand (plus_constant (addr_mode,
>> + op0_, offset),
>> + NULL_RTX);
>> + ret = gen_rtx_fmt_ee (PLUS, addr_mode, addr, op1_);
>> + return ret;
>> + }
>
> The logic here is a bit hairy to follow: you construct a PLUS RTX to check
> aarch64_legitimate_address_hook_p, then construct a different PLUS RTX
> to use as the return value. This can probably be clarified by choosing a
> name other than ret for the temporary address expression you construct.
>
> It would also be good to take some of your detailed description and write
> that here. Certainly I found the explicit examples in the cover letter
> easier to follow than:
>
>> + /* RTX pattern in the form of (PLUS (PLUS REG, NON_REG), CONST)
>> + will reach here. If (PLUS REG, NON_REG) is valid addr expr,
>> + we split it into Y=REG+CONST, Y+NON_REG. */
>
> Otherwise this patch is OK.
Thanks for reviewing. Here is the updated patch.
Thanks,
bin
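
For reference, the arithmetic behind the FORM1 example can be sketched in
isolation. This is only an illustration, not GCC code: split_offset and
fits_scaled_uimm12 are hypothetical helpers, the 0xfff mask is assumed from
the "limit the range to 4k" wording in the comment, and the 4-byte access
size is assumed from the "<< 2" scale in FORM2.

```cpp
#include <cstdint>

// Hypothetical helper (not part of GCC): split OFFSET the way the comment
// describes, Y = X + (CONST & ~mask), then address Y + (CONST & mask).
struct SplitOffset
{
  std::int64_t base_offset; // CONST & ~mask, added to the base register
  std::int64_t low;         // CONST & mask, left in the address expression
};

SplitOffset
split_offset (std::int64_t offset, std::int64_t mask)
{
  return { offset & ~mask, offset & mask };
}

// Hypothetical check modelling the AArch64 "base + unsigned scaled 12-bit
// immediate" load/store form: the offset must be a non-negative multiple
// of the access size and at most 4095 * size.
bool
fits_scaled_uimm12 (std::int64_t offset, std::int64_t size)
{
  return offset >= 0 && offset % size == 0 && offset / size <= 4095;
}
```

Under these assumptions, split_offset (16380, 0xfff) yields 0x3000 and 4092,
matching the "+0x3000" insn and the "r83 + 4092" address quoted above, while
fits_scaled_uimm12 (16380, 4) holds, so the constant needs no split once the
two registers have been summed into a single base.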
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 5c8604f..64bc6a4 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -4704,13 +4704,65 @@ aarch64_legitimize_address (rtx x, rtx /* orig_x */,
machine_mode mode)
We try to pick as large a range for the offset as possible to
maximize the chance of a CSE. However, for aligned addresses
we limit the range to 4k so that structures with different sized
- elements are likely to use the same base. */
+ elements are likely to use the same base. We need to be careful
+ not to split CONST for some forms of address expression, otherwise
+ it will generate sub-optimal code. */
if (GET_CODE (x) == PLUS && CONST_INT_P (XEXP (x, 1)))
{
HOST_WIDE_INT offset = INTVAL (XEXP (x, 1));
HOST_WIDE_INT base_offset;
+ if (GET_CODE (XEXP (x, 0)) == PLUS)
+ {
+ rtx op0 = XEXP (XEXP (x, 0), 0);
+ rtx op1 = XEXP (XEXP (x, 0), 1);
+
+ /* For addr expression in the form like "r1 + r2 + 0x3ffc".
+ Since the offset is within range supported by addressing
+ mode "reg+offset", we don't split the const and legalize
+ it into below insn and expr sequence:
+ r3 = r1 + r2;
+ "r3 + 0x3ffc". */
+ if (REG_P (op0) && REG_P (op1))
+ {
+ machine_mode addr_mode = GET_MODE (x);
+ rtx base = gen_reg_rtx (addr_mode);
+ rtx addr = plus_constant (addr_mode, base, offset);
+
+ if (aarch64_legitimate_address_hook_p (mode, addr, false))
+ {
+ emit_insn (gen_adddi3 (base, op0, op1));
+ return addr;
+ }
+ }
+ /* For addr expression in the form like "r1 + r2<<2 + 0x3ffc".
+ Like above, we don't split the const and legalize it into
+ below insn and expr sequence:
+ r3 = 0x3ffc;
+ r4 = r1 + r3;
+ "r4 + r2<<2". */
+ else if (REG_P (op0) || REG_P (op1))
+ {
+ machine_mode addr_mode = GET_MODE (x);
+ rtx base = gen_reg_rtx (addr_mode);
+
+ /* Switch to make sure that register is in op0. */
+ if (REG_P (op1))
+ std::swap (op0, op1);
+
+ rtx addr = gen_rtx_fmt_ee (PLUS, addr_mode, base, op1);
+
+ if (aarch64_legitimate_address_hook_p (mode, addr, false))
+ {
+ base = force_operand (plus_constant (addr_mode,
+ op0, offset),
+ NULL_RTX);
+ return gen_rtx_fmt_ee (PLUS, addr_mode, base, op1);
+ }
+ }
+ }
+
/* Does it look like we'll need a load/store-pair operation? */
if (GET_MODE_SIZE (mode) > 16
|| mode == TImode)