On Fri, Nov 29, 2013 at 12:46 PM, Yufeng Zhang <yufeng.zh...@arm.com> wrote: > On 11/29/13 07:52, Bin.Cheng wrote: >> >> On Thu, Nov 28, 2013 at 8:06 PM, Bin.Cheng<amker.ch...@gmail.com> wrote: >>> >>> On Thu, Nov 28, 2013 at 6:48 PM, Richard Earnshaw<rearn...@arm.com> >>> wrote: >>>> >>>> On 18/09/13 10:15, bin.cheng wrote: >>>>> >>>>> >>>>> >>>>>> -----Original Message----- >>>>>> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- >>>>>> ow...@gcc.gnu.org] On Behalf Of bin.cheng >>>>>> Sent: Monday, September 02, 2013 3:09 PM >>>>>> To: Richard Earnshaw >>>>>> Cc: gcc-patches@gcc.gnu.org >>>>>> Subject: RE: [PATCH ARM]Refine scaled address expression on ARM >>>>>> >>>>>> >>>>>> >>>>>>> -----Original Message----- >>>>>>> From: Richard Earnshaw >>>>>>> Sent: Thursday, August 29, 2013 9:06 PM >>>>>>> To: Bin Cheng >>>>>>> Cc: gcc-patches@gcc.gnu.org >>>>>>> Subject: Re: [PATCH ARM]Refine scaled address expression on ARM >>>>>>> >>>>>>> On 28/08/13 08:00, bin.cheng wrote: >>>>>>>> >>>>>>>> Hi, >>>>>>>> >>>>>>>> This patch refines scaled address expression on ARM. It supports >>>>>>>> "base+index*scale" in arm_legitimate_address_outer_p. It also tries >>>>>>>> to legitimize "base + index * scale + offset" with "reg<- base + >>>>>>>> offset; reg >>>>>>>> + index * scale" by introducing thumb2_legitimize_address. For now >>>>>>>> + function >>>>>>>> thumb2_legitimize_address is a kind of placeholder and just does the >>>>>>>> mentioned transformation by calling to try_multiplier_address. >>>>>>>> Hoping we can improve it in the future. >>>>>>>> >>>>>>>> With this patch: >>>>>>>> 1) "base+index*scale" is recognized. >>>>>>> >>>>>>> >>>>>>> That's because (PLUS (REG) (MULT (REG) (CONST))) is not canonical >>>>>>> form. >>>>>>> So this shouldn't be necessary. Can you identify where this >>>>>> >>>>>> non-canoncial form is being generated? >>>>>>> >>>>>>> >>>>>> >>>>>> Oh, for now ivopt constructs "index*scale" to test whether backend >>>>>> supports scaled addressing mode, which is not valid on ARM, so I was >>>>>> going >>>>>> to construct "base + index*scale" instead. Since "base + index * >>>>>> scale" >>>>> >>>>> is not >>>>>> >>>>>> canonical form, I will construct the canonical form and drop this part >>>>>> of >>>>> >>>>> the >>>>>> >>>>>> patch. >>>>>> >>>>>> Is rest of this patch OK? >>>>>> >>>>> Hi Richard, I removed the part over which you concerned and created >>>>> this >>>>> updated patch. >>>>> >>>>> Is it OK? >>>>> >>>>> Thanks. >>>>> bin >>>>> >>>>> 2013-09-18 Bin Cheng<bin.ch...@arm.com> >>>>> >>>>> * config/arm/arm.c (try_multiplier_address): New function. >>>>> (thumb2_legitimize_address): New function. >>>>> (arm_legitimize_address): Call try_multiplier_address and >>>>> thumb2_legitimize_address. >>>>> >>>>> >>>>> 6-arm-scaled_address-20130918.txt >>>>> >>>>> >>>>> Index: gcc/config/arm/arm.c >>>>> =================================================================== >>>>> --- gcc/config/arm/arm.c (revision 200774) >>>>> +++ gcc/config/arm/arm.c (working copy) >>>>> @@ -6652,6 +6654,106 @@ legitimize_tls_address (rtx x, rtx reg) >>>>> } >>>>> } >>>>> >>>>> +/* Try to find address expression like base + index * scale + offset >>>>> + in X. If we find one, force base + offset into register and >>>>> + construct new expression reg + index * scale; return the new >>>>> + address expression if it's valid. Otherwise return X. */ >>>>> +static rtx >>>>> +try_multiplier_address (rtx x, enum machine_mode mode >>>>> ATTRIBUTE_UNUSED) >>>>> +{ >>>>> + rtx tmp, base_reg, new_rtx; >>>>> + rtx base = NULL_RTX, index = NULL_RTX, scale = NULL_RTX, offset = >>>>> NULL_RTX; >>>>> + >>>>> + gcc_assert (GET_CODE (x) == PLUS); >>>>> + >>>>> + /* Try to find and record base/index/scale/offset in X. */ >>>>> + if (GET_CODE (XEXP (x, 1)) == MULT) >>>>> + { >>>>> + tmp = XEXP (x, 0); >>>>> + index = XEXP (XEXP (x, 1), 0); >>>>> + scale = XEXP (XEXP (x, 1), 1); >>>>> + if (GET_CODE (tmp) != PLUS) >>>>> + return x; >>>>> + >>>>> + base = XEXP (tmp, 0); >>>>> + offset = XEXP (tmp, 1); >>>>> + } >>>>> + else >>>>> + { >>>>> + tmp = XEXP (x, 0); >>>>> + offset = XEXP (x, 1); >>>>> + if (GET_CODE (tmp) != PLUS) >>>>> + return x; >>>>> + >>>>> + base = XEXP (tmp, 0); >>>>> + scale = XEXP (tmp, 1); >>>>> + if (GET_CODE (base) == MULT) >>>>> + { >>>>> + tmp = base; >>>>> + base = scale; >>>>> + scale = tmp; >>>>> + } >>>>> + if (GET_CODE (scale) != MULT) >>>>> + return x; >>>>> + >>>>> + index = XEXP (scale, 0); >>>>> + scale = XEXP (scale, 1); >>>>> + } >>>>> + >>>>> + if (CONST_INT_P (base)) >>>>> + { >>>>> + tmp = base; >>>>> + base = offset; >>>>> + offset = tmp; >>>>> + } >>>>> + >>>>> + if (CONST_INT_P (index)) >>>>> + { >>>>> + tmp = index; >>>>> + index = scale; >>>>> + scale = tmp; >>>>> + } >>>>> + >>>>> + /* ARM only supports constant scale in address. */ >>>>> + if (!CONST_INT_P (scale)) >>>>> + return x; >>>>> + >>>>> + if (GET_MODE (base) != SImode || GET_MODE (index) != SImode) >>>>> + return x; >>>>> + >>>>> + /* Only register/constant are allowed in each part. */ >>>>> + if (!symbol_mentioned_p (base) >>>>> +&& !symbol_mentioned_p (offset) >>>>> +&& !symbol_mentioned_p (index) >>>>> +&& !symbol_mentioned_p (scale)) >>>>> + { >>>> >>>> >>>> It would be easier to do this at the top of the function -- >>>> if (symbol_mentioned_p (x)) >>>> return x; >>>> >>>> >>>>> + /* Force "base+offset" into register and construct >>>>> + "register+index*scale". Return the new expression >>>>> + only if it's valid. */ >>>>> + tmp = gen_rtx_PLUS (SImode, base, offset); >>>>> + base_reg = force_reg (SImode, tmp); >>>>> + tmp = gen_rtx_fmt_ee (MULT, SImode, index, scale); >>>>> + new_rtx = gen_rtx_PLUS (SImode, base_reg, tmp); >>>>> + return new_rtx; >>>> >>>> >>>> I can't help thinking that this is backwards. That is, you want to >>>> split out the mult expression and use offset addressing in the addresses >>>> itself. That's likely to lead to either better CSE, or more induction >>> >>> Thanks to your review. >>> >>> Actually base+offset is more likely loop invariant than scaled >>> expression, just as reported in pr57540. The bug is observed in >>> spec2k bzip/gzip, and hurts arm in hot loops. The loop induction >>> variable doesn't matter that much comparing to invariant because we >>> are in RTL now. >>> >>>> vars. Furthermore, scaled addressing modes are likely to be more >>>> expensive than simple offset addressing modes on at least some cores. >>> >>> The purpose is to CSE (as you pointed out) or hoist loop invariant. >>> As for addressing mode is concerned, Though we may guide the >>> transformation once we have reliable address cost mode, we don't have >>> the information if base+offset is invariant, so it's difficult to >>> handle in backend, right? >>> >>> What do you think about this? >>> >> >> Additionally, there is no induction variable issue here. The memory >> reference we are facing during expand are not lowered by gimple IVOPT, >> which means either it's outside loop, or doesn't have induction >> variable address. >> >> Generally, there are three passes which lowers memory reference: >> gimple strength reduction picks opportunity outside loop; gimple IVOPT >> handles references with induction variable addresses; references not >> handled by SLSR/IVOPT are handled by RTL expand, which makes bad >> choice of address re-association. I think Yufeng also noticed the >> problem and are trying with patch like: >> http://gcc.gnu.org/ml/gcc-patches/2013-11/msg02878.html >> http://gcc.gnu.org/ml/gcc-patches/2013-11/msg03339.html > > > Yes, when it comes to addressing expression, the re-association in RTL > expand is quite sensitive to the available address modes on the target and > its address legitimization routines. Both patches I proposed try to make > the RTL expansion more canonicalized on addressing expressions, especially > on ARM. It is done by essentially enabling simplify_gen_binary () to work > on a bigger RTX node. > > >> After thinking twice, I some kind of think we should not re-associate >> addresses during expanding, because of lacking of context information. >> Take base + scaled_index + offset as an example in PR57540, we just >> don't know if "base+offset" is loop invariant from either backend or >> RTL expander. > > > I'm getting less convinced by re-associating base with offset > unconditionally. One counter example is > > typedef int arr_1[20]; > void foo (arr_1 a1, int i) > { > a1[i+10] = 1; > } > > I'm experimenting a patch to get the immediate offset in the above example > to be the last addend in the address computing (as mentioned in > http://gcc.gnu.org/ml/gcc/2013-11/msg00581.html), aiming to get the > following code-gen: > > add r1, r0, r1, asl #2 > mov r3, #1 > str r3, [r1, #40] > > With your patch applied, the effort will be reverted to > > add r0, r0, #40 > mov r3, #1 > str r3, [r0, r1, asl #2] > > > I briefly looked into PR57540. I noticed that you are trying to tackle a > loop invariant in a hot loop: > > .L5: > add lr, sp, #2064 ////loop invariant > add r2, r2, #1 > add r3, lr, r3, asl #2 > ldr r3, [r3, #-2064] > cmp r3, #0 > bge .L5 > uxtb r2, r2
Why does RTL invariant motion not move it? Richard. > Looking into the example code: > > void > foo () > { > int parent [ 258 * 2 ]; > for (i = 1; i <= alphaSize; i++) > { > while (parent[k] >= 0) > { > k = parent[k]; > ... > } > ... > > The loop invariant is part of address computing for a stack variable. The > immediate 2064 is the offset of the variable on the stack frame rather than > what we normally expect, e.g. part of indexing; it is a back-end specific > issue and there is nothing the mid-end can do (the mem_ref lowering cannot > help either). One possible solution may be to force the base address of an > array on stack to a REG, before the RTL expand does anything 'smart'. Is it > something you think worth giving a try? > > Just my two cents. > > Thanks, > Yufeng >