On Fri, Nov 29, 2013 at 12:03 PM, Bin.Cheng <amker.ch...@gmail.com> wrote:
> On Fri, Nov 29, 2013 at 6:44 PM, Richard Biener
> <richard.guent...@gmail.com> wrote:
>> On Fri, Nov 29, 2013 at 8:52 AM, Bin.Cheng <amker.ch...@gmail.com> wrote:
>>> On Thu, Nov 28, 2013 at 8:06 PM, Bin.Cheng <amker.ch...@gmail.com> wrote:
>>>> On Thu, Nov 28, 2013 at 6:48 PM, Richard Earnshaw <rearn...@arm.com> wrote:
>>>>> On 18/09/13 10:15, bin.cheng wrote:
>>>>>>
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
>>>>>>> ow...@gcc.gnu.org] On Behalf Of bin.cheng
>>>>>>> Sent: Monday, September 02, 2013 3:09 PM
>>>>>>> To: Richard Earnshaw
>>>>>>> Cc: gcc-patches@gcc.gnu.org
>>>>>>> Subject: RE: [PATCH ARM]Refine scaled address expression on ARM
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: Richard Earnshaw
>>>>>>>> Sent: Thursday, August 29, 2013 9:06 PM
>>>>>>>> To: Bin Cheng
>>>>>>>> Cc: gcc-patches@gcc.gnu.org
>>>>>>>> Subject: Re: [PATCH ARM]Refine scaled address expression on ARM
>>>>>>>>
>>>>>>>> On 28/08/13 08:00, bin.cheng wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> This patch refines scaled address expressions on ARM.  It supports
>>>>>>>>> "base + index * scale" in arm_legitimate_address_outer_p.  It also
>>>>>>>>> tries to legitimize "base + index * scale + offset" as "reg <- base
>>>>>>>>> + offset; reg + index * scale" by introducing
>>>>>>>>> thumb2_legitimize_address.  For now, thumb2_legitimize_address is a
>>>>>>>>> placeholder that just performs the transformation above by calling
>>>>>>>>> try_multiplier_address; hopefully we can improve it in the future.
>>>>>>>>>
>>>>>>>>> With this patch:
>>>>>>>>> 1) "base + index * scale" is recognized.
>>>>>>>>
>>>>>>>> That's because (PLUS (REG) (MULT (REG) (CONST))) is not canonical
>>>>>>>> form, so this shouldn't be necessary.  Can you identify where this
>>>>>>>> non-canonical form is being generated?
>>>>>>>>
>>>>>>>
>>>>>>> Oh, for now ivopts constructs "index * scale" to test whether the
>>>>>>> backend supports scaled addressing modes; that form is not valid on
>>>>>>> ARM, so I was going to construct "base + index * scale" instead.
>>>>>>> Since "base + index * scale" is not canonical form, I will construct
>>>>>>> the canonical form and drop this part of the patch.
>>>>>>>
>>>>>>> Is the rest of this patch OK?
>>>>>>>
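>>>>>>> As a rough sketch of what I plan instead (illustration only, not
>>>>>>> part of the patch; index_reg, base_reg and scale are made-up names
>>>>>>> here), the canonical form puts the MULT first in the PLUS:
>>>>>>>
>>>>>>>   /* Canonical: (plus (mult (reg) (const_int)) (reg)), rather than
>>>>>>>      (plus (reg) (mult (reg) (const_int))).  */
>>>>>>>   addr = gen_rtx_PLUS (Pmode,
>>>>>>>                        gen_rtx_MULT (Pmode, index_reg,
>>>>>>>                                      GEN_INT (scale)),
>>>>>>>                        base_reg);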
>>>>>> Hi Richard, I removed the part you were concerned about and created
>>>>>> this updated patch.
>>>>>>
>>>>>> Is it OK?
>>>>>>
>>>>>> Thanks.
>>>>>> bin
>>>>>>
>>>>>> 2013-09-18  Bin Cheng  <bin.ch...@arm.com>
>>>>>>
>>>>>> 	* config/arm/arm.c (try_multiplier_address): New function.
>>>>>> 	(thumb2_legitimize_address): New function.
>>>>>> 	(arm_legitimize_address): Call try_multiplier_address and
>>>>>> 	thumb2_legitimize_address.
>>>>>>
>>>>>>
>>>>>> 6-arm-scaled_address-20130918.txt
>>>>>>
>>>>>>
>>>>>> Index: gcc/config/arm/arm.c
>>>>>> ===================================================================
>>>>>> --- gcc/config/arm/arm.c	(revision 200774)
>>>>>> +++ gcc/config/arm/arm.c	(working copy)
>>>>>> @@ -6652,6 +6654,106 @@ legitimize_tls_address (rtx x, rtx reg)
>>>>>>      }
>>>>>>  }
>>>>>>
>>>>>> +/* Try to find address expression like base + index * scale + offset
>>>>>> +   in X.  If we find one, force base + offset into register and
>>>>>> +   construct new expression reg + index * scale; return the new
>>>>>> +   address expression if it's valid.  Otherwise return X.  */
>>>>>> +static rtx
>>>>>> +try_multiplier_address (rtx x, enum machine_mode mode ATTRIBUTE_UNUSED)
>>>>>> +{
>>>>>> +  rtx tmp, base_reg, new_rtx;
>>>>>> +  rtx base = NULL_RTX, index = NULL_RTX, scale = NULL_RTX, offset = NULL_RTX;
>>>>>> +
>>>>>> +  gcc_assert (GET_CODE (x) == PLUS);
>>>>>> +
>>>>>> +  /* Try to find and record base/index/scale/offset in X.  */
>>>>>> +  if (GET_CODE (XEXP (x, 1)) == MULT)
>>>>>> +    {
>>>>>> +      tmp = XEXP (x, 0);
>>>>>> +      index = XEXP (XEXP (x, 1), 0);
>>>>>> +      scale = XEXP (XEXP (x, 1), 1);
>>>>>> +      if (GET_CODE (tmp) != PLUS)
>>>>>> +	return x;
>>>>>> +
>>>>>> +      base = XEXP (tmp, 0);
>>>>>> +      offset = XEXP (tmp, 1);
>>>>>> +    }
>>>>>> +  else
>>>>>> +    {
>>>>>> +      tmp = XEXP (x, 0);
>>>>>> +      offset = XEXP (x, 1);
>>>>>> +      if (GET_CODE (tmp) != PLUS)
>>>>>> +	return x;
>>>>>> +
>>>>>> +      base = XEXP (tmp, 0);
>>>>>> +      scale = XEXP (tmp, 1);
>>>>>> +      if (GET_CODE (base) == MULT)
>>>>>> +	{
>>>>>> +	  tmp = base;
>>>>>> +	  base = scale;
>>>>>> +	  scale = tmp;
>>>>>> +	}
>>>>>> +      if (GET_CODE (scale) != MULT)
>>>>>> +	return x;
>>>>>> +
>>>>>> +      index = XEXP (scale, 0);
>>>>>> +      scale = XEXP (scale, 1);
>>>>>> +    }
>>>>>> +
>>>>>> +  if (CONST_INT_P (base))
>>>>>> +    {
>>>>>> +      tmp = base;
>>>>>> +      base = offset;
>>>>>> +      offset = tmp;
>>>>>> +    }
>>>>>> +
>>>>>> +  if (CONST_INT_P (index))
>>>>>> +    {
>>>>>> +      tmp = index;
>>>>>> +      index = scale;
>>>>>> +      scale = tmp;
>>>>>> +    }
>>>>>> +
>>>>>> +  /* ARM only supports constant scale in address.  */
>>>>>> +  if (!CONST_INT_P (scale))
>>>>>> +    return x;
>>>>>> +
>>>>>> +  if (GET_MODE (base) != SImode || GET_MODE (index) != SImode)
>>>>>> +    return x;
>>>>>> +
>>>>>> +  /* Only register/constant are allowed in each part.  */
>>>>>> +  if (!symbol_mentioned_p (base)
>>>>>> +      && !symbol_mentioned_p (offset)
>>>>>> +      && !symbol_mentioned_p (index)
>>>>>> +      && !symbol_mentioned_p (scale))
>>>>>> +    {
>>>>>
>>>>> It would be easier to do this at the top of the function --
>>>>>   if (symbol_mentioned_p (x))
>>>>>     return x;
>>>>>
>>>>>
>>>>>> +      /* Force "base+offset" into register and construct
>>>>>> +	 "register+index*scale".  Return the new expression
>>>>>> +	 only if it's valid.  */
>>>>>> +      tmp = gen_rtx_PLUS (SImode, base, offset);
>>>>>> +      base_reg = force_reg (SImode, tmp);
>>>>>> +      tmp = gen_rtx_fmt_ee (MULT, SImode, index, scale);
>>>>>> +      new_rtx = gen_rtx_PLUS (SImode, base_reg, tmp);
>>>>>> +      return new_rtx;
>>>>>
>>>>> I can't help thinking that this is backwards.  That is, you want to
>>>>> split out the mult expression and use offset addressing in the
>>>>> addresses themselves.  That's likely to lead to either better CSE, or
>>>>> more induction
>>>> Thanks for your review.
>>>>
>>>> Actually, base+offset is more likely to be loop invariant than the
>>>> scaled expression, just as reported in PR57540.  The bug is observed
>>>> in spec2k bzip2/gzip and hurts ARM in hot loops.  The loop induction
>>>> variable doesn't matter that much compared to the invariant because
>>>> we are in RTL now.
>>>>
>>>>> vars.  Furthermore, scaled addressing modes are likely to be more
>>>>> expensive than simple offset addressing modes on at least some cores.
>>>> The purpose is to CSE (as you pointed out) or hoist the loop
>>>> invariant.  As far as the addressing mode is concerned, though we
>>>> could guide the transformation once we have a reliable address cost
>>>> model, the backend has no information about whether base+offset is
>>>> invariant, so it's difficult to handle there, right?
>>>>
>>>> What do you think about this?
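>>>> To make this concrete, here is a made-up reduction of the PR57540
>>>> situation (my own illustration, not the testcase from the PR):
>>>>
>>>>   /* "p + offset" is loop invariant; only i changes.  Expanding the
>>>>      address as (p + i * 4) + offset recomputes the full sum in every
>>>>      iteration, while (p + offset) + i * 4 exposes the invariant part
>>>>      to CSE and hoisting.  */
>>>>   int
>>>>   foo (char *p, int offset, int n)
>>>>   {
>>>>     int i, s = 0;
>>>>     for (i = 0; i < n; i++)
>>>>       s += *(int *) (p + i * 4 + offset);
>>>>     return s;
>>>>   }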
>>>>
>>>
>>> Additionally, there is no induction variable issue here.  The memory
>>> references we face during expand were not lowered by gimple IVOPTs,
>>> which means either they are outside any loop, or their addresses do
>>> not involve induction variables.
>>>
>>> Generally, there are three passes which lower memory references:
>>> gimple strength reduction (SLSR) picks up opportunities outside
>>> loops; gimple IVOPTs handles references whose addresses involve
>>> induction variables; and references handled by neither SLSR nor
>>> IVOPTs are left to RTL expand, which makes bad choices when
>>> re-associating addresses.  I think Yufeng also noticed the problem
>>> and is trying patches like:
>>> http://gcc.gnu.org/ml/gcc-patches/2013-11/msg02878.html
>>> http://gcc.gnu.org/ml/gcc-patches/2013-11/msg03339.html
>>>
>>> After thinking twice, I rather think we should not re-associate
>>> addresses during expand, because of the lack of context information.
>>> Take base + scaled_index + offset from PR57540 as an example: we just
>>> can't tell whether "base+offset" is loop invariant from either the
>>> backend or the RTL expander.
>>>
>>> Any comments?
>>
>> The issue is that re-association and address lowering are not
>> integrated with optimizers such as CSE and invariant motion.  This
>> means it's hard to arrive at optimal results, and all passes are just
>> "guessing", applying heuristics that the programmer thought were
>> appropriate for his machine.
>>
>> I have no easy way out, but I think we lower addresses too late (at
>> least the full lowering).  Loop optimizers would like to see the
>> complex memory reference forms, but starting with IVOPTs, lowering
>> all addresses would likely be a win in the end (there are CSE, LIM
>> and re-association passes around after/at this point as well).
>>
>> Back in the 4.3 days, when I first attempted to introduce MEM_REF, I
>> forcefully lowered all memory accesses and addresses very early
>> (during gimplification, basically).  That didn't work out well.
>>
>> So my suggestion to anyone who wants to try something is to apply an
>> additional lowering to the whole of GIMPLE, lowering things like
>> handled-component references and addresses (and also things like
>> bitfield accesses).
> Yes, once we are in the expander there is not enough information.
> Since we don't need to handle references with induction variable
> addresses after IVOPTs, it's not difficult to take invariants into
> consideration when re-associating.  Yet I have no clue how CSE should
> be considered.

You'd need to detect that (a + b) * c can be computed differently if
both a * c and b * c are available, for example.  There is even a PR
about this somewhere.
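A contrived sketch of the kind of opportunity I mean (my example, not
the one from that PR):

  /* If a * c and b * c have already been computed, (a + b) * c is
     cheaper as a_c + b_c than as a fresh add-and-multiply; a
     re-association pass that knows which expressions are available
     could pick the second form.  */
  int a_c = a * c;
  int b_c = b * c;
  /* ... both still available here ...  */
  int x = (a + b) * c;   /* could instead be a_c + b_c */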
> Nevertheless, forcing "base + scaled*index + offset" into a register
> and using register-indirect addressing at expand time, as in PR57540,
> is not good.

Sure ... this is why we have IVOPTs and SLSR.  But both would work OK
with lowered input, I think, and would simply build up a "leveled" IL
suitable for the target.  IVOPTs also does some simple CSE
considerations, AFAIK.  In the end, what matters for speed is loops,
after all ;)

Richard.

> Thanks,
> bin
>
> --
> Best Regards.