On Wed, 3 Jun 2020, Kewen.Lin wrote:

> on 2020/6/3 3:07 PM, Richard Biener wrote:
> > On Wed, 3 Jun 2020, Kewen.Lin wrote:
> >
> >> Hi Richi,
> >>
> >> on 2020/6/2 7:38 PM, Richard Biener wrote:
> >>> On Thu, 28 May 2020, Kewen.Lin wrote:
> >>>
> >>>> Hi,
> >>>>
> >>>> This is a repost; you can refer to the original series
> >>>> via https://gcc.gnu.org/pipermail/gcc-patches/2020-January/538360.html.
> >>>>
> >>>> As we discussed in the thread
> >>>> https://gcc.gnu.org/ml/gcc-patches/2020-01/msg00196.html
> >>>> Original: https://gcc.gnu.org/ml/gcc-patches/2020-01/msg00104.html,
> >>>> I'm working to teach IVOPTs to consider D-form group access during
> >>>> unrolling.
> >>>> The difference between D-form and other forms during unrolling is that
> >>>> we can put the stride into the displacement field to avoid an
> >>>> additional step increment, e.g.:
> >>>>
> >>>> With X-form (uf step increments):
> >>>>   ...
> >>>>   LD A = baseA, X
> >>>>   LD B = baseB, X
> >>>>   ST C = baseC, X
> >>>>   X = X + stride
> >>>>   LD A = baseA, X
> >>>>   LD B = baseB, X
> >>>>   ST C = baseC, X
> >>>>   X = X + stride
> >>>>   LD A = baseA, X
> >>>>   LD B = baseB, X
> >>>>   ST C = baseC, X
> >>>>   X = X + stride
> >>>>   ...
> >>>>
> >>>> With D-form (one step increment for each base):
> >>>>   ...
> >>>>   LD A = baseA, OFF
> >>>>   LD B = baseB, OFF
> >>>>   ST C = baseC, OFF
> >>>>   LD A = baseA, OFF+stride
> >>>>   LD B = baseB, OFF+stride
> >>>>   ST C = baseC, OFF+stride
> >>>>   LD A = baseA, OFF+2*stride
> >>>>   LD B = baseB, OFF+2*stride
> >>>>   ST C = baseC, OFF+2*stride
> >>>>   ...
> >>>>   baseA += stride * uf
> >>>>   baseB += stride * uf
> >>>>   baseC += stride * uf
> >>>>
> >>>> Imagine the loop gets unrolled 8 times: that is 3 step updates with
> >>>> D-form vs. 8 step updates with X-form. Here we only need to check that
> >>>> the stride meets the D-form field requirement, since if OFF doesn't
> >>>> meet it, we can construct baseA' as baseA + OFF.
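The step-update counts in the quoted description can be sketched in a few lines of Python. This is only an illustrative model of the argument, not anything taken from the patch or from GCC: the function names, and the stride value of 16 in the usage lines, are hypothetical.

```python
from typing import List


def x_form_step_updates(uf: int) -> int:
    # X-form: the shared index X is incremented after every unrolled copy,
    # so an unroll factor of uf costs uf step updates.
    return uf


def d_form_step_updates(num_bases: int) -> int:
    # D-form: each base register is incremented once per unrolled body
    # (baseA/baseB/baseC += stride * uf), independent of uf.
    return num_bases


def d_form_displacements(off: int, stride: int, uf: int) -> List[int]:
    # Displacement used by the k-th unrolled copy: OFF + k * stride.
    return [off + k * stride for k in range(uf)]


if __name__ == "__main__":
    uf = 8                                # unroll factor from the example
    bases = ["baseA", "baseB", "baseC"]   # the three bases in the example
    print("X-form step updates:", x_form_step_updates(uf))
    print("D-form step updates:", d_form_step_updates(len(bases)))
    # Hypothetical stride of 16 bytes, OFF of 0, first three copies.
    print("D-form displacements:", d_form_displacements(0, 16, 3))
```

With uf = 8 and three bases, the model reproduces the "3 step updates with D-form vs. 8 step updates with X-form" comparison. Whether each displacement actually fits the target's D-form field is a separate, target-specific check that this sketch does not model.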
> >>>
> >>> I'd just mention there are other targets that have the choice between
> >>> the above forms. Since IVOPTs itself does not perform the unrolling
> >>> the IL it produces is the same, correct?
> >>>
> >> Yes. Before this patch, IVOPTs doesn't consider the unrolling impacts,
> >> it only models things based on what it sees. We can assume it thinks
> >> later RTL unrolling won't be performed.
> >>
> >> With this patch, since the IV choice probably changes, the IL can probably
> >> change. The typical difference with this patch is:
> >>
> >>   vect__1.7_15 = MEM[symbol: x, index: ivtmp.19_22, offset: 0B];
> >> vs.
> >>   vect__1.7_15 = MEM[base: _29, offset: 0B];
> >
> > So we're asking IVOPTS "if we were unrolling this loop would you make
> > a different IV choice?" thus I wonder why we need so much complexity
> > here?
>
> I would describe it more like "we are going to unroll this loop with
> unroll factor uf in RTL, would you consider this variable when modeling?"
>
> In most cases, one single iteration is representative of the unrolled
> body, so it doesn't matter whether unrolling is considered or not. But
> that is not true for the case here: the expected reg_offset iv cand can
> reduce the iv cand step cost, which leads to the difference.
>
> > That is, if we can classify the loop as being possibly unrolled
> > we could evaluate IVOPTs IV choice (and overall cost) on the original
> > loop and in a second run on the original loop with fake IV uses
> > added with extra offset. If the overall IV cost is similar we'll
> > take the unroll friendly choice if the costs are way different
> > (I wouldn't expect this to be the case ever?) I'd side with the
> > IV choice when not unrolling (and mark the loop as to be not unrolled).
> >
>
> Could you elaborate on it a bit? I guess it won't estimate the unroll
> factor here, just guess whether it's to be unrolled or not? The second run
> with fake IV uses added with extra offset sounds like scaling up the
> iv group cost by uf.

From your example above the D-form
(MEM[symbol: x, index: ivtmp.19_22, offset: 0B]) is preferable since in
the unrolled variant we have the same address but with a different
constant offset for the unroll copies, while the second form would have
to update the 'base' IV.

Thus I think the difference in IV cost and decision should already show up
if we, for each USE, add a USE with an added constant offset. This might be
what your patch does with that extra flag on the USEs; I was suggesting to
model the USEs more explicitly, simulating a 2-way unroll.

I think in the end I'll defer to Bin here who knows the code best.

> > Thus I'd err on the side of not unrolling but leave the ultimate choice
> > of whether to unroll to RTL unless IV cost makes that prohibitive.
> >
> > Even without X- or D- form addressing modes the IV choice may differ
> > and I think we don't need extra knobs for the unroller but instead
> > can decide to set the existing n_unroll to zero (force not unroll)
> > when costs say it would be bad?
>
> Yes, even without x- or d- form addressing, the difference probably comes
> from the compare type IV use for loop ending, and maybe more cases which I
> am not aware of. But I don't see people caring about it; probably the
> impact is small.
>
> IIUC what you stated here looks like using ivopts information for the
> unrolling factor decision. I think this is a separate direction; do we
> have this kind of case where ivopts costs can foresee the unrolling?
>
> Now the unroll factor estimation can be used by other optimization passes
> if they are wondering about the future unrolling factor decision; as
> discussed, it sounds like a good idea to override the n_unroll with some
> benchmarking.

I didn't suggest using IVOPTs to determine the unroll factor. In fact
your patch looks like it does this?
Instead I wanted to make IVOPTs choose a set of IVs that is best for a
blend of both worlds - use D-form when it doesn't hurt the not unrolled
code [much], and X-form when the D-form is way worse (for whatever reason)
and signal that to the unroller (but we could choose to not do that).

The real issue is of course that we're applying the IV decision to a
not final loop.

> BR,
> Kewen
>
> >
> > Richard.
> >
> >> BR,
> >> Kewen
> >>
> >>> Richard.
> >>>
> >>
> >
>

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)