"Kewen.Lin" <li...@linux.ibm.com> writes: > on 2020/6/3 下午5:27, Richard Biener wrote: >> On Wed, 3 Jun 2020, Kewen.Lin wrote: >> >>> on 2020/6/3 下午3:07, Richard Biener wrote: >>>> On Wed, 3 Jun 2020, Kewen.Lin wrote: >>>> >>>>> Hi Richi, >>>>> > > snip ... > >>>>>> >>>>>> I'd just mention there are other targets that have the choice between >>>>>> the above forms. Since IVOPTs itself does not perform the unrolling >>>>>> the IL it produces is the same, correct? >>>>>> >>>>> Yes. Before this patch, IVOPTs doesn't consider the unrolling impacts, >>>>> it only models things based on what it sees. We can assume it thinks >>>>> later RTL unrolling won't perform. >>>>> >>>>> With this patch, since the IV choice probably changes, the IL can probably >>>>> change. The typical difference with this patch is: >>>>> >>>>> vect__1.7_15 = MEM[symbol: x, index: ivtmp.19_22, offset: 0B]; >>>>> vs. >>>>> vect__1.7_15 = MEM[base: _29, offset: 0B]; >>>> >>>> So we're asking IVOPTS "if we were unrolling this loop would you make >>>> a different IV choice?" thus I wonder why we need so much complexity >>>> here? >>> >>> I would describe it more like "we are going to unroll this loop with >>> unroll factor uf in RTL, would you consider this variable when modeling?" >>> >>> In most cases, one single iteration is representative for the unrolled >>> body, so it doesn't matter considering unrolling or not. But for the >>> case here, it's not true, expected reg_offset iv cand can make iv cand >>> step cost reduced, it leads the difference. >>> >>>> That is, if we can classify the loop as being possibly unrolled >>>> we could evaluate IVOPTs IV choice (and overall cost) on the original >>>> loop and in a second run on the original loop with fake IV uses >>>> added with extra offset. If the overall IV cost is similar we'll >>>> take the unroll friendly choice if the costs are way different >>>> (I wouldn't expect this to be the case ever?) 
I'd side with the >>>> IV choice when not unrolling (and mark the loop as to be not unrolled). >>>> >>> >>> Could you elaborate it a bit? I guess it won't estimate the unroll >>> factor here, just guess it's to be unrolled or not? The second run >>> with fake IV uses added with extra offset sounds like scaling up the >>> iv group cost by uf. >> >> From your example above the D-form (MEM[symbol: x, index: ivtmp.19_22, >> offset: 0B]) is preferable since in the unrolled variant we have >> the same addres but with a different constant offset for the unroll >> copies while the second form would have to update the 'base' IV. >> >> Thus I think the difference in IV cost and decision should already >> show up if we, for each USE add a USE with an added constant offset. >> This might be what your patch does with that extra flag on the USEs, >> I was suggesting to model the USEs more explicitely, simulating a >> 2-way unroll. I think in the end I'll defer to Bin here who knows >> the code best. >> > > Thanks for your further explanation! As your proposal we introduce more > iv use groups with step added. Take the example here > https://gcc.gnu.org/pipermail/gcc-patches/2020-June/547128.html > Imagining initially the cand iv 4 leading to x-form wins, it's the > original iv, has the iv-group cost 1 against the address group. > Although we introduce one more group (2-way unrolling), the iv still > wins since pulling the address iv in takes 5 (15 for three). Probably > we can introduce more groups according to uf here.

Yeah, to summarise that thread: the idea there was that we would
continue to cost each use once, but base the cost on the kind of
address seen in the unrolled iterations. I guess this tends to
over-estimate the cost of index IVs to some extent, but I too was
aiming for something simple that doesn't depend on a specific unroll
factor.

Kewen's point there was that that approach works for high unroll
factors, but not for small unroll factors like 2. For:

    LD A = baseA, X
    LD B = baseB, X
    ST C = baseC, X
    X = X + stride

    LD A = baseA, X
    LD B = baseB, X
    ST C = baseC, X
    X = X + stride

using X as an IV is still preferred. It's only once the unroll factor
exceeds the number of pointer IVs that using pointer IVs becomes
better.

So like Kewen says, using 2 USEs (the original one and an unrolled
one) would have the opposite problem: it would still prefer index IVs
and not consider the benefit of pointer IVs at higher unroll factors.

But I agree that trying to guess what a much later pass will do
doesn't feel very clean either...

Thanks,
Richard
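
To make the unroll-factor argument above concrete, here is a rough
sketch of the counting behind it. This is only a toy model, not the
IVOPTs cost model: it counts IV increments per unrolled loop body and
ignores register pressure, setup cost and target addressing costs.
The function names and the tie-breaking rule are assumptions made for
illustration.

```python
def iv_updates_per_unrolled_body(num_addresses, unroll_factor):
    """Return (index_iv_updates, pointer_iv_updates) for one unrolled body.

    Toy model (assumption): only IV increments are counted.
    """
    # Index IV: every address is base + X (register + register), so no
    # constant offset can be folded in and X is bumped once per copy.
    index_iv_updates = unroll_factor
    # Pointer IVs: one pointer per address stream; the unrolled copies
    # use ptr + k * stride as a constant offset, so each pointer is
    # bumped only once per unrolled body.
    pointer_iv_updates = num_addresses
    return index_iv_updates, pointer_iv_updates


def preferred_iv(num_addresses, unroll_factor):
    """Pick the cheaper scheme; ties stay with the index IV (assumption)."""
    index_cost, pointer_cost = iv_updates_per_unrolled_body(
        num_addresses, unroll_factor)
    return "pointer" if pointer_cost < index_cost else "index"


if __name__ == "__main__":
    # Three address streams (A, B, C) as in the listing above.
    for uf in (1, 2, 3, 4, 8):
        print(uf, iv_updates_per_unrolled_body(3, uf), preferred_iv(3, uf))
```

With three address streams, unroll factor 2 gives 2 index updates
against 3 pointer updates, so the index IV still wins; the pointer IVs
only come out ahead once the unroll factor goes above 3.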