"Kewen.Lin" <li...@linux.ibm.com> writes:
> on 2020/6/3 5:27 PM, Richard Biener wrote:
>> On Wed, 3 Jun 2020, Kewen.Lin wrote:
>> 
>>> on 2020/6/3 3:07 PM, Richard Biener wrote:
>>>> On Wed, 3 Jun 2020, Kewen.Lin wrote:
>>>>
>>>>> Hi Richi,
>>>>>
>
> snip ...
>
>>>>>>
>>>>>> I'd just mention there are other targets that have the choice between
>>>>>> the above forms.  Since IVOPTs itself does not perform the unrolling
>>>>>> the IL it produces is the same, correct?
>>>>>>
>>>>> Yes.  Before this patch, IVOPTs doesn't consider the impact of unrolling;
>>>>> it only models things based on what it sees.  In effect, it assumes
>>>>> that the later RTL unrolling won't happen.
>>>>>
>>>>> With this patch, since the IV choice probably changes, the IL can probably
>>>>> change.  The typical difference with this patch is:
>>>>>
>>>>>   vect__1.7_15 = MEM[symbol: x, index: ivtmp.19_22, offset: 0B];
>>>>> vs.
>>>>>   vect__1.7_15 = MEM[base: _29, offset: 0B];
>>>>
>>>> So we're asking IVOPTS "if we were unrolling this loop would you make
>>>> a different IV choice?" thus I wonder why we need so much complexity
>>>> here?  
>>>
>>> I would describe it more like "we are going to unroll this loop with
>>> unroll factor uf in RTL, would you consider this variable when modeling?"
>>>
>>> In most cases, a single iteration is representative of the unrolled
>>> body, so it doesn't matter whether we consider unrolling or not.  But
>>> that's not true for the case here: the expected reg_offset iv cand can
>>> reduce the iv cand step cost, which leads to the difference.
>>>
>>>> That is, if we can classify the loop as being possibly unrolled
>>>> we could evaluate IVOPTs IV choice (and overall cost) on the original
>>>> loop and in a second run on the original loop with fake IV uses
>>>> added with extra offset.  If the overall IV cost is similar we'll
>>>> take the unroll friendly choice if the costs are way different
>>>> (I wouldn't expect this to be the case ever?) I'd side with the
>>>> IV choice when not unrolling (and mark the loop as to be not unrolled).
>>>>
>>>
>>> Could you elaborate on it a bit?  I guess it won't estimate the unroll
>>> factor here, just guess whether it's to be unrolled or not?  The second
>>> run with fake IV uses added with extra offset sounds like scaling up the
>>> iv group cost by uf.
>> 
>> From your example above the D-form (MEM[symbol: x, index: ivtmp.19_22, 
>> offset: 0B]) is preferable since in the unrolled variant we have
>> the same address but with a different constant offset for the unroll
>> copies while the second form would have to update the 'base' IV.
>> 
>> Thus I think the difference in IV cost and decision should already
>> show up if we, for each USE add a USE with an added constant offset.
>> This might be what your patch does with that extra flag on the USEs,
>> I was suggesting to model the USEs more explicitly, simulating a
>> 2-way unroll.  I think in the end I'll defer to Bin here who knows
>> the code best.
>> 
>
> Thanks for your further explanation!  With your proposal, we introduce
> more iv use groups with the step added.  Take the example here:
> https://gcc.gnu.org/pipermail/gcc-patches/2020-June/547128.html
> Imagine that initially cand iv 4, which leads to x-form, wins: it's the
> original iv and has iv-group cost 1 against the address group.
> Although we introduce one more group (2-way unrolling), the iv still
> wins since pulling the address iv in takes 5 (15 for three).  Probably
> we can introduce more groups according to uf here.

Yeah, to summarise that thread: the idea there was that we would
continue to cost each use once, but base the cost on the kind of address
seen in the unrolled iterations.  I guess this tends to over-estimate the
cost of index IVs to some extent, but I too was aiming for something
simple that doesn't depend on a specific unroll factor.

Kewen's point there was that that approach works for high unroll factors,
but not for small unroll factors like 2.  For:

  LD A = baseA, X
  LD B = baseB, X
  ST C = baseC, X
  X = X + stride
  LD A = baseA, X
  LD B = baseB, X
  ST C = baseC, X
  X = X + stride

using X as an IV is still preferred.  It's only once the unroll
factor exceeds the number of pointer IVs that using pointer IVs
becomes better.

So like Kewen says, using 2 USEs (the original one and an unrolled one)
would have the opposite problem: it would still prefer index IVs and not
consider the benefit of pointer IVs at higher unroll factors.
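
To make the crossover concrete, here is a rough sketch that only counts
IV updates per unrolled loop body.  It is a toy model, not what IVOPTs
computes: the function names are made up for illustration, and it
assumes that an index IV (reg+reg address) needs one update per unrolled
copy, while each pointer IV (reg+offset address) needs one update per
unrolled body because the copies can use different constant offsets.

```python
def iv_updates_per_unrolled_body(num_pointers, unroll_factor):
    """Return (index_iv_updates, pointer_iv_updates) for one unrolled body.

    Toy model (assumption): only IV increments are counted; address and
    register-pressure costs are ignored.
    """
    # Index IV: a reg+reg address has no room for a constant offset,
    # so X is bumped once in every unrolled copy.
    index_updates = unroll_factor
    # Pointer IVs: the copies differ only in their constant offsets,
    # so each pointer is bumped once per unrolled body.
    pointer_updates = num_pointers
    return index_updates, pointer_updates


def preferred_iv(num_pointers, unroll_factor):
    """Pick the IV kind with fewer updates; ties keep the index IV."""
    index_updates, pointer_updates = iv_updates_per_unrolled_body(
        num_pointers, unroll_factor)
    return "pointer" if pointer_updates < index_updates else "index"


# Three address bases (baseA, baseB, baseC), as in the example above.
for uf in (1, 2, 3, 4, 8):
    print(uf, preferred_iv(3, uf))
```

In this model, simulating a 2-way unroll corresponds to uf = 2, which
still picks the index IV for three bases; the pointer IVs only win once
uf is 4 or more.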

But I agree that trying to guess what a much later pass will do doesn't
feel very clean either...

Thanks,
Richard
