Re: [PATCH]AArch64: Take into account when VF is higher than known scalar iters

Richard Sandiford Fri, 20 Sep 2024 07:48:16 -0700

Tamar Christina <tamar.christ...@arm.com> writes:
>> 
>> So my gut instinct is that we should instead tweak the condition for
>> using latency costs, but I'll need to think about it more when I get
>> back from holiday.
>> 
>
> I think that's a separate problem.  From first principles it should already
> be very wrong to compare the scalar loop to an iteration count it will
> *NEVER* reach.  So I don't understand why that would ever be valid.

But I don't think we're doing that, or at least, not as the final result.
Instead, we first calculate the minimum number of vector iterations for
which the vector loop is sometimes profitable.  If this is N, then we're
saying that the vector code is better than the scalar code for N*VF
iterations.  Like you say, this part ignores whether N*VF is actually
achievable.  But then:

          /* Now that we know the minimum number of vector iterations,
             find the minimum niters for which the scalar cost is larger:

             SIC * niters > VIC * vniters + VOC - SOC

             We know that the minimum niters is no more than
             vniters * VF + NPEEL, but it might be (and often is) less
             than that if a partial vector iteration is cheaper than the
             equivalent scalar code.  */
          int threshold = (vec_inside_cost * min_vec_niters
                           + vec_outside_cost
                           - scalar_outside_cost);
          if (threshold <= 0)
            min_profitable_iters = 1;
          else
            min_profitable_iters = threshold / scalar_single_iter_cost + 1;

calculates which number of iterations in the range [(N-1)*VF + 1, N*VF]
is the first to be profitable.  This specifically takes partial
iterations into account and includes the N==1 case.  The lower niters is,
the easier it is for the scalar code to win.
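To make the two steps concrete, here is a rough C sketch of the arithmetic.  It rests on simplifying assumptions and is not GCC's actual code.  It ignores peeling, and it derives N from the comment's inequality with niters = N*VF.  The function names (min_vec_niters_sketch, min_profitable_iters_sketch, in_last_vector_iter_range_p) are hypothetical.  The second function follows the quoted threshold computation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model (not GCC's code; peeling ignored).
   Returns the smallest N >= 1 with SIC * N * VF > VIC * N + VOC - SOC,
   i.e. the fewest vector iterations that beat N*VF scalar iterations.  */
static int
min_vec_niters_sketch (int sic, int vf, int vic, int voc, int soc)
{
  assert (sic > 0 && vf > 0);
  int saving_per_viter = sic * vf - vic;  /* scalar cost saved per vector iter */
  int outside_overhead = voc - soc;       /* one-off cost the savings must recover */
  assert (saving_per_viter > 0);          /* otherwise the vector loop never wins */
  if (outside_overhead <= 0)
    return 1;
  /* Smallest integer N with N * saving_per_viter > outside_overhead.  */
  return outside_overhead / saving_per_viter + 1;
}

/* Same arithmetic as the quoted snippet: the smallest niters with
   SIC * niters > VIC * vniters + VOC - SOC.  */
static int
min_profitable_iters_sketch (int sic, int vic, int voc, int soc,
                             int min_vec_niters)
{
  assert (sic > 0);
  int threshold = vic * min_vec_niters + voc - soc;
  if (threshold <= 0)
    return 1;
  return threshold / sic + 1;
}

/* True if NITERS lies in [(N-1)*VF + 1, N*VF].  */
static bool
in_last_vector_iter_range_p (int niters, int n, int vf)
{
  return niters >= (n - 1) * vf + 1 && niters <= n * vf;
}
```

Take made-up costs SIC = 4, VF = 4, VIC = 6, VOC = 30, SOC = 0.  The sketch gives N = 4 and a minimum of 14, which lies in [13, 16].  So three full vector iterations plus a partial one already win, even though N*VF is 16.  Now take VF = 16, VIC = 20, VOC = 24 (again hypothetical).  That gives N = 1 and a minimum of 12, which is below VF.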

This is what is printed as:

  Calculated minimum iters for profitability: 7

So we think that vectorisation should be rejected if the loop count
is <= 6, but accepted if it's >= 7.
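The decision rule above could look like the following hypothetical C helper, using the printed minimum of 7.  accept_vectorisation_p is not a GCC function.  The real check may involve more than this single comparison.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: for a loop with a known iteration count, accept
   vectorisation only once niters reaches the calculated minimum.  */
static bool
accept_vectorisation_p (int known_niters, int min_profitable_iters)
{
  /* The quoted code never produces a minimum below 1.  */
  assert (min_profitable_iters >= 1);
  return known_niters >= min_profitable_iters;
}
```

With a minimum of 7, accept_vectorisation_p (6, 7) is false and accept_vectorisation_p (7, 7) is true.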

So I think the costing framework is set up to handle niters<VF correctly
on first principles.  It's "just" that the numbers being fed in give the
wrong answer in this case.

Thanks,
Richard
