Tamar Christina <tamar.christ...@arm.com> writes:
>> So my gut instinct is that we should instead tweak the condition for
>> using latency costs, but I'll need to think about it more when I get
>> back from holiday.
>
> I think that's a separate problem.  From first principles it should already
> be very wrong to compare the scalar loop to an iteration count it will
> *NEVER* reach.  So I don't understand why that would ever be valid.

But I don't think we're doing that, or at least, not as the final result.
Instead, we first calculate the minimum number of vector iterations for
which the vector loop is sometimes profitable.  If this is N, then we're
saying that the vector code is better than the scalar code for N*VF
iterations.  Like you say, this part ignores whether N*VF is actually
achievable.  But then:

      /* Now that we know the minimum number of vector iterations,
         find the minimum niters for which the scalar cost is larger:

         SIC * niters > VIC * vniters + VOC - SOC

         We know that the minimum niters is no more than
         vniters * VF + NPEEL, but it might be (and often is) less
         than that if a partial vector iteration is cheaper than the
         equivalent scalar code.  */
      int threshold = (vec_inside_cost * min_vec_niters
                       + vec_outside_cost
                       - scalar_outside_cost);
      if (threshold <= 0)
        min_profitable_iters = 1;
      else
        min_profitable_iters = threshold / scalar_single_iter_cost + 1;

calculates which number of iterations in the range [(N-1)*VF + 1, N*VF]
is the first to be profitable.  This is specifically taking partial
iterations into account and includes the N==1 case.  The lower niters is,
the easier it is for the scalar code to win.

This is what is printed as:

  Calculated minimum iters for profitability: 7

So we think that vectorisation should be rejected if the loop count
is <= 6, but accepted if it's >= 7.

So I think the costing framework is set up to handle niters<VF
correctly on first principles.  It's "just" that the numbers being
fed in give the wrong answer in this case.

Thanks,
Richard
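
P.S. To make the arithmetic above concrete, here is a minimal standalone
sketch of the threshold calculation.  The function name
min_profitable_iters_sketch and its parameter list are hypothetical
wrappers around the quoted snippet, not the vectoriser's actual interface,
and the cost values mentioned below are invented for illustration rather
than taken from the testcase under discussion.

```c
/* Hypothetical standalone version of the quoted calculation.  Returns the
   smallest niters for which

     SIC * niters > VIC * vniters + VOC - SOC

   holds, where SIC is scalar_single_iter_cost, VIC is vec_inside_cost,
   VOC is vec_outside_cost and SOC is scalar_outside_cost.
   Assumes scalar_single_iter_cost > 0.  */
static int
min_profitable_iters_sketch (int vec_inside_cost, int min_vec_niters,
                             int vec_outside_cost, int scalar_outside_cost,
                             int scalar_single_iter_cost)
{
  int threshold = (vec_inside_cost * min_vec_niters
                   + vec_outside_cost
                   - scalar_outside_cost);

  /* If the vector code is no more expensive than the scalar overhead
     alone, even a single iteration is profitable.  */
  if (threshold <= 0)
    return 1;

  /* Integer division rounds down, so adding 1 gives the first niters
     whose scalar cost is strictly greater than the threshold.  */
  return threshold / scalar_single_iter_cost + 1;
}
```

With made-up costs VIC = 20, N = 1, VOC = 6, SOC = 0 and SIC = 4, the
threshold is 26 and the function returns 26 / 4 + 1 = 7: six scalar
iterations cost 24, which does not exceed 26, while seven cost 28, which
does.  That matches the shape of the "reject at <= 6, accept at >= 7"
conclusion, although the real costs in this case may well be different.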