Richard Biener <rguent...@suse.de> writes:
> On Thu, 20 Jul 2023, Richard Sandiford wrote:
>
>> Tamar Christina <tamar.christ...@arm.com> writes:
>> > Hi All,
>> >
>> > The resulting predicate register of a whilelo is not
>> > restricted to the lower half of the predicate register file.
>> >
>> > As such, these tests started failing after recent changes
>> > because the whilelo outside the loop is getting assigned p15.
>> 
>> It's the whilelo in the loop for me.  We go from:
>> 
>> .L3:
>>         ld1b    z31.b, p7/z, [x4, x3]
>>         movprfx z30, z31
>>         mul     z30.b, p5/m, z30.b, z29.b
>>         st1b    z30.b, p7, [x4, x3]
>>         mov     p6.b, p7.b
>>         add     x3, x3, x0
>>         whilelo p7.b, w3, w1
>>         b.any   .L3
>> 
>> to:
>> 
>> .L3:
>>         ld1b    z31.b, p7/z, [x3, x2]
>>         movprfx z29, z31
>>         mul     z29.b, p6/m, z29.b, z30.b
>>         st1b    z29.b, p7, [x3, x2]
>>         add     x2, x2, x0
>>         whilelo p15.b, w2, w1
>>         b.any   .L4
>>         [...]
>>         .p2align 2,,3
>> .L4:
>>         mov     p7.b, p15.b
>>         b       .L3
>> 
>> This adds an extra (admittedly unconditional) branch to every non-final
>> vector iteration, which seems unfortunate.  I don't think we'd see
>> p8-p15 otherwise, since the result of the whilelo is used as a
>> governing predicate by the next iteration of the loop.
>> 
>> This happens because the scalar loop is given an 89% chance of iterating.
>> Previously we gave the vector loop an 83.33% chance of iterating, whereas
>> after 061f74c06735e1fa35b910ae we give it a 12% chance.  Since 0.89^16 ≈ 15.50%,
>> the new probabilities definitely preserve the original probabilities
>> more closely.  But for purely heuristic probabilities like these, I'm
>> not sure we should lean so heavily into the idea that the vector
>> latch is unlikely.
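
A minimal Python sketch to double-check the arithmetic in the paragraph above. It assumes, purely as a simplification, that each scalar iteration continues independently with probability 0.89. It also uses the exponent 16 from the text as the number of scalar iterations covered by one vector iteration. The helper name chance_of_continuing is hypothetical.

```python
# Probability that a loop whose per-iteration "keep going" probability
# is p continues for k consecutive iterations.  Treating iterations as
# independent is a simplifying assumption, not how the profile code works.
def chance_of_continuing(p: float, k: int) -> float:
    return p ** k

scalar_p = 0.89   # scalar loop's chance of iterating (from the text)
vf = 16           # exponent used in the text (scalar iterations per vector iteration)

prob = chance_of_continuing(scalar_p, vf)
print(f"{prob:.2%}")  # prints 15.50%
```

Under that independence assumption, about 15.5% is what a profile-preserving update would aim for. The new 12% is closer to this than the old 83.33%, which matches the point above.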
>> 
>> Honza, Richi, any thoughts?  Just wanted to double-check that this
>> was operating as expected before making the tests accept the (arguably)
>> less efficient code.  It looks like the commit was more aimed at fixing
>> the profile counts for the epilogues, rather than the main loop.
>
> The above looks like a failed coalescing, can you track down where
> that happens and why?

Ah, sorry, I shouldn't have trimmed the context.  The previous predicate
(p6 in the original code) is live on exit from the loop, while the
whilelo result is live on the latch edge.  So I think a move is needed
somewhere.

Thanks,
Richard
