Richard Biener <rguent...@suse.de> writes:
> On Thu, 20 Jul 2023, Richard Sandiford wrote:
>
>> Tamar Christina <tamar.christ...@arm.com> writes:
>> > Hi All,
>> >
>> > The resulting predicate register of a whilelo is not
>> > restricted to the lower half of the predicate register file.
>> >
>> > As such these tests started failing after recent changes
>> > because the whilelo outside the loop is getting assigned p15.
>>
>> It's the whilelo in the loop for me.  We go from:
>>
>> .L3:
>>         ld1b    z31.b, p7/z, [x4, x3]
>>         movprfx z30, z31
>>         mul     z30.b, p5/m, z30.b, z29.b
>>         st1b    z30.b, p7, [x4, x3]
>>         mov     p6.b, p7.b
>>         add     x3, x3, x0
>>         whilelo p7.b, w3, w1
>>         b.any   .L3
>>
>> to:
>>
>> .L3:
>>         ld1b    z31.b, p7/z, [x3, x2]
>>         movprfx z29, z31
>>         mul     z29.b, p6/m, z29.b, z30.b
>>         st1b    z29.b, p7, [x3, x2]
>>         add     x2, x2, x0
>>         whilelo p15.b, w2, w1
>>         b.any   .L4
>>         [...]
>>         .p2align 2,,3
>> .L4:
>>         mov     p7.b, p15.b
>>         b       .L3
>>
>> This adds an extra (admittedly unconditional) branch to every non-final
>> vector iteration, which seems unfortunate.  I don't think we'd see
>> p8-p15 otherwise, since the result of the whilelo is used as a
>> governing predicate by the next iteration of the loop.
>>
>> This happens because the scalar loop is given an 89% chance of iterating.
>> Previously we gave the vector loop an 83.33% chance of iterating, whereas
>> after 061f74c06735e1fa35b910ae we give it a 12% chance.  0.89^16 == 15.50%,
>> so the new probabilities definitely preserve the original probabilities
>> more closely.  But for purely heuristic probabilities like these, I'm
>> not sure we should lean so heavily into the idea that the vector
>> latch is unlikely.
>>
>> Honza, Richi, any thoughts?  Just wanted to double-check that this
>> was operating as expected before making the tests accept the (arguably)
>> less efficient code.  It looks like the commit was more aimed at fixing
>> the profile counts for the epilogues, rather than the main loop.
>
> The above looks like a failed coalescing, can you track down where
> that happens and why?

Ah, sorry, I shouldn't have trimmed the context.  The previous predicate
(p6 in the original code) is live on exit from the loop, while the
whilelo result is live on the latch edge.  So I think a move is needed
somewhere.

Thanks,
Richard
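
For reference, the 0.89^16 figure quoted above can be checked with a short sketch. It assumes each scalar iteration continues independently with probability 0.89 and that one vector iteration covers 16 scalar iterations (the 16 is taken from the exponent in the quoted text). The function name is hypothetical and is not anything in GCC.

```python
def vector_latch_probability(scalar_prob: float, lanes: int) -> float:
    """Probability that `lanes` consecutive scalar iterations all continue.

    Assumption: each scalar iteration continues independently with
    probability `scalar_prob`, so the vector latch is taken only when
    all `lanes` scalar iterations would have continued.
    """
    return scalar_prob ** lanes


# 0.89 and 16 are the values from the quoted discussion.
p = vector_latch_probability(0.89, 16)
print(f"{p:.2%}")  # prints 15.50%
```

Under these assumptions the result is close to the 12% the vector loop is now given, and far from the previous 83.33%.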