https://gcc.gnu.org/bugzilla/show_bug.cgi?id=124097
--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---
But if the predicate is required anyway I would have expected the predicated add to be faster. That is, it's not the predicated add that is bad but the predicate generation. This would also depend on the target, so match.pd might not be the best place to perform this "optimization". On Zen with AVX512 the compare to a %k register also has comparatively high latency (it's slower than the AVX2 compare to %xmm).
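
For illustration only, here is a minimal sketch of the two source shapes this kind of discussion is about. It is not the testcase from this bug: the function names, the `c[i] > 0` condition and the element type are all assumptions made up for the example. Both forms need the compare; they differ in whether the add itself is conditional (which a vectorizer may turn into a masked add) or whether the compare feeds a select followed by a plain add. Which one is faster is target dependent, as noted above.

```c
#include <stddef.h>

/* Hypothetical form A: the add itself is conditional.  After
   if-conversion a vectorizer may emit a predicated (masked) add,
   with the mask produced by the compare.  */
void add_if_positive(int *a, const int *b, const int *c, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (c[i] > 0)
            a[i] += b[i];
}

/* Hypothetical form B: the compare feeds a select of b[i] or 0,
   followed by an unconditional add.  The result is the same as
   form A, but no masked arithmetic is needed.  */
void add_selected(int *a, const int *b, const int *c, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] += (c[i] > 0) ? b[i] : 0;
}
```

Comparing the assembly of both functions (for example with -O3 -march=znver4 versus an AVX2-only -march) is one way to see whether the compare targets a mask register or a vector register on a given target.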
