https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114200
--- Comment #1 from Robin Dapp <rdapp at gcc dot gnu.org> ---
Took me a while to analyze this... I needed more time than I'd like to admit to make sense of the somewhat weird code created by fully unrolling and peeling.

I believe the problem is that we reload the output register of a vfmacc/fma via vmv.v.v, which is subject to length masking (it only copies the first vl elements), when we should be using vmv1r.v, which copies the whole register. The result is used by a reduction which always operates on the full length.

As annoying as it was to find, it's definitely a good catch. I'm testing a patch.

PR114202 is indeed a duplicate. I'm going to add its test case to the patch.
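
To illustrate why the choice of move matters, here is a toy Python model of the two instructions. It is only a sketch, not GCC code and not the test case from this PR: the register width (VLMAX), the register contents, the vl value, and the helper names are all made up for illustration, and the tail is modeled as undisturbed (with a tail-agnostic policy the tail could also be overwritten with all ones).

```python
# Toy model: one vector register is a Python list of VLMAX elements.
VLMAX = 8  # hypothetical number of elements per vector register


def vmv_v_v(dest, src, vl):
    """Model of vmv.v.v: copy only the first vl elements.

    Elements at index >= vl (the tail) keep whatever dest held before,
    i.e. this models the tail-undisturbed case.
    """
    return src[:vl] + dest[vl:]


def vmv1r_v(dest, src):
    """Model of vmv1r.v: copy the whole register, regardless of vl."""
    return list(src)


def full_length_sum(reg):
    """Model of a reduction that always operates on all VLMAX elements."""
    return sum(reg)


fma_result = [1, 2, 3, 4, 5, 6, 7, 8]  # hypothetical vfmacc output
stale = [0] * VLMAX                    # hypothetical old contents of the target
vl = 4                                 # hypothetical current vector length

partial = vmv_v_v(stale, fma_result, vl)  # tail elements are lost
whole = vmv1r_v(stale, fma_result)        # all elements survive

print(full_length_sum(partial))  # 10: only the first vl elements were copied
print(full_length_sum(whole))    # 36: the full register was copied
```

In this model the length-masked copy silently drops the elements beyond vl, so a reduction over the full register produces a different result than it would after a whole-register move.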