https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90796
--- Comment #7 from Michael Matz <matz at gcc dot gnu.org> ---
Created attachment 46675
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46675&action=edit
potential patch

Actually I was barking up the wrong tree. It's not as easy as the CFG
manipulation for loop fusion going wrong (like missing some last iterations or
so). It's really a problem in the dependence analysis. See the extensive
comment in the patch.

The fun thing is, there's a difference between these two loop nests:

  for (i) for (j) a[i][0] = f(a[i+1][0]);
  for (i) for (j) b[i][j] = f(a[i+1][j]);

Even though the distance vector for the read/write in the single statement is
(-1,0) for both loops, unroll-and-jam is valid for the second but not for the
first.
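
To make the difference between the two nests concrete, here is a minimal,
self-contained C sketch (not the code from the attached patch). It runs each
nest as written and in a hand-unrolled-and-jammed form with an outer unroll
factor of 2, then compares the resulting arrays. Everything beyond the two
loop shapes is an assumption for illustration: f is a hypothetical stand-in,
the sizes N and M are made up, all helper names are invented, and the second
nest is taken to read and write the same array so that the (-1,0) distance
applies to it.

```c
#include <assert.h>
#include <string.h>

/* Illustrative sizes only: N outer iterations, M inner iterations. */
enum { N = 4, M = 3 };

/* Hypothetical stand-in for the f() in the comment. */
static int f(int x) { return 2 * x + 1; }

static void init(int a[N + 1][M])
{
    for (int i = 0; i <= N; i++)
        for (int j = 0; j < M; j++)
            a[i][j] = 10 * i + j;
}

/* First nest: every inner iteration touches column 0 only. */
static void nest1_original(int a[N + 1][M])
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < M; j++)
            a[i][0] = f(a[i + 1][0]);
}

/* First nest, outer loop unrolled by 2 and the two inner loops jammed. */
static void nest1_jammed(int a[N + 1][M])
{
    for (int i = 0; i < N; i += 2)
        for (int j = 0; j < M; j++) {
            a[i][0] = f(a[i + 1][0]);     /* re-reads a[i+1][0] on every j */
            a[i + 1][0] = f(a[i + 2][0]); /* ...which this line overwrites */
        }
}

/* Second nest (assumed here to use one array): subscripts follow j. */
static void nest2_original(int a[N + 1][M])
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < M; j++)
            a[i][j] = f(a[i + 1][j]);
}

/* Second nest, outer loop unrolled by 2 and the two inner loops jammed. */
static void nest2_jammed(int a[N + 1][M])
{
    for (int i = 0; i < N; i += 2)
        for (int j = 0; j < M; j++) {
            a[i][j] = f(a[i + 1][j]);     /* reads a[i+1][j] exactly once */
            a[i + 1][j] = f(a[i + 2][j]); /* written only after that read */
        }
}

/* Returns 1 if the jammed first nest computes the same array, else 0. */
int nest1_jam_is_equivalent(void)
{
    int x[N + 1][M], y[N + 1][M];
    assert(N % 2 == 0 && M >= 2); /* sketch has no remainder loop */
    init(x); init(y);
    nest1_original(x);
    nest1_jammed(y);
    return memcmp(x, y, sizeof x) == 0;
}

/* Returns 1 if the jammed second nest computes the same array, else 0. */
int nest2_jam_is_equivalent(void)
{
    int x[N + 1][M], y[N + 1][M];
    assert(N % 2 == 0 && M >= 2); /* sketch has no remainder loop */
    init(x); init(y);
    nest2_original(x);
    nest2_jammed(y);
    return memcmp(x, y, sizeof x) == 0;
}
```

With these sizes nest1_jam_is_equivalent() returns 0 and
nest2_jam_is_equivalent() returns 1. In the first nest the jammed body
overwrites a[i+1][0] at j = 0, and the copy for row i reads it again at j = 1.
In the second nest each element a[i+1][j] is read and then written within the
same j iteration and is never read again.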