https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94828
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|Loop fusion is not          |Loop fusion is not
                   |implemented                 |implemented outside of ISL

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
Both loops are vectorized:

> ./cc1 -quiet y.c -O3 -fopt-info-vec
y.c:6:11: optimized: loop vectorized using 16 byte vectors
y.c:3:7: optimized: loop vectorized using 16 byte vectors

GCC fuses the loops with -floop-nest-optimize:

[scheduler] original ast:
{
  for (int c0 = 0; c0 < P_20; c0 += 1)
    S_3(c0);
  for (int c0 = 0; c0 < P_20; c0 += 1)
    S_4(c0);
}

[scheduler] AST generated by isl:
for (int c0 = 0; c0 < P_20; c0 += 1) {
  S_3(c0);
  S_4(c0);
}

producing

.L4:
        movdqu  (%rdi,%rax), %xmm0
        movdqu  (%rsi,%rax), %xmm2
        paddd   %xmm2, %xmm0
        paddd   %xmm2, %xmm0
        movups  %xmm0, (%rdi,%rax)
        addq    $16, %rax
        cmpq    %rdx, %rax
        jne     .L4

but it's true that GCC does not implement classical loop fusion.
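
For readers without the test case at hand, here is a minimal sketch of the transformation the two ASTs above describe. The comment does not quote y.c, so the loop bodies are an assumption inferred from the two paddd instructions in the generated code (a 32-bit integer add of the same operand, done twice). The function names, the restrict qualifiers, and the self_check helper are hypothetical and exist only for illustration. The restrict qualifiers matter: without a no-alias guarantee, running the two statements back to back in one iteration could give a different result from running them in two separate passes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for y.c; the bug comment does not show the real source. */

/* Unfused form: two passes over the arrays, like S_3 and S_4 in the original AST. */
void add_twice_unfused(int *restrict a, const int *restrict b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] += b[i];               /* plays the role of S_3 */
    for (size_t i = 0; i < n; i++)
        a[i] += b[i];               /* plays the role of S_4 */
}

/* Fused form: one pass, both statements per iteration, like the isl-generated AST. */
void add_twice_fused(int *restrict a, const int *restrict b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        a[i] += b[i];
        a[i] += b[i];
    }
}

/* Runs both versions on identical inputs and checks that the results agree. */
void self_check(void)
{
    enum { N = 37 };                /* deliberately not a multiple of the vector width */
    int x[N], y[N], b[N];

    for (int i = 0; i < N; i++) {
        x[i] = y[i] = i;
        b[i] = 3 * i + 1;
    }

    add_twice_unfused(x, b, N);
    add_twice_fused(y, b, N);

    for (int i = 0; i < N; i++) {
        assert(x[i] == y[i]);
        assert(x[i] == i + 2 * (3 * i + 1));
    }
}
```

Calling self_check() from a small main is enough to confirm that the two forms agree on non-overlapping arrays. The fused form loads and stores each element of a once per iteration instead of once per pass, which matches the single load/store pair visible in the .L4 loop above.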