https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95219
--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
Coalesce list: (4)ivtmp.15_4 & (22)ivtmp.15_22 [map: 2, 9] : Success -> 2
Coalesce list: (1)vect_vec_iv_.7_1 & (19)_19 [map: 0, 7] : Success -> 0
Coalesce list: (17)_17 & (18)vect_vec_iv_.8_18 [map: 5, 6] : Success -> 5
Coalesce list: (1)vect_vec_iv_.7_1 & (10)vect_vec_iv_.7_10 [map: 0, 4] : Fail due to conflict
Coalesce list: (2)vect_vec_iv_.8_2 & (18)vect_vec_iv_.8_18 [map: 1, 5] : Fail due to conflict
Coalesce list: (4)ivtmp.15_4 & (21)ivtmp.15_21 [map: 2, 8] : Success -> 2

;;   basic block 2, loop depth 0
;;    pred:       ENTRY
  ivtmp.15_21 = (unsigned long) pBuffer_5(D);
  _12 = ivtmp.15_21 + 8192;
;;    succ:       3

;;   basic block 3, loop depth 1
;;    pred:       2
;;                3
  # vect_vec_iv_.7_1 = PHI <{ 0, 0 }(2), _19(3)>
  # vect_vec_iv_.8_18 = PHI <{ 0, 0 }(2), _17(3)>
  # ivtmp.15_4 = PHI <ivtmp.15_21(2), ivtmp.15_22(3)>
  vect_vec_iv_.7_10 = vect_vec_iv_.7_1;
  _19 = vect_vec_iv_.7_1 + { 16843009, 16843009 };
  vect_vec_iv_.8_2 = vect_vec_iv_.8_18;
  _17 = vect_vec_iv_.8_18 + { 16843009, 16843009 };
  _20 = (void *) ivtmp.15_4;
  MEM[base: _20, offset: 0B] = vect_vec_iv_.7_10;
  MEM[base: _20, offset: 16B] = vect_vec_iv_.8_2;
  ivtmp.15_22 = ivtmp.15_4 + 32;
  if (_12 != ivtmp.15_22)
    goto <bb 3>; [99.00%]
  else
    goto <bb 4>; [1.00%]
;;    succ:       3
;;                4

Ah, so coalescing is hindered by "scheduling" here: the copies
vect_vec_iv_.7_10 and vect_vec_iv_.8_2 are only used by the stores, which
come after _19 and _17 have already been computed, so the copies conflict
with the updated IVs. -fschedule-insns gets rid of one of the copies.

Btw, the non-vectorized code looks like

.L2:
        movq    %rax, (%rdi)
        addq    $32, %rdi
        movq    %rax, -24(%rdi)
        movq    %rax, -16(%rdi)
        movq    %rax, -8(%rdi)
        addq    $16843009, %rax
        cmpq    %rdx, %rax
        jne     .L2

which is likely slower (so the testcase itself is easy to fix).
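To illustrate why the two copies fail to coalesce in the dump above, here is a minimal toy model of conflict-checked copy coalescing. It is a sketch under stated assumptions, not GCC's actual out-of-SSA code: the ids `IV_1`, `IV_19`, `IV_10` and the functions `try_coalesce` and `copy_coalesces` are hypothetical names, and the only conflict modeled is the assumed overlap between `_19` and `vect_vec_iv_.7_10` when the store is placed after the increment.

```c
#include <stdbool.h>

/* Hypothetical ids standing in for vect_vec_iv_.7_1, _19 and
   vect_vec_iv_.7_10 from the dump. */
enum { IV_1, IV_19, IV_10, NVARS };

static int part[NVARS];              /* parent links of a tiny union-find */
static bool conflict[NVARS][NVARS];  /* symmetric "live at the same time" */

static int find_root(int v)
{
    while (part[v] != v)
        v = part[v];
    return v;
}

/* Two partitions conflict if any member of one conflicts with any
   member of the other. */
static bool partitions_conflict(int a, int b)
{
    for (int i = 0; i < NVARS; i++)
        for (int j = 0; j < NVARS; j++)
            if (find_root(i) == a && find_root(j) == b && conflict[i][j])
                return true;
    return false;
}

/* Merge the partitions of x and y unless they conflict. */
static bool try_coalesce(int x, int y)
{
    int a = find_root(x), b = find_root(y);
    if (a == b)
        return true;
    if (partitions_conflict(a, b))
        return false;
    part[b] = a;
    return true;
}

/* Returns whether the copy IV_10 = IV_1 can be coalesced away.
   Assumption: if the store of IV_10 comes after IV_19 is computed,
   both values are live at once and therefore conflict. */
bool copy_coalesces(bool store_after_increment)
{
    for (int v = 0; v < NVARS; v++) {
        part[v] = v;
        for (int w = 0; w < NVARS; w++)
            conflict[v][w] = false;
    }
    if (store_after_increment)
        conflict[IV_19][IV_10] = conflict[IV_10][IV_19] = true;

    (void) try_coalesce(IV_1, IV_19);   /* PHI back edge, tried first */
    return try_coalesce(IV_1, IV_10);   /* the copy statement */
}
```

In this model `copy_coalesces(true)` fails, because `IV_1` already shares a partition with `IV_19`, which conflicts with `IV_10`; `copy_coalesces(false)` succeeds, which mirrors the point that statement order alone decides whether the copy survives.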
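For reference, the scalar loop in the assembly above can be read back as roughly the following C. This is a reconstruction from the listing and from the 8192-byte bound in the GIMPLE, not the testcase attached to the bug; the name `fill_scalar` and the element-count loop condition are assumptions (the assembly tests the value register instead).

```c
#include <stddef.h>
#include <stdint.h>

/* 8192 bytes, as in "_12 = ivtmp.15_21 + 8192", in 64-bit elements. */
enum { QWORDS = 8192 / 8 };

/* Hypothetical reconstruction: each iteration stores the same 64-bit
   value to four consecutive elements (32 bytes), then adds 16843009
   (0x01010101) to the value. */
void fill_scalar(uint64_t *buf)
{
    uint64_t v = 0;
    for (size_t i = 0; i < QWORDS; i += 4) {
        buf[i]     = v;
        buf[i + 1] = v;
        buf[i + 2] = v;
        buf[i + 3] = v;
        v += 16843009;
    }
}
```

The vectorized version in the dump does the same work with two 16-byte stores per iteration, which is why it keeps two identical vector induction variables.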