https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95219

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
Coalesce list: (4)ivtmp.15_4 & (22)ivtmp.15_22 [map: 2, 9] : Success -> 2
Coalesce list: (1)vect_vec_iv_.7_1 & (19)_19 [map: 0, 7] : Success -> 0
Coalesce list: (17)_17 & (18)vect_vec_iv_.8_18 [map: 5, 6] : Success -> 5
Coalesce list: (1)vect_vec_iv_.7_1 & (10)vect_vec_iv_.7_10 [map: 0, 4] : Fail due to conflict
Coalesce list: (2)vect_vec_iv_.8_2 & (18)_17 [map: 1, 5] : Fail due to conflict
Coalesce list: (4)ivtmp.15_4 & (21)ivtmp.15_21 [map: 2, 8] : Success -> 2

;;   basic block 2, loop depth 0
;;    pred:       ENTRY
  ivtmp.15_21 = (unsigned long) pBuffer_5(D);
  _12 = ivtmp.15_21 + 8192;
;;    succ:       3

;;   basic block 3, loop depth 1
;;    pred:       2
;;                3
  # vect_vec_iv_.7_1 = PHI <{ 0, 0 }(2), _19(3)>
  # vect_vec_iv_.8_18 = PHI <{ 0, 0 }(2), _17(3)>
  # ivtmp.15_4 = PHI <ivtmp.15_21(2), ivtmp.15_22(3)>
  vect_vec_iv_.7_10 = vect_vec_iv_.7_1;
  _19 = vect_vec_iv_.7_1 + { 16843009, 16843009 };
  vect_vec_iv_.8_2 = vect_vec_iv_.8_18;
  _17 = vect_vec_iv_.8_18 + { 16843009, 16843009 };
  _20 = (void *) ivtmp.15_4;
  MEM[base: _20, offset: 0B] = vect_vec_iv_.7_10;
  MEM[base: _20, offset: 16B] = vect_vec_iv_.8_2;
  ivtmp.15_22 = ivtmp.15_4 + 32;
  if (_12 != ivtmp.15_22)
    goto <bb 3>; [99.00%]
  else
    goto <bb 4>; [1.00%]
;;    succ:       3
;;                4

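Ah, so coalescing is hindered by "scheduling" (statement order) here:
vect_vec_iv_.7_10 is copied from vect_vec_iv_.7_1 before _19 is computed
but only stored afterwards, so the copy is still live when the partition
shared by vect_vec_iv_.7_1 and _19 gets its new value.  -fschedule-insns
gets rid of one of the copies.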
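
To make the ordering effect concrete, here is a minimal C sketch of the two
statement orders.  It is only an illustration, not the testcase from this bug:
the function names and the scalar uint64_t stand-in for the vector IV are
assumptions, and an optimizing compiler is free to reorder either version.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STEP 16843009ULL  /* 0x01010101, the increment seen in the dump */

/* Order as in the GIMPLE above: copy, increment, then store.
   `old` is still live when `iv` gets its new value, so the two
   values conflict and cannot share one register. */
void fill_copy_first(uint64_t *buf, size_t n)
{
    uint64_t iv = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t old = iv;  /* like vect_vec_iv_.7_10 = vect_vec_iv_.7_1 */
        iv = iv + STEP;     /* like _19 = vect_vec_iv_.7_1 + { ... }     */
        buf[i] = old;       /* like the MEM[...] store                   */
    }
}

/* Store moved ahead of the increment: the old value is dead before
   the new one is computed, so no copy is needed. */
void fill_store_first(uint64_t *buf, size_t n)
{
    uint64_t iv = 0;
    for (size_t i = 0; i < n; i++) {
        buf[i] = iv;
        iv = iv + STEP;
    }
}

/* Both orders must write the same data. */
void check_orders_agree(void)
{
    uint64_t a[8], b[8];
    fill_copy_first(a, 8);
    fill_store_first(b, 8);
    assert(memcmp(a, b, sizeof a) == 0);
}
```

Both functions produce identical buffers; only the second order lets the old
and new IV values share a single register without an extra copy.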

The non-vectorized code looks like

.L2:
        movq    %rax, (%rdi)
        addq    $32, %rdi
        movq    %rax, -24(%rdi)
        movq    %rax, -16(%rdi)
        movq    %rax, -8(%rdi)
        addq    $16843009, %rax
        cmpq    %rdx, %rax
        jne     .L2

which, by the way, is likely slower (so the testcase itself is easy to fix).
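
For readers less used to AT&T syntax, the scalar loop above can be read back
as roughly the following C.  This is a hand transcription of the assembly
only, not the original testcase; the function name, parameter names and the
uint64_t element type are assumptions.

```c
#include <stdint.h>

#define STEP 16843009ULL  /* addq $16843009, %rax */

/* p plays the role of %rdi, v of %rax and limit of %rdx.
   limit must be a nonzero multiple of STEP, otherwise the loop
   does not terminate at the intended point. */
void fill_scalar(uint64_t *p, uint64_t limit)
{
    uint64_t v = 0;
    do {
        p[0] = v;   /* movq %rax, (%rdi)    */
        p[1] = v;   /* movq %rax, -24(%rdi) after the pointer bump */
        p[2] = v;   /* movq %rax, -16(%rdi) */
        p[3] = v;   /* movq %rax, -8(%rdi)  */
        p += 4;     /* addq $32, %rdi       */
        v += STEP;
    } while (v != limit);  /* cmpq %rdx, %rax; jne .L2 */
}
```

With the 8192-byte buffer from the dump, the loop runs 256 times, so the
matching limit would be 256 * STEP.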
