https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104912

--- Comment #7 from Richard Biener <rguenth at gcc dot gnu.org> ---
I'm noting that for skylake cost we have

  _28 * _33 1 times scalar_stmt costs 16 in prologue

and

  _28 * _33 1 times vector_stmt costs 16 in body

but the load/store costs are just 12.  Compared to znver2 this tips the
balance towards vectorization, while for znver2 I currently see no
vectorization.  For generic I also see vectorization.

Note that costing currently assumes that the cost model niter check is
performed first and short-circuits all the versioning conditions.  But since
we emit

  _248 = (unsigned int) mk_113;
  _247 = _248 + 4294967295;
  _246 = _247 > 2;
  _245 = stride.4_74 != 0;
  _244 = _245 & _246;
...
  _183 = _184 | _211;
  _182 = _183 & _244;
  if (_182 != 0)
    goto <bb 27>; [80.00%]
  else
    goto <bb 28>; [20.00%]

on GIMPLE, how things are expanded depends on some luck, and with the
standalone testcase and -Ofast with generic tuning we emit the > 2 cost model
check quite late:

        addq    $1, %rdi
        imulq   %r13, %rdi
        leaq    (%rax,%rdi), %rcx
        movq    32(%rsp), %rax
        leaq    (%rax,%rcx), %rsi
        movq    (%rsp), %rax
        leaq    0(,%rsi,8), %rdx
        addq    %rax, %rcx
        leaq    0(,%rcx,8), %rax
        addq    %r13, %rcx
        salq    $3, %rcx
        cmpq    %rcx, %rdx
        setg    %cl
        addq    %r13, %rsi
        salq    $3, %rsi
        cmpq    %rsi, %rax
        setg    %sil
        orb     %cl, %sil
        je      .L8
        movl    -100(%rsp), %esi
        leal    -1(%rsi), %ecx
        cmpl    $2, %ecx             <-----
        movl    112(%rsp), %ecx
        seta    %sil
        testl   %ecx, %ecx
        setg    %cl
        testb   %cl, %sil
        je      .L8

Let me try to hack^Wfix this.
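
For illustration only, here is a minimal C sketch of the ordering the costing
assumes, i.e. the niter check evaluated first so that a failure skips the
remaining versioning conditions.  The function names and the other_checks
parameter are hypothetical; other_checks merely stands in for the conditions
elided by "..." in the GIMPLE dump above and is not taken from the vectorizer
sources.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mirror of the GIMPLE niter check:
     _248 = (unsigned int) mk;  _247 = _248 + 4294967295;  _246 = _247 > 2;
   Adding 4294967295 is an unsigned subtraction of 1.  */
static bool niter_check(int mk)
{
    return (uint32_t)mk + UINT32_C(4294967295) > 2u;
}

/* Assumed shape of the combined versioning condition.  Unlike the
   non-short-circuiting '&' in the GIMPLE dump, '&&' forces the cheap
   niter check to be tested first and skips the rest when it fails.
   other_checks is a placeholder for the conditions elided in the dump.  */
static bool take_vector_path(int mk, int stride, bool other_checks)
{
    return niter_check(mk) && stride != 0 && other_checks;
}
```

With this shape a call such as take_vector_path(3, 1, true) returns false
right after the first comparison, which is the behaviour the cost model
currently takes for granted.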
