https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78116

--- Comment #16 from Andrew Senkevich <andrew.n.senkevich at gmail dot com> ---
(In reply to amker from comment #13)
> We should create another PR for additional copy instructions after my patch
> and close this one.  IMHO they are two different issues.

I agree. Currently there are no fills from the stack in either of the testcases
for which this PR was created.
But I don't have Bugzilla permissions to close it. Could somebody on the CC list
close it, please?

(In reply to Pat Haugen from comment #14)
. . . 
> Additional info, it's really just one copy introduced, but becomes 4 after
> unrolling. This is the loop from the first testcase without -funroll-loops.
> Looks like we could get rid of the vmovaps by making zmm2 the dest on the
> vpermps (assuming I'm understanding the asm correctly).
> 
> .L26:
>         vpermps (%rcx), %zmm10, %zmm1
>         leal    1(%rsi), %esi
>         vmovaps %zmm1, %zmm2
>         vmaxps  (%r15,%rdx), %zmm3, %zmm1
>         vfnmadd132ps    (%r12,%rdx), %zmm7, %zmm2
>         cmpl    %esi, %r8d
>         leaq    -64(%rcx), %rcx
>         vmaxps  %zmm1, %zmm2, %zmm1
>         vmovups %zmm1, (%rdi,%rdx)
>         leaq    64(%rdx), %rdx
>         ja      .L26

It looks that way. Against which optimization/analysis pass should we file a
ticket for this?
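
For reference, a hand-edited sketch of what the loop could look like without
the copy. This is not compiler output; it only assumes, as the listing above
suggests, that the zmm1 value produced by vpermps is dead once it has been
copied, since the following vmaxps overwrites zmm1:

.L26:
        vpermps (%rcx), %zmm10, %zmm2           # permute result goes directly to zmm2
        leal    1(%rsi), %esi
        vmaxps  (%r15,%rdx), %zmm3, %zmm1       # zmm1 is first written here
        vfnmadd132ps    (%r12,%rdx), %zmm7, %zmm2
        cmpl    %esi, %r8d
        leaq    -64(%rcx), %rcx
        vmaxps  %zmm1, %zmm2, %zmm1
        vmovups %zmm1, (%rdi,%rdx)
        leaq    64(%rdx), %rdx
        ja      .L26

That is one instruction fewer per iteration, or four fewer in the unrolled
loop.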
