https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78116
--- Comment #16 from Andrew Senkevich <andrew.n.senkevich at gmail dot com> ---
(In reply to amker from comment #13)
> We should create another PR for additional copy instructions after my patch
> and close this one. IMHO they are two different issues.

I agree; currently there are no fills from the stack in either of the testcases
for which this PR was created. I don't have Bugzilla permissions to close it,
though. Could somebody on the CC list close it, please?

(In reply to Pat Haugen from comment #14)
. . .
> Additional info, it's really just one copy introduced, but becomes 4 after
> unrolling. This is the loop from the first testcase without -funroll-loops.
> Looks like we could get rid of the vmovaps by making zmm2 the dest on the
> vpermps (assuming I'm understanding the asm correctly).
>
> .L26:
>         vpermps (%rcx), %zmm10, %zmm1
>         leal    1(%rsi), %esi
>         vmovaps %zmm1, %zmm2
>         vmaxps  (%r15,%rdx), %zmm3, %zmm1
>         vfnmadd132ps    (%r12,%rdx), %zmm7, %zmm2
>         cmpl    %esi, %r8d
>         leaq    -64(%rcx), %rcx
>         vmaxps  %zmm1, %zmm2, %zmm1
>         vmovups %zmm1, (%rdi,%rdx)
>         leaq    64(%rdx), %rdx
>         ja      .L26

It looks that way. Against which optimization or analysis pass should we file a
ticket for it?
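
For reference, a hand-edited sketch of the loop with the copy removed, as
suggested in comment #14. This is not compiler output. It assumes the value
vpermps leaves in zmm1 has no other use, which holds within this loop because
the following vmaxps overwrites zmm1 before anything else reads it.

.L26:
        # vpermps writes zmm2 directly, so the vmovaps copy is not needed
        vpermps (%rcx), %zmm10, %zmm2
        leal    1(%rsi), %esi
        vmaxps  (%r15,%rdx), %zmm3, %zmm1
        # zmm2 is both a source and the destination of the FMA
        vfnmadd132ps    (%r12,%rdx), %zmm7, %zmm2
        cmpl    %esi, %r8d
        leaq    -64(%rcx), %rcx
        vmaxps  %zmm1, %zmm2, %zmm1
        vmovups %zmm1, (%rdi,%rdx)
        leaq    64(%rdx), %rdx
        ja      .L26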