https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102473

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Target Milestone|---                         |12.0
   Last reconfirmed|                            |2021-09-24
            Summary|521.wrf_r 5% slower at      |[12 Regression] 521.wrf_r
                   |-Ofast and generic x86_64   |5% slower at -Ofast and
                   |tuning after                |generic x86_64 tuning after
                   |r12-3426-g8f323c712ea76c    |r12-3426-g8f323c712ea76c
     Ever confirmed|0                           |1

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Looks like at least on Zen movs[hl]dup is in the integer domain, so we'll see a
domain crossing penalty here(?).  But since this is a generic arch/tuning
regression, the SSE2 code path should be what matters.  On the committed
testcase I see

foo:
.LFB572:
        .cfi_startproc
        pxor    %xmm0, %xmm0
        addss   (%rdi), %xmm0
        addss   4(%rdi), %xmm0
        addss   8(%rdi), %xmm0
        addss   12(%rdi), %xmm0
        ret

where it seems that the vectorizer doesn't pick up the reduction pattern.

/home/rguenther/src/gcc2/gcc/testsuite/gcc.target/i386/sse2-pr101059.c:20:21:
note:   vect_is_simple_use: vectype vector(4) float
/home/rguenther/src/gcc2/gcc/testsuite/gcc.target/i386/sse2-pr101059.c:20:21:
missed:   reduc op not supported by target.
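
For illustration, here is a minimal sketch of the kind of reduction the
assembly above corresponds to, next to a hand-written variant that only uses
shuffles available without SSE3.  Both function names are hypothetical, the
first function is a guess at the shape of the testcase rather than a copy of
sse2-pr101059.c, and the second is not what GCC emits.  The vector variant
reassociates the additions, which is only valid under -Ofast / -ffast-math
style semantics.

```c
#include <stddef.h>
#include <xmmintrin.h>  /* movhlps/shufps/addss intrinsics; no SSE3 needed */

/* Hypothetical stand-in for the testcase: in-order sum of four floats.
   Unvectorized, this maps to the pxor + 4x addss sequence shown above. */
float sum4_scalar(const float *a)
{
    float sum = 0.0f;
    for (size_t i = 0; i < 4; i++)
        sum += a[i];
    return sum;
}

/* Hand-written horizontal add avoiding movshdup/movsldup (those are SSE3).
   Assumption: reassociating the adds is acceptable (fast-math). */
float sum4_sse(const float *a)
{
    __m128 v  = _mm_loadu_ps(a);             /* a0 a1 a2 a3       */
    __m128 hi = _mm_movehl_ps(v, v);         /* a2 a3 a2 a3       */
    __m128 s  = _mm_add_ps(v, hi);           /* a0+a2 a1+a3 .. .. */
    __m128 s1 = _mm_shuffle_ps(s, s, _MM_SHUFFLE(1, 1, 1, 1)); /* lane 1 */
    s = _mm_add_ss(s, s1);                   /* (a0+a2)+(a1+a3)   */
    return _mm_cvtss_f32(s);
}
```

With inputs that are exactly representable, such as {1, 2, 3, 4}, both
functions return the same value; for general inputs the results may differ in
the last bits because the order of the additions differs.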
