https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117270

Tamar Christina <tnfchris at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |ASSIGNED
           See Also|                            |https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116583
           Assignee|unassigned at gcc dot gnu.org|tnfchris at gcc dot gnu.org
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2024-11-29
                 CC|                            |tnfchris at gcc dot gnu.org

--- Comment #2 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
Confirmed.

The hot loop does this weird codegen:

ldp q27, q30, [x9]
add x15, sp, #0xa0
ldp q31, q29, [x9, #32]
stp q27, q30, [sp, #160]
stp q30, q31, [sp, #192]
stp q31, q29, [sp, #224]
ld1 {v30.16b, v31.16b}, [x15]
add x15, sp, #0xc0
tbl v29.16b, {v30.16b, v31.16b}, v2.16b
ld1 {v30.16b, v31.16b}, [x15]
add x15, sp, #0xe0
ld1 {v26.16b, v27.16b}, [x15]
tbl v30.16b, {v30.16b, v31.16b}, v3.16b
tbl v31.16b, {v26.16b, v27.16b}, v5.16b

i.e. it uses the stack to permute the two input registers, and then re-permutes
them using a permute instruction.
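
To make the data flow easier to follow, here is a rough Python model of the
sequence above, read directly off the assembly. The function names are made up
for illustration, and the contents of the index registers v2, v3 and v5 are not
shown in this report, so they are plain parameters here. Treat it as a sketch
of what the instructions appear to do, not as anything taken from GCC.

```python
def tbl2(lo, hi, idx):
    """Model of a two-register TBL on 16-byte vectors.

    The table is the 32 bytes lo followed by hi. An index outside the
    table yields 0, as TBL does.
    """
    table = list(lo) + list(hi)
    return [table[i] if i < len(table) else 0 for i in idx]


def hot_loop_model(mem, m2, m3, m5):
    """Model of the quoted sequence. mem is the 64 bytes at [x9];
    m2, m3, m5 stand in for the (unknown) contents of v2, v3, v5."""
    # ldp q27, q30, [x9] ; ldp q31, q29, [x9, #32]
    a, b, c, d = (list(mem[i:i + 16]) for i in range(0, 64, 16))

    # The three stp instructions spill overlapping pairs to the stack.
    stack = [0] * 256
    stack[160:192] = a + b   # stp q27, q30, [sp, #160]
    stack[192:224] = b + c   # stp q30, q31, [sp, #192]
    stack[224:256] = c + d   # stp q31, q29, [sp, #224]

    def ld1_pair(off):
        # ld1 {vN.16b, vN+1.16b}, [x15] reloads 32 consecutive bytes
        return stack[off:off + 16], stack[off + 16:off + 32]

    r0 = tbl2(*ld1_pair(0xa0), m2)   # tbl v29, {v30, v31}, v2
    r1 = tbl2(*ld1_pair(0xc0), m3)   # tbl v30, {v30, v31}, v3
    r2 = tbl2(*ld1_pair(0xe0), m5)   # tbl v31, {v26, v27}, v5
    return r0, r1, r2
```

In this model the stack round trip leaves the data unchanged and only places
the pairs (a, b), (b, c) and (c, d) in consecutive registers for the three
tbl instructions.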

There are two things wrong here:

1. It shouldn't have gone through the stack for this.
2. The two permutes should have been folded by the vectorizer.
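
As an illustration of point 2, two back-to-back permutes with constant
selectors can in principle be replaced by one permute whose selector is the
composition of the two. The sketch below shows only that identity on plain
Python lists; the names permute and fold_selectors are hypothetical and this
is not how the vectorizer implements it.

```python
def permute(src, sel):
    """Return the elements of src picked out by the index list sel."""
    return [src[i] for i in sel]


def fold_selectors(sel_first, sel_second):
    """Compose two selectors into one.

    permute(permute(x, sel_first), sel_second) equals
    permute(x, fold_selectors(sel_first, sel_second)) for any x,
    provided every index in sel_second is valid for sel_first.
    """
    return [sel_first[i] for i in sel_second]


# Example: reverse 32 bytes, then take every other byte.
x = list(range(100, 132))
sel_first = list(reversed(range(32)))
sel_second = list(range(0, 32, 2))

two_steps = permute(permute(x, sel_first), sel_second)
one_step = permute(x, fold_selectors(sel_first, sel_second))
assert two_steps == one_step
```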

This seems to be caused by one of the VLA SLP permute commits from Richard S,
starting with

commit 8157f3f2d211bfbf53fbf8dd209b47ce583f4142
Author: Richard Sandiford <richard.sandif...@arm.com>
Date:   Mon Oct 7 13:03:04 2024 +0100

    vect: Support more VLA SLP permutations [PR116583]


Mine.
