https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117270
Tamar Christina <tnfchris at gcc dot gnu.org> changed:

           What    |Removed                      |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                  |ASSIGNED
           See Also|                             |https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116583
           Assignee|unassigned at gcc dot gnu.org|tnfchris at gcc dot gnu.org
     Ever confirmed|0                            |1
   Last reconfirmed|                             |2024-11-29
                 CC|                             |tnfchris at gcc dot gnu.org

--- Comment #2 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
Confirmed.

The hot loop produces this odd codegen:

    ldp     q27, q30, [x9]
    add     x15, sp, #0xa0
    ldp     q31, q29, [x9, #32]
    stp     q27, q30, [sp, #160]
    stp     q30, q31, [sp, #192]
    stp     q31, q29, [sp, #224]
    ld1     {v30.16b, v31.16b}, [x15]
    add     x15, sp, #0xc0
    tbl     v29.16b, {v30.16b, v31.16b}, v2.16b
    ld1     {v30.16b, v31.16b}, [x15]
    add     x15, sp, #0xe0
    ld1     {v26.16b, v27.16b}, [x15]
    tbl     v30.16b, {v30.16b, v31.16b}, v3.16b
    tbl     v31.16b, {v26.16b, v27.16b}, v5.16b

i.e. it uses the stack to permute the two input registers, and then re-permutes them with a permute instruction (tbl).

There are two things wrong here:

1. It shouldn't have gone through the stack for this.
2. The two permutes should have been folded by the vectorizer.

This seems to be caused by one of the VLA SLP permute commits from Richard S, starting with

commit 8157f3f2d211bfbf53fbf8dd209b47ce583f4142
Author: Richard Sandiford <richard.sandif...@arm.com>
Date:   Mon Oct 7 13:03:04 2024 +0100

    vect: Support more VLA SLP permutations [PR116583]

Mine.