https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94908
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Severity|normal |enhancement
See Also| |https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53346,
| |https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93720
Component|tree-optimization |target
--- Comment #4 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
I think this was a target issue and maybe it should be split into a couple of
different bugs.
For GCC 8, aarch64 produces:
dup v0.4s, v0.s[1]
ldr q1, [sp, 16]
ldp x29, x30, [sp], 32
ins v0.s[1], v1.s[1]
ins v0.s[2], v1.s[2]
ins v0.s[3], v1.s[3]
GCC 9/10 produced this (which is ok, though it could be improved, which
happened in GCC 11):
adrp x0, .LC0
ldr q1, [sp, 16]
ldr q2, [x0, #:lo12:.LC0]
ldp x29, x30, [sp], 32
tbl v0.16b, {v0.16b - v1.16b}, v2.16b
For GCC 11+, aarch64 produces:
ldr q1, [sp, 16]
ins v1.s[0], v0.s[1]
mov v0.16b, v1.16b
Which means that for aarch64 this was changed in GCC 10 and fixed fully for GCC 11
(by r11-2192-gc9c87e6f9c795b, aka PR 93720, which was in fact my patch).
For x86_64, the trunk produces:
movaps (%rsp), %xmm1
addq $24, %rsp
shufps $85, %xmm1, %xmm0
shufps $232, %xmm1, %xmm0
While GCC 12 produced:
movaps (%rsp), %xmm1
addq $24, %rsp
shufps $85, %xmm0, %xmm0
movaps %xmm1, %xmm2
shufps $85, %xmm1, %xmm2
movaps %xmm2, %xmm3
movaps %xmm1, %xmm2
unpckhps %xmm1, %xmm2
unpcklps %xmm3, %xmm0
shufps $255, %xmm1, %xmm1
unpcklps %xmm1, %xmm2
movlhps %xmm2, %xmm0
This was changed with r13-2843-g3db8e9c2422d92 (aka PR 53346).
For powerpc64le, it looks ok for GCC 11:
addis 9,2,.LC0@toc@ha
addi 1,1,48
addi 9,9,.LC0@toc@l
li 0,-16
lvx 0,0,9
vperm 2,31,2,0
Both the x86_64 and PowerPC PERM implementations could be improved to
support the insertion like the aarch64 backend does.