https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82199
Bug ID: 82199
Summary: __builtin_shuffle sometimes should produce ins rather than TBL
Product: gcc
Version: 8.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: pinskia at gcc dot gnu.org
Target Milestone: ---
Target: aarch64
Take:
#define vector __attribute__((vector_size(16) ))
vector float f(vector float a, vector float b)
{
return __builtin_shuffle (a, b, (vector int){0, 1, 4,5});
}
--- CUT ---
Currently this produces a TBL instruction, but we should really be able to
produce a single INS instead (for little-endian):
f:
        ins     v0.d[1], v1.d[0]
ret
--- CUT ---
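To see why a single INS is enough, here is a portable scalar model of the shuffle next to a model of the proposed instruction. Both helper names are hypothetical (they are not GCC APIs), and treating "low 64-bit lane" as elements 0 and 1 assumes little-endian lane numbering.

```c
#include <string.h>

/* Hypothetical scalar model of __builtin_shuffle (a, b, mask) for
   4 x float. As documented for GCC, each mask element is taken
   modulo 2*N (here 8) and indexes the concatenation {a[0..3], b[0..3]}. */
static void shuffle2_ref(const float a[4], const float b[4],
                         const int mask[4], float out[4])
{
    for (int i = 0; i < 4; i++) {
        int m = mask[i] & 7;                 /* modulo 2*N, N = 4 */
        out[i] = m < 4 ? a[m] : b[m - 4];
    }
}

/* Hypothetical model of "ins v0.d[1], v1.d[0]": copy the low 64-bit
   lane of b (elements b[0], b[1]) over the high 64-bit lane of a
   (elements a[2], a[3]); a[0] and a[1] are left untouched.
   Assumes little-endian lane numbering. */
static void ins_hi_from_lo(float a[4], const float b[4])
{
    memcpy(&a[2], &b[0], 2 * sizeof(float));
}
```

With the mask {0, 1, 4, 5} both models yield {a[0], a[1], b[0], b[1]}: the low half of a stays in place and only one 64-bit lane moves, which is exactly what one INS (or movlhps on x86_64) does.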
x86_64 is able to produce:
f:
movlhps %xmm1, %xmm0
ret
Which is what I had expected.
There are most likely many more __builtin_shuffle masks that could be
implemented on aarch64 without TBL but currently are not.
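As a rough illustration of one such class, the sketch below tests whether a two-operand 4 x 32-bit mask reduces to a single 64-bit lane INS: one half of the result is an operand's own half left in place, and the other half is any whole 64-bit lane of a or b. The helper names are hypothetical (this is not how GCC's aarch64 backend is structured), it assumes little-endian lane numbering, and it ignores that the result may land in the "wrong" register and need a final mov.

```c
#include <stdbool.h>

/* Hypothetical helper: return the 64-bit source lane (0,1 = a.d[0],a.d[1];
   2,3 = b.d[0],b.d[1]) feeding result half h, or -1 if the two 32-bit
   elements of that half do not form one aligned 64-bit lane. */
static int half_source_lane(const int mask[4], int h)
{
    int lo = mask[2 * h] & 7, hi = mask[2 * h + 1] & 7;
    if (lo % 2 != 0 || hi != lo + 1)
        return -1;
    return lo / 2;
}

/* Hypothetical helper: true if the shuffle is one "ins vD.d[i], vS.d[j]"
   where vD is a or b (little-endian lane numbering assumed). */
static bool shuffle_is_single_ins(const int mask[4])
{
    int l0 = half_source_lane(mask, 0);
    int l1 = half_source_lane(mask, 1);
    if (l0 < 0 || l1 < 0)
        return false;
    /* Half 0 stays in place if it is a.d[0] or b.d[0]; half 1 if it is
       a.d[1] or b.d[1]. The other half is then a single lane insert. */
    bool keep0 = (l0 == 0 || l0 == 2);
    bool keep1 = (l1 == 1 || l1 == 3);
    return keep0 || keep1;
}
```

For example {0, 1, 4, 5} and {0, 1, 6, 7} qualify, while a half swap such as {2, 3, 0, 1} or an intra-lane swap such as {1, 0, 4, 5} does not and would need another instruction (e.g. EXT or REV64) rather than TBL.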