https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102055
Bug ID: 102055
Summary: full 128-bit byte swap using __builtin_shuffle should produce rev64 followed by ext
Product: gcc
Version: 12.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: enhancement
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: pinskia at gcc dot gnu.org
Target Milestone: ---
Target: aarch64-*-*
Take:
#define vector __attribute__((vector_size(16)))
vector char g(vector char a)
{
  return __builtin_shuffle(a, (vector char){15,14,13,12,11,10,9,8,7,6,5,4,3,2,1,0});
}
vector char g1(vector char a)
{
  vector char t = __builtin_shuffle(a, (vector char){7,6,5,4,3,2,1,0,15,14,13,12,11,10,9,8});
  vector long long t1 = (vector long long)t;
  t1 = __builtin_shuffle(t1, (vector long long){1,0});
  return (vector char)t1;
}
The first case uses ldr/tbl, but it can really be done in two steps: rev64
followed by ext.
rev64 v0.16b, v0.16b
ext v0.16b, v0.16b, v0.16b, #8
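
As a sanity check on the sequence above, here is a minimal sketch in portable C that models the two instructions on plain byte arrays and confirms that their composition is a full 16-byte reversal. The helpers model_rev64_16b, model_ext_16b and rev64_then_ext_reverses are hypothetical names made up for this illustration. They are not GCC or ACLE APIs, and they only model the byte movement as I understand it from the instruction descriptions, not the actual code generation.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of "rev64 vd.16b, vn.16b":
   reverse the byte order inside each 64-bit half of the vector. */
static void model_rev64_16b(uint8_t dst[16], const uint8_t src[16])
{
    uint8_t tmp[16];
    for (int i = 0; i < 16; i++)
        tmp[i] = src[(i & 8) | (7 - (i & 7))];
    memcpy(dst, tmp, 16);
}

/* Hypothetical model of "ext vd.16b, vn.16b, vm.16b, #imm":
   concatenate n (low bytes) and m (high bytes), then take the
   16 bytes starting at byte offset imm. Assumes imm is 0..15. */
static void model_ext_16b(uint8_t dst[16], const uint8_t n[16],
                          const uint8_t m[16], unsigned imm)
{
    uint8_t cat[32];
    memcpy(cat, n, 16);
    memcpy(cat + 16, m, 16);
    memcpy(dst, cat + imm, 16);
}

/* Returns 1 if rev64 followed by ext #8 equals a full byte reversal
   (the shuffle mask {15,14,...,0} used by g), 0 otherwise. */
int rev64_then_ext_reverses(void)
{
    uint8_t in[16], step[16], out[16], want[16];

    for (int i = 0; i < 16; i++)
        in[i] = (uint8_t)(0xA0 + i);      /* distinct value per byte */

    model_rev64_16b(step, in);            /* bytes 7..0, 15..8       */
    model_ext_16b(out, step, step, 8);    /* swap the 64-bit halves  */

    for (int i = 0; i < 16; i++)
        want[i] = in[15 - i];             /* expected full reversal  */

    return memcmp(out, want, 16) == 0;
}
```

The ext with both source operands equal and an immediate of 8 swaps the two 64-bit halves, which is the same data movement as the {1,0} shuffle on vector long long in g1.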