https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121294
Bug ID: 121294
Summary: Incorrect optimisation of b16/32/64 forms of SVE permute intrinsics
Product: gcc
Version: 16.0
Status: UNCONFIRMED
Keywords: aarch64-sve, wrong-code
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: rsandifo at gcc dot gnu.org
Target Milestone: ---
Target: aarch64*-*-*
#include <arm_sve.h>

svbool_t
foo ()
{
  return svtrn1_b16 (svptrue_b8 (), svptrue_b16 ());
}
compiled with -O2 -march=armv8.2-a+sve gives:
foo:
        ptrue   p0.b, all
        trn1    p0.h, p0.h, p0.h
        ret
which is equivalent to:
foo:
        ptrue   p0.b, all
        ret
The svptrue_b16() has effectively been replaced by svptrue_b8().
This happens because the inputs and output of the underlying define_insn have
mode VNx8BImode, in which every odd-indexed bit of the predicate is
insignificant. That is correct when permuting predicates created during
autovectorisation, but it is not correct for ACLE code, where every bit of an
svbool_t is significant.
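
To make the difference concrete, here is a small portable C model of the
predicate bits involved. It is only a sketch, not GCC or ACLE code: it assumes
a 128-bit vector length (so a predicate has 16 bits, one per vector byte), and
the names pred128_t, MODEL_PTRUE_B8, MODEL_PTRUE_B16, get_elem, model_trn1_h
and model_check are hypothetical helpers invented for this illustration. The
TRN1 model follows my reading of the architectural behaviour, in which a .h
predicate permute moves whole 2-bit elements.

```c
#include <assert.h>
#include <stdint.h>

/* Model of an SVE predicate register, ASSUMING a 128-bit vector length:
   one predicate bit per vector byte, so 16 bits in total.  */
typedef uint16_t pred128_t;

#define PRED_BITS 16

/* Model of svptrue_b8 (): every predicate bit is set.  */
static const pred128_t MODEL_PTRUE_B8 = 0xffff;

/* Model of svptrue_b16 (): only the low bit of each 2-bit .h element
   is set.  */
static const pred128_t MODEL_PTRUE_B16 = 0x5555;

/* Return element INDEX of P, where each element is ELEM_BITS wide.  */
static unsigned
get_elem (pred128_t p, unsigned index, unsigned elem_bits)
{
  return (p >> (index * elem_bits)) & ((1u << elem_bits) - 1);
}

/* Model of "trn1 pd.h, pn.h, pm.h".  Each .h predicate element is 2 bits
   wide.  Even-numbered result elements come from the even-numbered
   elements of PN and odd-numbered result elements come from the
   even-numbered elements of PM.  Both bits of each element are copied.  */
static pred128_t
model_trn1_h (pred128_t pn, pred128_t pm)
{
  const unsigned elem_bits = 2;
  const unsigned pairs = PRED_BITS / (elem_bits * 2);
  pred128_t result = 0;

  for (unsigned p = 0; p < pairs; p++)
    {
      result |= (pred128_t) (get_elem (pn, 2 * p, elem_bits)
                             << ((2 * p) * elem_bits));
      result |= (pred128_t) (get_elem (pm, 2 * p, elem_bits)
                             << ((2 * p + 1) * elem_bits));
    }
  return result;
}

/* Compare the modelled result of foo () with the folded "ptrue p0.b"
   result.  */
static void
model_check (void)
{
  pred128_t expected = model_trn1_h (MODEL_PTRUE_B8, MODEL_PTRUE_B16);
  pred128_t folded = MODEL_PTRUE_B8;

  /* Looking only at the even-indexed bits (the VNx8BImode view), the two
     values are indistinguishable...  */
  assert ((expected & 0x5555) == (folded & 0x5555));

  /* ...but as full svbool_t values they differ.  */
  assert (expected != folded);
}
```

Calling model_check () from a test program passes both assertions. In this
model, model_trn1_h (MODEL_PTRUE_B8, MODEL_PTRUE_B16) is 0x7777, whereas the
folded code produces 0xffff: the odd-indexed bits of the odd-numbered .h
elements should be clear but are set.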