https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96339
Bug ID: 96339
Summary: [SVE] Optimise svlast[ab]
Product: gcc
Version: 11.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: enhancement
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: rsandifo at gcc dot gnu.org
Target Milestone: ---

For:

#include <arm_sve.h>

int8_t foo (svint8_t x)
{
  return svlasta (svptrue_pat_b8 (SV_VL1), x);
}

we generate:

        ptrue   p0.b, vl1
        lasta   w0, p0, z0.b
        ret

But this is really a BIT_FIELD_REF with a constant index, so we can
generate:

        umov    w0, v0.b[1]
        ret

(which works even on big-endian targets).

We should make svlast_impl fold calls to a BIT_FIELD_REF if:

(1) the predicate input is a constant,
(2) we know at compile time which element the constant selects, and
(3) the selected element ends on or before byte 64.

For other cases it is probably better to keep the original call.

(2) excludes variable-length predicate constants if (a) we can't
determine the index of the final 1 bit at compile time or (b) that
index might be beyond the end of the vector.

(3) restricts the optimisation to cases that can be handled by
*vec_extract<mode><Vel>_v128 and *vec_extract<mode><Vel>_dup.
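For illustration only (not part of the report): a minimal sketch of what the
intended fold means at the source level, assuming the usual LASTA/LASTB
semantics (LASTB selects the last active lane, LASTA the lane after it), so
with a constant predicate both reduce to a constant-index element extract.
The function names last_a/last_b and the SV_VL4 pattern are just example
choices.

#include <arm_sve.h>
#include <stdint.h>

/* Hypothetical examples.  With a constant SV_VL4 predicate the last
   active lane is lane 3.  */

int8_t last_b (svint8_t x)
{
  /* LASTB selects the last active lane, so (on little-endian) the fold
     would give BIT_FIELD_REF <x, 8, 24>, i.e. "umov w0, v0.b[3]".  */
  return svlastb (svptrue_pat_b8 (SV_VL4), x);
}

int8_t last_a (svint8_t x)
{
  /* LASTA selects the lane after the last active one, so the fold
     would give BIT_FIELD_REF <x, 8, 32>, i.e. "umov w0, v0.b[4]".  */
  return svlasta (svptrue_pat_b8 (SV_VL4), x);
}

Both calls would satisfy conditions (1)-(3) above; a predicate whose final
1 bit can't be determined at compile time would presumably keep the
original LASTA/LASTB sequence.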