https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96339

            Bug ID: 96339
           Summary: [SVE] Optimise svlast[ab]
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: enhancement
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rsandifo at gcc dot gnu.org
  Target Milestone: ---

For:

#include <arm_sve.h>
int8_t
foo (svint8_t x)
{
  return svlasta (svptrue_pat_b8 (SV_VL1), x);
}

we generate:

        ptrue   p0.b, vl1
        lasta   w0, p0, z0.b
        ret

But this is really a BIT_FIELD_REF with a constant index,
so we can generate:

        umov    w0, v0.b[1]
        ret

(works even on big-endian targets).
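For reference only (not part of the proposed change): with an SV_VL1
predicate the last active element is element 0, so svlasta selects
element 1.  Spilling the first 16 elements (always valid, since SVE
vectors are at least 128 bits) and indexing the buffer computes the
same value that the single UMOV above extracts directly; the function
name foo_by_hand is just for illustration:

#include <arm_sve.h>
#include <stdint.h>

int8_t
foo_by_hand (svint8_t x)
{
  /* Store the first 16 byte elements to memory...  */
  int8_t buf[16];
  svst1 (svptrue_pat_b8 (SV_VL16), buf, x);
  /* ...and pick out the element after the last active one (element 1).  */
  return buf[1];
}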

We should make svlast_impl fold calls to a BIT_FIELD_REF if:
(1) the predicate input is a constant,
(2) we know at compile time which element the constant selects, and
(3) the selected element ends on or before byte 64.

For other cases it is probably better to keep the original call.
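Something like the following (untested) sketch shows the core of the
transformation, once (1)-(3) have been checked and the constant element
index has been computed.  The helper name and its parameters are made up
for illustration rather than taken from the existing code, and the
big-endian lane-numbering adjustment is left out; it would live in
aarch64-sve-builtins-base.cc, which already pulls in the tree and gimple
headers used here:

/* CALL is the svlasta/svlastb call, LHS its scalar result and IDX the
   element index selected by the constant predicate.  */
static gimple *
fold_svlast_to_bit_field_ref (gcall *call, tree lhs,
                              unsigned HOST_WIDE_INT idx)
{
  tree vec = gimple_call_arg (call, 1);   /* the data vector argument */
  tree elt_type = TREE_TYPE (lhs);
  unsigned HOST_WIDE_INT elt_bits = tree_to_uhwi (TYPE_SIZE (elt_type));

  /* BIT_FIELD_REF <vec, elt_bits, idx * elt_bits> selects element IDX.  */
  tree ref = build3 (BIT_FIELD_REF, elt_type, vec,
                     bitsize_int (elt_bits),
                     bitsize_int (idx * elt_bits));
  return gimple_build_assign (lhs, ref);
}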

(2) excludes variable-length predicate constants if (a) we can't
determine the index of the final 1 bit at compile time or
(b) that index might be beyond the end of the vector.
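For example, the following call satisfies (1) but not (2): the predicate
is a constant, but its final set bit is in element VL-1, so for
length-agnostic code there is no compile-time index to fold to and the
LASTB should be kept (function name chosen for illustration):

#include <arm_sve.h>
#include <stdint.h>

int8_t
last_element (svint8_t x)
{
  /* The last active element is the final element of the vector, whose
     index depends on the runtime vector length.  */
  return svlastb (svptrue_b8 (), x);
}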

(3) restricts the optimisation to cases that can be handled by
*vec_extract<mode><Vel>_v128 and *vec_extract<mode><Vel>_dup.
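A case that (3) would exclude, assuming a build in which the index is
known at compile time (e.g. -msve-vector-bits=2048): the last active
element below is element 127, which ends well beyond byte 64 and so is
out of range for those patterns, making it better to keep the LASTB
(function name chosen for illustration):

#include <arm_sve.h>
#include <stdint.h>

int8_t
element_127 (svint8_t x)
{
  /* With 2048-bit vectors, ptrue VL128 enables the first 128 byte
     elements, so LASTB selects element 127.  */
  return svlastb (svptrue_pat_b8 (SV_VL128), x);
}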
