https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96342

--- Comment #8 from rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> 
---
(In reply to yangyang from comment #3)
>     The work is mainly composed of three parts: the generating of SVE
> functions for "omp declare simd" in pass_omp_simd_clone, the supporting of
> SVE PCS of non-builtin types, and the generating of the call of SVE
> vectoried functions in pass_vect. I plan to finish this work in the
> following five steps, each step corresponds to a patch:
This plan looks really good to me, thanks.  I agree with everything
I've snipped in this reply.

> f) In pass_expand, only when a “SVE type” attribute is added to the tree
> nodes of the types of arguments and return type, these types use the SVE
> PCS. For now, GCC only has a mechanism for adding attributes to SVE builtin
> type, so I plan to define a new hook to add attribute to the types of
> arguments and return type of simdclones generated if needed. The related
> processing functions are planned to be moved to aarch64.c from
> aarch64-sve-builtin.cc in addition.
It's a very minor detail, sorry, but I'd prefer to keep stuff in
aarch64-sve-builtins.cc if possible, and simply export the functions
that we need via aarch64-protos.h.

> Part 4) Add the generating of VLS SVE functions for "omp declare simd". The
> specification writes: “When using a simdlen(len) clause, the compiler
> expects a VLS vector version of the function that is tuned for a specific
> implementation of SVE. ”. Therefore I think only when the number of bits in
> a SVE vector register of the target is specified and coincides with the
> simdlen clause, GCC is supposed to generate the VLS SVE functions for "omp
> declare simd",
I think in principle we should generate this unconditionally.
There are two possible approaches, in increasing order of
quality of implementation:

(1) Divide the problem into three cases:

    (a) -msve-vector-bits=scalable

        In this case, generate VLA code for the VLS routines.
        The point here is that the VLS interface guarantees
        that the SVE registers are a particular size, but the
        compiler is not required to take advantage of that
        information.  Using VLA code is a valid implementation
        choice.

    (b) -msve-vectors-bits=N, N matches the simdlen

        For this we'd generate VLS code in the way that you
        describe.

    (c) -msve-vectors-bits=N, N does not match the simdlen

        We should silently accept this for declarations, but emit
        a warning or an error if the compiler needs to generate a
        definition.

(2) Allow -msve-vector-bits= to vary on a function-by-function basis,
    in the same way that the set of target features can already vary
    on a function-by-function basis.  Then, as a follow-on change,
    use this feature to generate VLS code for whichever simdlen
    the code specifies.

(2) is likely to be tricky, so I'd recommend starting with (1)
and treating (2) as a potential future optimisation.

> Part 5) Generate the call of SVE vectoried functions in pass_vect,
> specifically:
> 
> a) Define a new hook that return true if the target support variable vector
> length simdclones and set the aarch64 return value to true if TARGET_SVE. In
> vectorizable_simd_clone_call, continue analyzing instead of directly
> returning false.
It would be good to generalise existing hooks if possible, rather than
add one specifically for VLA vs. VLS.

>     In addition, I have finished the first two patches and attached them on
> this PR. Is it necessary to send the patchs to the GCC patches mailing list
> for reviewing?
Yeah, if you could send them to gcc-patches, that'd be great.

Reply via email to