https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96342
--- Comment #8 from rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> ---
(In reply to yangyang from comment #3)
> The work is mainly composed of three parts: the generating of SVE
> functions for "omp declare simd" in pass_omp_simd_clone, the supporting of
> SVE PCS of non-builtin types, and the generating of the call of SVE
> vectoried functions in pass_vect. I plan to finish this work in the
> following five steps, each step corresponds to a patch:

This plan looks really good to me, thanks. I agree with everything
I've snipped in this reply.

> f) In pass_expand, only when a “SVE type” attribute is added to the tree
> nodes of the types of arguments and return type, these types use the SVE
> PCS. For now, GCC only has a mechanism for adding attributes to SVE builtin
> type, so I plan to define a new hook to add attribute to the types of
> arguments and return type of simdclones generated if needed. The related
> processing functions are planned to be moved to aarch64.c from
> aarch64-sve-builtin.cc in addition.

It's a very minor detail, sorry, but I'd prefer to keep stuff in
aarch64-sve-builtins.cc if possible, and simply export the functions
that we need via aarch64-protos.h.

> Part 4) Add the generating of VLS SVE functions for "omp declare simd". The
> specification writes: “When using a simdlen(len) clause, the compiler
> expects a VLS vector version of the function that is tuned for a specific
> implementation of SVE. ”. Therefore I think only when the number of bits in
> a SVE vector register of the target is specified and coincides with the
> simdlen clause, GCC is supposed to generate the VLS SVE functions for "omp
> declare simd",

I think in principle we should generate this unconditionally.
There are two possible approaches, in increasing order of quality of
implementation:

(1) Divide the problem into three cases:

    (a) -msve-vector-bits=scalable

        In this case, generate VLA code for the VLS routines.
        The point here is that the VLS interface guarantees that the SVE
        registers are a particular size, but the compiler is not required
        to take advantage of that information. Using VLA code is a valid
        implementation choice.

    (b) -msve-vector-bits=N, N matches the simdlen

        For this we'd generate VLS code in the way that you describe.

    (c) -msve-vector-bits=N, N does not match the simdlen

        We should silently accept this for declarations, but emit a
        warning or an error if the compiler needs to generate a
        definition.

(2) Allow -msve-vector-bits= to vary on a function-by-function basis,
    in the same way that the set of target features can already vary
    on a function-by-function basis. Then, as a follow-on change, use
    this feature to generate VLS code for whichever simdlen the code
    specifies.

(2) is likely to be tricky, so I'd recommend starting with (1) and
treating (2) as a potential future optimisation.

> Part 5) Generate the call of SVE vectoried functions in pass_vect,
> specifically:
>
> a) Define a new hook that return true if the target support variable vector
> length simdclones and set the aarch64 return value to true if TARGET_SVE. In
> vectorizable_simd_clone_call, continue analyzing instead of directly
> returning false.

It would be good to generalise existing hooks if possible, rather
than add one specifically for VLA vs. VLS.

> In addition, I have finished the first two patches and attached them on
> this PR. Is it necessary to send the patchs to the GCC patches mailing list
> for reviewing?

Yeah, if you could send them to gcc-patches, that'd be great.
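
For illustration only, the three cases in approach (1) could be modelled
roughly as below. This is a standalone C sketch, not GCC code: the enum,
the function name classify_vls_clone, the use of 0 to stand for
-msve-vector-bits=scalable, and the idea of passing the vector width in
bits implied by the simdlen clause are all assumptions made for the
example.

```c
#include <stdbool.h>

/* Hypothetical outcome names for this sketch; not GCC identifiers.  */
enum vls_action
{
  VLS_GEN_VLA_CODE, /* Case (a): VLA code behind the VLS interface.  */
  VLS_GEN_VLS_CODE, /* Case (b): fixed-length code.  */
  VLS_ACCEPT_DECL,  /* Case (c), declaration only: accept silently.  */
  VLS_DIAGNOSE      /* Case (c), definition needed: warning or error.  */
};

/* SVE_BITS models the -msve-vector-bits setting, with 0 standing for
   "scalable" (an assumption of this sketch).  CLONE_BITS is the vector
   width in bits implied by the simdlen clause.  NEEDS_DEFINITION is
   true when the compiler has to emit a body for the clone.  For cases
   (a) and (b) the result says which kind of code to generate if a
   body is needed.  */
static enum vls_action
classify_vls_clone (unsigned int sve_bits, unsigned int clone_bits,
                    bool needs_definition)
{
  if (sve_bits == 0)
    return VLS_GEN_VLA_CODE;
  if (sve_bits == clone_bits)
    return VLS_GEN_VLS_CODE;
  return needs_definition ? VLS_DIAGNOSE : VLS_ACCEPT_DECL;
}
```

Under these assumptions, a 256-bit clone compiled with
-msve-vector-bits=512 would be accepted as a declaration but diagnosed
if a definition had to be generated. Approach (2) would remove the
mismatch case by letting the vector length follow the clone.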