Victor Do Nascimento <victor.donascime...@arm.com> writes:
> This patch finalizes adding support for the generation of SVE simd clones when
> no simdlen is provided, following the ABI rules where the widest data type
> determines the minimum amount of elements in a length agnostic vector.
>
> gcc/ChangeLog:
>
> 	* config/aarch64/aarch64-protos.h (add_sve_type_attribute): Declare.
> 	* config/aarch64/aarch64-sve-builtins.cc (add_sve_type_attribute): Make
> 	visibility global and support use for non_acle types.
> 	* config/aarch64/aarch64.cc
> 	(aarch64_simd_clone_compute_vecsize_and_simdlen): Create VLA simd clone
> 	when no simdlen is provided, according to ABI rules.
> 	(simd_clone_adjust_sve_vector_type): New helper function.
> 	(aarch64_simd_clone_adjust): Add '+sve' attribute to SVE simd clones
> 	and modify types to use SVE types.
> 	* omp-simd-clone.cc (simd_clone_mangle): Print 'x' for VLA simdlen.
> 	(simd_clone_adjust): Adapt safelen check to be compatible with VLA
> 	simdlen.
>
> gcc/testsuite/ChangeLog:
>
> 	* gcc.target/aarch64/declare-simd-2.c: Add SVE clone scan.
> 	* gcc.target/aarch64/vect-simd-clone-1.c: New test.
As a general comment, there are a few instances of "a SVE" instead of
"an SVE".  Otherwise it mostly looks good, but some comments below:

> ---
>  gcc/config/aarch64/aarch64-protos.h           |   2 +
>  gcc/config/aarch64/aarch64-sve-builtins.cc    |   6 +-
>  gcc/config/aarch64/aarch64.cc                 | 181 +++++++++++++++---
>  gcc/omp-simd-clone.cc                         |  13 +-
>  .../gcc.target/aarch64/declare-simd-2.c       |   1 +
>  .../gcc.target/aarch64/vect-simd-clone-1.c    | 137 +++++++++++++
>  6 files changed, 306 insertions(+), 34 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/vect-simd-clone-1.c
>
> [...]
>  static void
>  aarch64_simd_clone_adjust (struct cgraph_node *node)
>  {
> -  /* Add aarch64_vector_pcs target attribute to SIMD clones so they
> -     use the correct ABI.  */
> -
>    tree t = TREE_TYPE (node->decl);
> -  TYPE_ATTRIBUTES (t) = make_attribute ("aarch64_vector_pcs", "default",
> -					TYPE_ATTRIBUTES (t));
> +  cl_target_option cur_target;
> +  bool m_old_have_regs_of_mode[MAX_MACHINE_MODE];
> +
> +  if (node->simdclone->vecsize_mangle == 's')
> +    {
> +      /* This is additive and has no effect if SVE, or a superset thereof, is
> +	 already enabled.  */
> +      tree target = build_string (strlen ("+sve") + 1, "+sve");
> +      if (!aarch64_option_valid_attribute_p (node->decl, NULL_TREE, target, 0))
> +	gcc_unreachable ();
> +      cl_target_option_save (&cur_target, &global_options, &global_options_set);
> +      tree new_target = DECL_FUNCTION_SPECIFIC_TARGET (node->decl);
> +      cl_target_option_restore (&global_options, &global_options_set,
> +				TREE_TARGET_OPTION (new_target));
> +      aarch64_override_options_internal (&global_options);
> +      memcpy (m_old_have_regs_of_mode, have_regs_of_mode,
> +	      sizeof (have_regs_of_mode));
> +      for (int i = 0; i < NUM_MACHINE_MODES; ++i)
> +	if (aarch64_sve_mode_p ((machine_mode) i))
> +	  have_regs_of_mode[i] = true;

I've now pushed g:9d14f677a0da80bc6355955469c69709b1d3c67e to add a
push_function_decl/pop_function_decl pair that should be usable here.

> [...]
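For the avoidance of doubt, the shape I had in mind is roughly the
following sketch.  It is illustrative only, not a drop-in replacement:
it assumes push_function_decl switches the global state to the given
decl's target context and pop_function_decl restores the previous one;
see the commit referenced above for the actual interface.

```cpp
/* Sketch only: let the new helpers replace the manual
   cl_target_option_save/cl_target_option_restore and
   aarch64_override_options_internal sequence.  */
if (node->simdclone->vecsize_mangle == 's')
  {
    /* This is additive and has no effect if SVE, or a superset
       thereof, is already enabled.  */
    tree target = build_string (strlen ("+sve") + 1, "+sve");
    if (!aarch64_option_valid_attribute_p (node->decl, NULL_TREE, target, 0))
      gcc_unreachable ();
    /* Enter the clone's (now +sve) target context, so that SVE modes
       and types are available while adjusting the clone...  */
    push_function_decl (node->decl);
    /* ...adjust the clone's parameter and return types here...  */
    /* ...and restore the previous context afterwards.  */
    pop_function_decl ();
  }
```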
> diff --git a/gcc/testsuite/gcc.target/aarch64/vect-simd-clone-1.c b/gcc/testsuite/gcc.target/aarch64/vect-simd-clone-1.c
> new file mode 100644
> index 00000000000..b33541cc329
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/vect-simd-clone-1.c
> @@ -0,0 +1,137 @@
> +/* { dg-do compile } */
> +/* { dg-options "-std=c99" } */
> +/* { dg-additional-options "-O3 -march=armv8-a" } */

We should add -fno-schedule-insns and -fno-schedule-insns2, to make
the order more predictable.

> +/* { dg-final { check-function-bodies "**" "" "" } } */
> +
> +/* Ensure correct creation of SVE Vector-length agnostic (VLA SVE) vector
> +   function calls from scalar versions in accordance with the Vector Function
> +   Application Binary Interface Specification for AArch64 (AAVPCS).
> +
> +   We check for correctness in:
> +   - Vector function name mangling, with the grammar:
> +
> +       vector name := prefix "_" name
> +       prefix := "_ZGV" isa mask <len> <parameters>
> +
> +     Whereby:
> +     - <isa>  := "s" for SVE
> +     - <mask> := "M" for Mask
> +     - <len>  := "x" for VLA SVE
> +
> +     resulting in:
> +       <prefix> := "_ZGVsMx" <parameters>
> +
> +     with each vector parameter contributing a "v" to the prefix.
> +
> +   - Parameter and return value mapping:
> +     - Unless marked with uniform or linear OpenMP clauses, parameters and
> +       return values are expected to map to vectors.
> +     - Where the lane-size of a parameter is less than the widest data size
> +       for a given function, the resulting vector should be unpacked and
> +       populated via use extending loads.
> +
> +   - Finally, we also make sure we can correctly generate calls to the same
> +     function, differing only in the target architecture (i.e. SVE vs SIMD),
> +     ensuring that each call points to the correctly-mangled vector function
> +     and employs the correct ABI.
> +   For example, for `fn' we may expect:
> +
> +     for #pragma GCC target("+sve"):  _ZGVsMxvv_fn
> +     for #pragma GCC target("+simd):  _ZGVnN4vv_fn  */
> +
> +#pragma GCC target ("+sve")
> +/* scan-assembler {_ZGVsMxv_fn0} } } */

Should be:

  /* { dg-final { scan-assembler {...} } } */

But I think it'd be more robust to add ":\n" at the end, to make sure
that there's a definition rather than a reference.

It would also be good to have a C++ test for the C++ names.

> +extern int __attribute__ ((simd, const)) fn0 (int);
> +void test_fn0 (int *a, int *b, int n)
> +{
> +  for (int i = 0; i < n; ++i)
> +    a[i] += fn0 (b[i]);
> +}
> +
> +/*
> +** test_fn1:
> +**	...
> +**	ld1w	z1\.s, p7/z, \[x22, x19, lsl 2\]
> +**	ld1h	z0\.s, p7/z, \[x23, x19, lsl 1\]
> +**	mov	p0\.b, p7\.b
> +**	bl	_ZGVsMxvv_fn1
> +**	st1w	z0\.s, p7, \[x21, x19, lsl 2\]

We shouldn't hard-code any register numbers other than the arguments
and return value (z0, p0, z0, z1).  Same for later tests.

> +**	...
> +*/
> +extern int __attribute__ ((simd, const)) fn1 (short, int);
> +void test_fn1 (int *a, int *b, short *c, int n)
> +{
> +  for (int i = 0; i < n; ++i)
> +    a[i] = fn1 (c[i], b[i]);
> +}
> +
> +/*
> +** test_fn2:
> +**	...
> +**	ld1w	z1\.s, p7/z, \[x23, x19, lsl 2\]
> +**	ld1h	z0\.s, p7/z, \[x22, x19, lsl 1\]
> +**	mov	p0\.b, p7.b
> +**	bl	_ZGVsMxvv_fn2
> +**	st1h	z0\.s, p7, \[x21, x19, lsl 1\]
> +**	...
> +*/
> +extern short __attribute__ ((simd, const)) fn2 (short, int);
> +void test_fn2 (short *a, int *b, short *c, int n)
> +{
> +  for (int i = 0; i < n; ++i)
> +    a[i] = fn2 (c[i], b[i]);
> +}
> +
> +/*
> +** test_fn3:
> +**	...
> +**	ld1b	z23\.s, p7/z, \[x22, x19\]
> +**	ld1w	z0\.s, p7/z, \[x23, x19, lsl 2\]
> +**	ptrue	p6\.b, all
> +**	mov	p0\.b, p7\.b
> +**	mov	z1\.d, z23\.d
> +**	uxtb	z23\.h, p6/m, z23\.h
> +**	bl	_ZGVsMxvv_fn3
> +**	uxtb	z0\.h, p6/m, z0\.h

I don't think we should match the first uxtb (we can just stub it out
with "...").
And I think it's enough to stop matching at the first use of the
return value (the uxtb z0 in this case).  Same for later tests.

Thanks,
Richard

> +**	add	z0\.h, z0\.h, z23\.h
> +**	uxth	z0\.s, p6/m, z0\.s
> +**	st1w	z0\.s, p7, \[x20, x19, lsl 2\]
> +**	...
> +*/
> +extern char __attribute__ ((simd, const)) fn3 (int, char);
> +void test_fn3 (int *a, int *b, char *c, int n)
> +{
> +  for (int i = 0; i < n; ++i)
> +    a[i] = (int) (fn3 (b[i], c[i]) + c[i]);
> +}
> +
> +/*
> +** test_fn4:
> +**	...
> +**	ld1h	z23\.s, p7/z, \[x23, x19, lsl 1\]
> +**	ld1w	z0\.s, p7/z, \[x22, x19, lsl 2\]
> +**	mov	p0\.b, p7\.b
> +**	mov	z1\.d, z23\.d
> +**	ptrue	p6\.b, all
> +**	bl	_ZGVsMxvv_fn4
> +**	sxth	z23\.s, p6/m, z23\.s
> +**	sxth	z0\.s, p6/m, z0\.s
> +**	add	z0\.s, z0\.s, z23\.s
> +**	st1w	z0\.s, p7, \[x21, x19, lsl 2\]
> +**	...
> +*/
> +extern short __attribute__ ((simd, const)) fn4 (int, short);
> +void test_fn4 (int *a, int *b, short *c, int n)
> +{
> +  for (int i = 0; i < n; ++i)
> +    a[i] = (int) (fn4 (b[i], c[i]) + c[i]);
> +}
> +
> +#pragma GCC reset_options
> +#pragma GCC target ("+simd")
> +/* scan-assembler {_ZGVnN4vv_fn4} } } */
> +extern short __attribute__ ((simd, const)) fn4 (int, short);
> +void test_fn5 (int *a, int *b, short *c, int n)
> +{
> +  for (int i = 0; i < n; ++i)
> +    a[i] = (int) (fn4 (b[i], c[i]) + c[i]);
> +}
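Pulling the testsuite comments together, the start of the revised test
might end up looking roughly like the following.  This is only a sketch
of the suggestions above (the scheduling flags, the ":\n" anchor on the
scan-assembler, and the register wildcards); the exact regexps are of
course up to you:

```c
/* { dg-do compile } */
/* { dg-options "-std=c99" } */
/* { dg-additional-options "-O3 -march=armv8-a -fno-schedule-insns -fno-schedule-insns2" } */
/* { dg-final { check-function-bodies "**" "" "" } } */

#pragma GCC target ("+sve")
/* The trailing ":\n" makes sure we match a definition rather than a
   mere reference to the mangled name.  */
/* { dg-final { scan-assembler {_ZGVsMxv_fn0:\n} } } */
extern int __attribute__ ((simd, const)) fn0 (int);

/* Only the argument/return registers (z0, z1, p0) are fixed by the
   ABI; everything else is left as a wildcard.  */
/*
** test_fn1:
**	...
**	ld1w	z1\.s, p[0-9]+/z, \[x[0-9]+, x[0-9]+, lsl 2\]
**	ld1h	z0\.s, p[0-9]+/z, \[x[0-9]+, x[0-9]+, lsl 1\]
**	mov	p0\.b, p[0-9]+\.b
**	bl	_ZGVsMxvv_fn1
**	st1w	z0\.s, p[0-9]+, \[x[0-9]+, x[0-9]+, lsl 2\]
**	...
*/
extern int __attribute__ ((simd, const)) fn1 (short, int);
```

For the later tests, the same pattern applies, with matching stopping
at the first use of the return value.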