Hi All,

The new patch uses the proper return types for the intrinsics that should return uint64x1_t, and addresses the rest of Christophe's comments.
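For reference, a minimal sketch of the type distinction being fixed, based on the ARM arm_neon.h declaration quoted in the thread below (the wrapper function here is hypothetical, for illustration only; it needs the crypto extension enabled):

    #include <arm_neon.h>

    /* ARM's arm_neon.h declares the poly64 comparison as
         uint64x1_t vceq_p64 (poly64x1_t __a, poly64x1_t __b);
       i.e. it returns a one-element vector, so the result must be
       stored in a uint64x1_t rather than a scalar uint64_t.  */
    uint64x1_t
    compare_p64 (poly64x1_t a, poly64x1_t b)
    {
      return vceq_p64 (a, b);
    }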
Kind Regards,
Tamar

________________________________________
From: Tamar Christina
Sent: Friday, November 25, 2016 4:01:30 PM
To: Christophe Lyon
Cc: GCC Patches; christophe.l...@st.com; Marcus Shawcroft; Richard Earnshaw; James Greenhalgh; Kyrylo Tkachov; nd
Subject: RE: [AArch64][ARM][GCC][PATCHv2 3/3] Add tests for missing Poly64_t intrinsics to GCC

> > A few comments about this new version:
> >
> > * arm-neon-ref.h: why do you create CHECK_RESULTS_NAMED_NO_FP16_NO_POLY64?
> >   Can't you just add calls to CHECK_CRYPTO in the existing
> >   CHECK_RESULTS_NAMED_NO_FP16?

Yes, that should be fine. I didn't have CHECK_CRYPTO before, and when I added it I didn't remove the split. I'll do it now.

> > * p64_p128:
> >   From what I can see ARM and AArch64 differ on the vceq variants
> >   available with poly64.
> >   For ARM, arm_neon.h contains: uint64x1_t vceq_p64 (poly64x1_t __a, poly64x1_t __b)
> >   For AArch64, I can't see vceq_p64 in arm_neon.h? ...
> >   Actually I've just noticed the other patch you submitted while I was
> >   writing this, where you add vceq_p64 for aarch64, but it still returns
> >   uint64_t.
> >   Why do you change the vceq_p64 test to return poly64_t instead of
> >   uint64_t?

This patch is slightly outdated. The correct type is `uint64_t`, but by the time that was noticed this patch had already been sent. A new one is coming soon.

> > Why do you add #ifdef __aarch64__ before the vldX_p64 tests and until vsli_p64?
> > This is wrong, remove them.

It was supposed to be around the vldX_lane_p64 tests.

> > The comment /* vget_lane_p64 tests. */ is wrong before the VLDX_LANE
> > tests.
> >
> > You need to protect the new vmov, vget_high and vget_lane tests with
> > #ifdef __aarch64__.

vget_lane is already in an #ifdef; for vmov you're right, but I also noticed that the test calls VDUP instead of VMOV, which explains why I didn't get a test failure. Thanks for the feedback, I'll get these updated.

> > Actually, vget_high_p64 exists on arm, so no need for the #ifdef for it.
> >
> > Christophe
> >
> >> Kind regards,
> >> Tamar
> >> ________________________________________
> >> From: Tamar Christina
> >> Sent: Tuesday, November 8, 2016 11:58:46 AM
> >> To: Christophe Lyon
> >> Cc: GCC Patches; christophe.l...@st.com; Marcus Shawcroft; Richard
> >> Earnshaw; James Greenhalgh; Kyrylo Tkachov; nd
> >> Subject: RE: [AArch64][ARM][GCC][PATCHv2 3/3] Add tests for missing
> >> Poly64_t intrinsics to GCC
> >>
> >> Hi Christophe,
> >>
> >> Thanks for the review!
> >>
> >>> A while ago I added p64_p128.c, to contain all the poly64/128 tests
> >>> except for vreinterpret.
> >>> Why do you need to create p64.c ?
> >>
> >> I originally created it because I had a much smaller set of
> >> intrinsics that I wanted to add initially; this grew, and it hadn't
> >> occurred to me that I could use the existing file now.
> >>
> >> Another reason was the effective-target arm_crypto_ok, as you
> >> mentioned below.
> >>
> >>> Similarly, adding tests for vcreate_p64 etc... in p64.c or
> >>> p64_p128.c might be easier to maintain than adding them to vcreate.c
> >>> etc with several #ifdef conditions.
> >>
> >> Fair enough, I'll move them to p64_p128.c.
> >>
> >>> For vdup-vmod.c, why do you add the "&& defined(__aarch64__)"
> >>> condition? These intrinsics are defined in arm/arm_neon.h, right?
> >>> They are tested in p64_p128.c
> >>
> >> I should have looked for them; they weren't being tested before, so I
> >> had mistakenly assumed that they weren't available. Now I realize I
> >> just need to add the proper test option to the file to enable crypto.
> >> I'll update this as well.
> >>
> >>> Looking at your patch, it seems some tests are currently missing for arm:
> >>> vget_high_p64. I'm not sure why I missed it when I removed
> >>> neon-testgen...
> >>
> >> I'll adjust the test conditions so they run for ARM as well.
> >>
> >>> Regarding vreinterpret_p128.c, doesn't the existing effective-target
> >>> arm_crypto_ok prevent the tests from running on aarch64?
> >>
> >> Yes, it does. I was comparing the output against a clean version and
> >> hadn't noticed that they weren't running. Thanks!
> >>
> >>> Thanks,
> >>>
> >>> Christophe
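Before the patch itself, a summary of the crypto-enablement pattern the review converged on for the test files (the directives are taken from the diff below; this restates them in one place to make the discussion above concrete):

    /* The arm_crypto_ok effective-target check applies only on ARM;
       AArch64 instead gets the crypto extension through an explicit
       -march option, so the poly64 tests run on both targets.  */
    /* { dg-require-effective-target arm_crypto_ok { target { arm*-*-* } } } */
    /* { dg-add-options arm_crypto } */
    /* { dg-additional-options "-march=armv8-a+crypto" { target { aarch64*-*-* } } } */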
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h
index 462141586b3db7c5256c74b08fa0449210634226..beaf6ac31d5c5affe3702a505ad0df8679229e32 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h
@@ -32,6 +32,13 @@ extern size_t strlen(const char *);
    VECT_VAR(expected, int, 16, 4) -> expected_int16x4
    VECT_VAR_DECL(expected, int, 16, 4) -> int16x4_t expected_int16x4  */
 
+/* Some intrinsics don't exist on ARM.
+   Use this macro to guard against them.  */
+#ifdef __aarch64__
+#define AARCH64_ONLY(X) X
+#else
+#define AARCH64_ONLY(X)
+#endif
 
 #define xSTR(X) #X
 #define STR(X) xSTR(X)
@@ -92,6 +99,13 @@ extern size_t strlen(const char *);
     fprintf(stderr, "CHECKED %s %s\n", STR(VECT_TYPE(T, W, N)), MSG); \
   }
 
+#if defined (__ARM_FEATURE_CRYPTO)
+#define CHECK_CRYPTO(MSG,T,W,N,FMT,EXPECTED,COMMENT) \
+  CHECK(MSG,T,W,N,FMT,EXPECTED,COMMENT)
+#else
+#define CHECK_CRYPTO(MSG,T,W,N,FMT,EXPECTED,COMMENT)
+#endif
+
 /* Floating-point variant.  */
 #define CHECK_FP(MSG,T,W,N,FMT,EXPECTED,COMMENT) \
   { \
@@ -184,6 +198,9 @@ extern ARRAY(expected, uint, 32, 2);
 extern ARRAY(expected, uint, 64, 1);
 extern ARRAY(expected, poly, 8, 8);
 extern ARRAY(expected, poly, 16, 4);
+#if defined (__ARM_FEATURE_CRYPTO)
+extern ARRAY(expected, poly, 64, 1);
+#endif
 extern ARRAY(expected, hfloat, 16, 4);
 extern ARRAY(expected, hfloat, 32, 2);
 extern ARRAY(expected, hfloat, 64, 1);
@@ -197,6 +214,9 @@ extern ARRAY(expected, uint, 32, 4);
 extern ARRAY(expected, uint, 64, 2);
 extern ARRAY(expected, poly, 8, 16);
 extern ARRAY(expected, poly, 16, 8);
+#if defined (__ARM_FEATURE_CRYPTO)
+extern ARRAY(expected, poly, 64, 2);
+#endif
 extern ARRAY(expected, hfloat, 16, 8);
 extern ARRAY(expected, hfloat, 32, 4);
 extern ARRAY(expected, hfloat, 64, 2);
@@ -213,6 +233,7 @@ extern ARRAY(expected, hfloat, 64, 2);
     CHECK(test_name, uint, 64, 1, PRIx64, EXPECTED, comment); \
     CHECK(test_name, poly, 8, 8, PRIx8, EXPECTED, comment); \
     CHECK(test_name, poly, 16, 4, PRIx16, EXPECTED, comment); \
+    CHECK_CRYPTO(test_name, poly, 64, 1, PRIx64, EXPECTED, comment); \
     CHECK_FP(test_name, float, 32, 2, PRIx32, EXPECTED, comment); \
 \
     CHECK(test_name, int, 8, 16, PRIx8, EXPECTED, comment); \
@@ -225,6 +246,7 @@ extern ARRAY(expected, hfloat, 64, 2);
     CHECK(test_name, uint, 64, 2, PRIx64, EXPECTED, comment); \
     CHECK(test_name, poly, 8, 16, PRIx8, EXPECTED, comment); \
     CHECK(test_name, poly, 16, 8, PRIx16, EXPECTED, comment); \
+    CHECK_CRYPTO(test_name, poly, 64, 2, PRIx64, EXPECTED, comment); \
     CHECK_FP(test_name, float, 32, 4, PRIx32, EXPECTED, comment); \
   } \
@@ -398,6 +420,9 @@ static void clean_results (void)
   CLEAN(result, uint, 64, 1);
   CLEAN(result, poly, 8, 8);
   CLEAN(result, poly, 16, 4);
+#if defined (__ARM_FEATURE_CRYPTO)
+  CLEAN(result, poly, 64, 1);
+#endif
 #if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
   CLEAN(result, float, 16, 4);
 #endif
@@ -413,6 +438,9 @@ static void clean_results (void)
   CLEAN(result, uint, 64, 2);
   CLEAN(result, poly, 8, 16);
   CLEAN(result, poly, 16, 8);
+#if defined (__ARM_FEATURE_CRYPTO)
+  CLEAN(result, poly, 64, 2);
+#endif
 #if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
   CLEAN(result, float, 16, 8);
 #endif
@@ -438,6 +466,13 @@ static void clean_results (void)
 #define DECL_VARIABLE(VAR, T1, W, N) \
   VECT_TYPE(T1, W, N) VECT_VAR(VAR, T1, W, N)
 
+#if defined (__ARM_FEATURE_CRYPTO)
+#define DECL_VARIABLE_CRYPTO(VAR, T1, W, N) \
+  DECL_VARIABLE(VAR, T1, W, N)
+#else
+#define DECL_VARIABLE_CRYPTO(VAR, T1, W, N)
+#endif
+
 /* Declare only 64 bits signed variants.  */
 #define DECL_VARIABLE_64BITS_SIGNED_VARIANTS(VAR) \
   DECL_VARIABLE(VAR, int, 8, 8); \
@@ -473,6 +508,7 @@ static void clean_results (void)
   DECL_VARIABLE_64BITS_UNSIGNED_VARIANTS(VAR); \
   DECL_VARIABLE(VAR, poly, 8, 8); \
   DECL_VARIABLE(VAR, poly, 16, 4); \
+  DECL_VARIABLE_CRYPTO(VAR, poly, 64, 1); \
   DECL_VARIABLE(VAR, float, 16, 4); \
   DECL_VARIABLE(VAR, float, 32, 2)
 #else
@@ -481,6 +517,7 @@ static void clean_results (void)
   DECL_VARIABLE_64BITS_UNSIGNED_VARIANTS(VAR); \
   DECL_VARIABLE(VAR, poly, 8, 8); \
   DECL_VARIABLE(VAR, poly, 16, 4); \
+  DECL_VARIABLE_CRYPTO(VAR, poly, 64, 1); \
   DECL_VARIABLE(VAR, float, 32, 2)
 #endif
@@ -491,6 +528,7 @@ static void clean_results (void)
   DECL_VARIABLE_128BITS_UNSIGNED_VARIANTS(VAR); \
   DECL_VARIABLE(VAR, poly, 8, 16); \
   DECL_VARIABLE(VAR, poly, 16, 8); \
+  DECL_VARIABLE_CRYPTO(VAR, poly, 64, 2); \
   DECL_VARIABLE(VAR, float, 16, 8); \
   DECL_VARIABLE(VAR, float, 32, 4)
 #else
@@ -499,6 +537,7 @@ static void clean_results (void)
   DECL_VARIABLE_128BITS_UNSIGNED_VARIANTS(VAR); \
   DECL_VARIABLE(VAR, poly, 8, 16); \
   DECL_VARIABLE(VAR, poly, 16, 8); \
+  DECL_VARIABLE_CRYPTO(VAR, poly, 64, 2); \
   DECL_VARIABLE(VAR, float, 32, 4)
 #endif
 /* Declare all variants.  */
@@ -531,6 +570,13 @@ static void clean_results (void)
 
 /* Helpers to call macros with 1 constant and 5 variable arguments.  */
 
+#if defined (__ARM_FEATURE_CRYPTO)
+#define MACRO_CRYPTO(MACRO, VAR1, VAR2, T1, T2, T3, W, N) \
+  MACRO(VAR1, VAR2, T1, T2, T3, W, N)
+#else
+#define MACRO_CRYPTO(MACRO, VAR1, VAR2, T1, T2, T3, W, N)
+#endif
+
 #define TEST_MACRO_64BITS_SIGNED_VARIANTS_1_5(MACRO, VAR) \
   MACRO(VAR, , int, s, 8, 8); \
   MACRO(VAR, , int, s, 16, 4); \
@@ -601,13 +647,15 @@ static void clean_results (void)
   TEST_MACRO_64BITS_SIGNED_VARIANTS_2_5(MACRO, VAR1, VAR2); \
   TEST_MACRO_64BITS_UNSIGNED_VARIANTS_2_5(MACRO, VAR1, VAR2); \
   MACRO(VAR1, VAR2, , poly, p, 8, 8); \
-  MACRO(VAR1, VAR2, , poly, p, 16, 4)
+  MACRO(VAR1, VAR2, , poly, p, 16, 4); \
+  MACRO_CRYPTO(MACRO, VAR1, VAR2, , poly, p, 64, 1)
 
 #define TEST_MACRO_128BITS_VARIANTS_2_5(MACRO, VAR1, VAR2) \
   TEST_MACRO_128BITS_SIGNED_VARIANTS_2_5(MACRO, VAR1, VAR2); \
   TEST_MACRO_128BITS_UNSIGNED_VARIANTS_2_5(MACRO, VAR1, VAR2); \
   MACRO(VAR1, VAR2, q, poly, p, 8, 16); \
-  MACRO(VAR1, VAR2, q, poly, p, 16, 8)
+  MACRO(VAR1, VAR2, q, poly, p, 16, 8); \
+  MACRO_CRYPTO(MACRO, VAR1, VAR2, q, poly, p, 64, 2)
 
 #define TEST_MACRO_ALL_VARIANTS_2_5(MACRO, VAR1, VAR2) \
   TEST_MACRO_64BITS_VARIANTS_2_5(MACRO, VAR1, VAR2); \
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/p64_p128.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/p64_p128.c
index 519cffb0125079022e7ba876c1ca657d9e37cac2..8907b38cde90b44a8f1501f72b2c4e812cba5707 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/p64_p128.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/p64_p128.c
@@ -1,8 +1,9 @@
 /* This file contains tests for all the *p64 intrinsics, except for
    vreinterpret which have their own testcase.  */
 
-/* { dg-require-effective-target arm_crypto_ok } */
+/* { dg-require-effective-target arm_crypto_ok { target { arm*-*-* } } } */
 /* { dg-add-options arm_crypto } */
+/* { dg-additional-options "-march=armv8-a+crypto" { target { aarch64*-*-* } } } */
 
 #include <arm_neon.h>
 #include "arm-neon-ref.h"
@@ -38,6 +39,17 @@ VECT_VAR_DECL(vdup_n_expected2,poly,64,1) [] = { 0xfffffffffffffff2 };
 VECT_VAR_DECL(vdup_n_expected2,poly,64,2) [] = { 0xfffffffffffffff2,
						  0xfffffffffffffff2 };
+/* Expected results: vmov_n.  */
+VECT_VAR_DECL(vmov_n_expected0,poly,64,1) [] = { 0xfffffffffffffff0 };
+VECT_VAR_DECL(vmov_n_expected0,poly,64,2) [] = { 0xfffffffffffffff0,
+						 0xfffffffffffffff0 };
+VECT_VAR_DECL(vmov_n_expected1,poly,64,1) [] = { 0xfffffffffffffff1 };
+VECT_VAR_DECL(vmov_n_expected1,poly,64,2) [] = { 0xfffffffffffffff1,
+						 0xfffffffffffffff1 };
+VECT_VAR_DECL(vmov_n_expected2,poly,64,1) [] = { 0xfffffffffffffff2 };
+VECT_VAR_DECL(vmov_n_expected2,poly,64,2) [] = { 0xfffffffffffffff2,
+						 0xfffffffffffffff2 };
+
 /* Expected results: vext.  */
 VECT_VAR_DECL(vext_expected,poly,64,1) [] = { 0xfffffffffffffff0 };
 VECT_VAR_DECL(vext_expected,poly,64,2) [] = { 0xfffffffffffffff1, 0x88 };
@@ -45,6 +57,9 @@ VECT_VAR_DECL(vext_expected,poly,64,2) [] = { 0xfffffffffffffff1, 0x88 };
 /* Expected results: vget_low.  */
 VECT_VAR_DECL(vget_low_expected,poly,64,1) [] = { 0xfffffffffffffff0 };
 
+/* Expected results: vget_high.  */
+VECT_VAR_DECL(vget_high_expected,poly,64,1) [] = { 0xfffffffffffffff1 };
+
 /* Expected results: vld1.  */
 VECT_VAR_DECL(vld1_expected,poly,64,1) [] = { 0xfffffffffffffff0 };
 VECT_VAR_DECL(vld1_expected,poly,64,2) [] = { 0xfffffffffffffff0,
@@ -109,6 +124,39 @@ VECT_VAR_DECL(vst1_lane_expected,poly,64,1) [] = { 0xfffffffffffffff0 };
 VECT_VAR_DECL(vst1_lane_expected,poly,64,2) [] = { 0xfffffffffffffff0,
						    0x3333333333333333 };
 
+/* Expected results: vldX_lane.  */
+VECT_VAR_DECL(expected_vld_st2_0,poly,64,1) [] = { 0xfffffffffffffff0 };
+VECT_VAR_DECL(expected_vld_st2_0,poly,64,2) [] = { 0xfffffffffffffff0,
+						   0xfffffffffffffff1 };
+VECT_VAR_DECL(expected_vld_st2_1,poly,64,1) [] = { 0xfffffffffffffff1 };
+VECT_VAR_DECL(expected_vld_st2_1,poly,64,2) [] = { 0xaaaaaaaaaaaaaaaa,
+						   0xaaaaaaaaaaaaaaaa };
+VECT_VAR_DECL(expected_vld_st3_0,poly,64,1) [] = { 0xfffffffffffffff0 };
+VECT_VAR_DECL(expected_vld_st3_0,poly,64,2) [] = { 0xfffffffffffffff0,
+						   0xfffffffffffffff1 };
+VECT_VAR_DECL(expected_vld_st3_1,poly,64,1) [] = { 0xfffffffffffffff1 };
+VECT_VAR_DECL(expected_vld_st3_1,poly,64,2) [] = { 0xfffffffffffffff2,
+						   0xaaaaaaaaaaaaaaaa };
+VECT_VAR_DECL(expected_vld_st3_2,poly,64,1) [] = { 0xfffffffffffffff2 };
+VECT_VAR_DECL(expected_vld_st3_2,poly,64,2) [] = { 0xaaaaaaaaaaaaaaaa,
+						   0xaaaaaaaaaaaaaaaa };
+VECT_VAR_DECL(expected_vld_st4_0,poly,64,1) [] = { 0xfffffffffffffff0 };
+VECT_VAR_DECL(expected_vld_st4_0,poly,64,2) [] = { 0xfffffffffffffff0,
+						   0xfffffffffffffff1 };
+VECT_VAR_DECL(expected_vld_st4_1,poly,64,1) [] = { 0xfffffffffffffff1 };
+VECT_VAR_DECL(expected_vld_st4_1,poly,64,2) [] = { 0xfffffffffffffff2,
+						   0xfffffffffffffff3 };
+VECT_VAR_DECL(expected_vld_st4_2,poly,64,1) [] = { 0xfffffffffffffff2 };
+VECT_VAR_DECL(expected_vld_st4_2,poly,64,2) [] = { 0xaaaaaaaaaaaaaaaa,
+						   0xaaaaaaaaaaaaaaaa };
+VECT_VAR_DECL(expected_vld_st4_3,poly,64,1) [] = { 0xfffffffffffffff3 };
+VECT_VAR_DECL(expected_vld_st4_3,poly,64,2) [] = { 0xaaaaaaaaaaaaaaaa,
+						   0xaaaaaaaaaaaaaaaa };
+
+/* Expected results: vget_lane.  */
+VECT_VAR_DECL(vget_lane_expected,poly,64,1) = 0xfffffffffffffff0;
+VECT_VAR_DECL(vget_lane_expected,poly,64,2) = 0xfffffffffffffff0;
+
 int main (void)
 {
   int i;
@@ -341,6 +389,26 @@ int main (void)
 
   CHECK(TEST_MSG, poly, 64, 1, PRIx64, vget_low_expected, "");
 
+  /* vget_high_p64 tests.  */
+#undef TEST_MSG
+#define TEST_MSG "VGET_HIGH"
+
+#define TEST_VGET_HIGH(T1, T2, W, N, N2) \
+  VECT_VAR(vget_high_vector64, T1, W, N) = \
+    vget_high_##T2##W(VECT_VAR(vget_high_vector128, T1, W, N2)); \
+  vst1_##T2##W(VECT_VAR(result, T1, W, N), VECT_VAR(vget_high_vector64, T1, W, N))
+
+  DECL_VARIABLE(vget_high_vector64, poly, 64, 1);
+  DECL_VARIABLE(vget_high_vector128, poly, 64, 2);
+
+  CLEAN(result, poly, 64, 1);
+
+  VLOAD(vget_high_vector128, buffer, q, poly, p, 64, 2);
+
+  TEST_VGET_HIGH(poly, p, 64, 1, 2);
+
+  CHECK(TEST_MSG, poly, 64, 1, PRIx64, vget_high_expected, "");
+
   /* vld1_p64 tests.  */
 #undef TEST_MSG
 #define TEST_MSG "VLD1/VLD1Q"
@@ -645,7 +713,7 @@ int main (void)
   VECT_VAR(vst1_lane_vector, T1, W, N) = \
     vld1##Q##_##T2##W(VECT_VAR(buffer, T1, W, N)); \
   vst1##Q##_lane_##T2##W(VECT_VAR(result, T1, W, N), \
-			 VECT_VAR(vst1_lane_vector, T1, W, N), L)
+			 VECT_VAR(vst1_lane_vector, T1, W, N), L);
 
   DECL_VARIABLE(vst1_lane_vector, poly, 64, 1);
   DECL_VARIABLE(vst1_lane_vector, poly, 64, 2);
@@ -659,5 +727,298 @@ int main (void)
   CHECK(TEST_MSG, poly, 64, 1, PRIx64, vst1_lane_expected, "");
   CHECK(TEST_MSG, poly, 64, 2, PRIx64, vst1_lane_expected, "");
 
+#ifdef __aarch64__
+
+  /* vmov_n_p64 tests.  */
+#undef TEST_MSG
+#define TEST_MSG "VMOV/VMOVQ"
+
+#define TEST_VMOV(Q, T1, T2, W, N) \
+  VECT_VAR(vmov_n_vector, T1, W, N) = \
+    vmov##Q##_n_##T2##W(VECT_VAR(buffer_dup, T1, W, N)[i]); \
+  vst1##Q##_##T2##W(VECT_VAR(result, T1, W, N), VECT_VAR(vmov_n_vector, T1, W, N))
+
+  DECL_VARIABLE(vmov_n_vector, poly, 64, 1);
+  DECL_VARIABLE(vmov_n_vector, poly, 64, 2);
+
+  /* Try to read different places from the input buffer.  */
+  for (i=0; i< 3; i++) {
+    CLEAN(result, poly, 64, 1);
+    CLEAN(result, poly, 64, 2);
+
+    TEST_VMOV(, poly, p, 64, 1);
+    TEST_VMOV(q, poly, p, 64, 2);
+
+    switch (i) {
+    case 0:
+      CHECK(TEST_MSG, poly, 64, 1, PRIx64, vmov_n_expected0, "");
+      CHECK(TEST_MSG, poly, 64, 2, PRIx64, vmov_n_expected0, "");
+      break;
+    case 1:
+      CHECK(TEST_MSG, poly, 64, 1, PRIx64, vmov_n_expected1, "");
+      CHECK(TEST_MSG, poly, 64, 2, PRIx64, vmov_n_expected1, "");
+      break;
+    case 2:
+      CHECK(TEST_MSG, poly, 64, 1, PRIx64, vmov_n_expected2, "");
+      CHECK(TEST_MSG, poly, 64, 2, PRIx64, vmov_n_expected2, "");
+      break;
+    default:
+      abort();
+    }
+  }
+
+  /* vget_lane_p64 tests.  */
+#undef TEST_MSG
+#define TEST_MSG "VGET_LANE/VGETQ_LANE"
+
+#define TEST_VGET_LANE(Q, T1, T2, W, N, L) \
+  VECT_VAR(vget_lane_vector, T1, W, N) = vget##Q##_lane_##T2##W(VECT_VAR(vector, T1, W, N), L); \
+  if (VECT_VAR(vget_lane_vector, T1, W, N) != VECT_VAR(vget_lane_expected, T1, W, N)) { \
+    fprintf(stderr, \
+	    "ERROR in %s (%s line %d in result '%s') at type %s " \
+	    "got 0x%" PRIx##W " != 0x%" PRIx##W "\n", \
+	    TEST_MSG, __FILE__, __LINE__, \
+	    STR(VECT_VAR(vget_lane_expected, T1, W, N)), \
+	    STR(VECT_NAME(T1, W, N)), \
+	    VECT_VAR(vget_lane_vector, T1, W, N), \
+	    VECT_VAR(vget_lane_expected, T1, W, N)); \
+    abort (); \
+  }
+
+  /* Initialize input values.  */
+  DECL_VARIABLE(vector, poly, 64, 1);
+  DECL_VARIABLE(vector, poly, 64, 2);
+
+  VLOAD(vector, buffer, , poly, p, 64, 1);
+  VLOAD(vector, buffer, q, poly, p, 64, 2);
+
+  VECT_VAR_DECL(vget_lane_vector, poly, 64, 1);
+  VECT_VAR_DECL(vget_lane_vector, poly, 64, 2);
+
+  TEST_VGET_LANE( , poly, p, 64, 1, 0);
+  TEST_VGET_LANE(q, poly, p, 64, 2, 0);
+
+  /* vldX_lane_p64 tests.  */
+#undef TEST_MSG
+#define TEST_MSG "VLDX_LANE/VLDXQ_LANE"
+
+VECT_VAR_DECL_INIT(buffer_vld2_lane, poly, 64, 2);
+VECT_VAR_DECL_INIT(buffer_vld3_lane, poly, 64, 3);
+VECT_VAR_DECL_INIT(buffer_vld4_lane, poly, 64, 4);
+
+  /* In this case, input variables are arrays of vectors.  */
+#define DECL_VLD_STX_LANE(T1, W, N, X) \
+  VECT_ARRAY_TYPE(T1, W, N, X) VECT_ARRAY_VAR(vector, T1, W, N, X); \
+  VECT_ARRAY_TYPE(T1, W, N, X) VECT_ARRAY_VAR(vector_src, T1, W, N, X); \
+  VECT_VAR_DECL(result_bis_##X, T1, W, N)[X * N]
+
+  /* We need to use a temporary result buffer (result_bis), because
+     the one used for other tests is not large enough.  A subset of the
+     result data is moved from result_bis to result, and it is this
+     subset which is used to check the actual behavior.  The next
+     macro enables us to move another chunk of data from result_bis to
+     result.  */
+  /* We also use another extra input buffer (buffer_src), which we
+     fill with 0xAA, and which is used to load a vector from which we
+     read a given lane.  */
+
+#define TEST_VLDX_LANE(Q, T1, T2, W, N, X, L) \
+  memset (VECT_VAR(buffer_src, T1, W, N), 0xAA, \
+	  sizeof(VECT_VAR(buffer_src, T1, W, N))); \
+ \
+  VECT_ARRAY_VAR(vector_src, T1, W, N, X) = \
+    vld##X##Q##_##T2##W(VECT_VAR(buffer_src, T1, W, N)); \
+ \
+  VECT_ARRAY_VAR(vector, T1, W, N, X) = \
+    /* Use dedicated init buffer, of size X.  */ \
+    vld##X##Q##_lane_##T2##W(VECT_VAR(buffer_vld##X##_lane, T1, W, X), \
+			     VECT_ARRAY_VAR(vector_src, T1, W, N, X), \
+			     L); \
+  vst##X##Q##_##T2##W(VECT_VAR(result_bis_##X, T1, W, N), \
+		      VECT_ARRAY_VAR(vector, T1, W, N, X)); \
+  memcpy(VECT_VAR(result, T1, W, N), VECT_VAR(result_bis_##X, T1, W, N), \
+	 sizeof(VECT_VAR(result, T1, W, N)))
+
+  /* Overwrite "result" with the contents of "result_bis"[Y].  */
+#undef TEST_EXTRA_CHUNK
+#define TEST_EXTRA_CHUNK(T1, W, N, X, Y) \
+  memcpy(VECT_VAR(result, T1, W, N), \
+	 &(VECT_VAR(result_bis_##X, T1, W, N)[Y*N]), \
+	 sizeof(VECT_VAR(result, T1, W, N)));
+
+  /* Add some padding to try to catch out of bound accesses.  */
+#define ARRAY1(V, T, W, N) VECT_VAR_DECL(V,T,W,N)[1]={42}
+#define DUMMY_ARRAY(V, T, W, N, L) \
+  VECT_VAR_DECL(V,T,W,N)[N*L]={0}; \
+  ARRAY1(V##_pad,T,W,N)
+
+#define DECL_ALL_VLD_STX_LANE(X) \
+  DECL_VLD_STX_LANE(poly, 64, 1, X); \
+  DECL_VLD_STX_LANE(poly, 64, 2, X);
+
+#define TEST_ALL_VLDX_LANE(X) \
+  TEST_VLDX_LANE(, poly, p, 64, 1, X, 0); \
+  TEST_VLDX_LANE(q, poly, p, 64, 2, X, 0);
+
+#define TEST_ALL_EXTRA_CHUNKS(X,Y) \
+  TEST_EXTRA_CHUNK(poly, 64, 1, X, Y) \
+  TEST_EXTRA_CHUNK(poly, 64, 2, X, Y)
+
+#define CHECK_RESULTS_VLD_STX_LANE(test_name,EXPECTED,comment) \
+  CHECK(test_name, poly, 64, 1, PRIx64, EXPECTED, comment); \
+  CHECK(test_name, poly, 64, 2, PRIx64, EXPECTED, comment);
+
+  /* Declare the temporary buffers / variables.  */
+  DECL_ALL_VLD_STX_LANE(2);
+  DECL_ALL_VLD_STX_LANE(3);
+  DECL_ALL_VLD_STX_LANE(4);
+
+  DUMMY_ARRAY(buffer_src, poly, 64, 1, 4);
+  DUMMY_ARRAY(buffer_src, poly, 64, 2, 4);
+
+  /* Check vld2_lane/vld2q_lane.  */
+  clean_results ();
+#undef TEST_MSG
+#define TEST_MSG "VLD2_LANE/VLD2Q_LANE"
+  TEST_ALL_VLDX_LANE(2);
+  CHECK_RESULTS_VLD_STX_LANE (TEST_MSG, expected_vld_st2_0, " chunk 0");
+
+  TEST_ALL_EXTRA_CHUNKS(2, 1);
+  CHECK_RESULTS_VLD_STX_LANE (TEST_MSG, expected_vld_st2_1, " chunk 1");
+
+  /* Check vld3_lane/vld3q_lane.  */
+  clean_results ();
+#undef TEST_MSG
+#define TEST_MSG "VLD3_LANE/VLD3Q_LANE"
+  TEST_ALL_VLDX_LANE(3);
+  CHECK_RESULTS_VLD_STX_LANE (TEST_MSG, expected_vld_st3_0, " chunk 0");
+
+  TEST_ALL_EXTRA_CHUNKS(3, 1);
+  CHECK_RESULTS_VLD_STX_LANE (TEST_MSG, expected_vld_st3_1, " chunk 1");
+
+  TEST_ALL_EXTRA_CHUNKS(3, 2);
+  CHECK_RESULTS_VLD_STX_LANE (TEST_MSG, expected_vld_st3_2, " chunk 2");
+
+  /* Check vld4_lane/vld4q_lane.  */
+  clean_results ();
+#undef TEST_MSG
+#define TEST_MSG "VLD4_LANE/VLD4Q_LANE"
+  TEST_ALL_VLDX_LANE(4);
+  CHECK_RESULTS_VLD_STX_LANE (TEST_MSG, expected_vld_st4_0, " chunk 0");
+
+  TEST_ALL_EXTRA_CHUNKS(4, 1);
+  CHECK_RESULTS_VLD_STX_LANE (TEST_MSG, expected_vld_st4_1, " chunk 1");
+
+  TEST_ALL_EXTRA_CHUNKS(4, 2);
+  CHECK_RESULTS_VLD_STX_LANE (TEST_MSG, expected_vld_st4_2, " chunk 2");
+
+  TEST_ALL_EXTRA_CHUNKS(4, 3);
+  CHECK_RESULTS_VLD_STX_LANE (TEST_MSG, expected_vld_st4_3, " chunk 3");
+
+  /* In this case, input variables are arrays of vectors.  */
+#define DECL_VSTX_LANE(T1, W, N, X) \
+  VECT_ARRAY_TYPE(T1, W, N, X) VECT_ARRAY_VAR(vector, T1, W, N, X); \
+  VECT_ARRAY_TYPE(T1, W, N, X) VECT_ARRAY_VAR(vector_src, T1, W, N, X); \
+  VECT_VAR_DECL(result_bis_##X, T1, W, N)[X * N]
+
+  /* We need to use a temporary result buffer (result_bis), because
+     the one used for other tests is not large enough.  A subset of the
+     result data is moved from result_bis to result, and it is this
+     subset which is used to check the actual behavior.  The next
+     macro enables us to move another chunk of data from result_bis to
+     result.  */
+  /* We also use another extra input buffer (buffer_src), which we
+     fill with 0xAA, and which is used to load a vector from which we
+     read a given lane.  */
+#define TEST_VSTX_LANE(Q, T1, T2, W, N, X, L) \
+  memset (VECT_VAR(buffer_src, T1, W, N), 0xAA, \
+	  sizeof(VECT_VAR(buffer_src, T1, W, N))); \
+  memset (VECT_VAR(result_bis_##X, T1, W, N), 0, \
+	  sizeof(VECT_VAR(result_bis_##X, T1, W, N))); \
+ \
+  VECT_ARRAY_VAR(vector_src, T1, W, N, X) = \
+    vld##X##Q##_##T2##W(VECT_VAR(buffer_src, T1, W, N)); \
+ \
+  VECT_ARRAY_VAR(vector, T1, W, N, X) = \
+    /* Use dedicated init buffer, of size X.  */ \
+    vld##X##Q##_lane_##T2##W(VECT_VAR(buffer_vld##X##_lane, T1, W, X), \
+			     VECT_ARRAY_VAR(vector_src, T1, W, N, X), \
+			     L); \
+  vst##X##Q##_lane_##T2##W(VECT_VAR(result_bis_##X, T1, W, N), \
+			   VECT_ARRAY_VAR(vector, T1, W, N, X), \
+			   L); \
+  memcpy(VECT_VAR(result, T1, W, N), VECT_VAR(result_bis_##X, T1, W, N), \
+	 sizeof(VECT_VAR(result, T1, W, N)));
+
+#define TEST_ALL_VSTX_LANE(X) \
+  TEST_VSTX_LANE(, poly, p, 64, 1, X, 0); \
+  TEST_VSTX_LANE(q, poly, p, 64, 2, X, 0);
+
+  /* Check vst2_lane/vst2q_lane.  */
+  clean_results ();
+#undef TEST_MSG
+#define TEST_MSG "VST2_LANE/VST2Q_LANE"
+  TEST_ALL_VSTX_LANE(2);
+
+#define CMT " (chunk 0)"
+  CHECK(TEST_MSG, poly, 64, 1, PRIx64, expected_vld_st2_0, CMT);
+
+  TEST_ALL_EXTRA_CHUNKS(2, 1);
+#undef CMT
+#define CMT " (chunk 1)"
+  CHECK(TEST_MSG, poly, 64, 1, PRIx64, expected_vld_st2_1, CMT);
+
+  /* Check vst3_lane/vst3q_lane.  */
+  clean_results ();
+#undef TEST_MSG
+#define TEST_MSG "VST3_LANE/VST3Q_LANE"
+  TEST_ALL_VSTX_LANE(3);
+
+#undef CMT
+#define CMT " (chunk 0)"
+  CHECK(TEST_MSG, poly, 64, 1, PRIx64, expected_vld_st3_0, CMT);
+
+  TEST_ALL_EXTRA_CHUNKS(3, 1);
+
+#undef CMT
+#define CMT " (chunk 1)"
+  CHECK(TEST_MSG, poly, 64, 1, PRIx64, expected_vld_st3_1, CMT);
+
+  TEST_ALL_EXTRA_CHUNKS(3, 2);
+
+#undef CMT
+#define CMT " (chunk 2)"
+  CHECK(TEST_MSG, poly, 64, 1, PRIx64, expected_vld_st3_2, CMT);
+
+  /* Check vst4_lane/vst4q_lane.  */
+  clean_results ();
+#undef TEST_MSG
+#define TEST_MSG "VST4_LANE/VST4Q_LANE"
+  TEST_ALL_VSTX_LANE(4);
+
+#undef CMT
+#define CMT " (chunk 0)"
+  CHECK(TEST_MSG, poly, 64, 1, PRIx64, expected_vld_st4_0, CMT);
+
+  TEST_ALL_EXTRA_CHUNKS(4, 1);
+
+#undef CMT
+#define CMT " (chunk 1)"
+  CHECK(TEST_MSG, poly, 64, 1, PRIx64, expected_vld_st4_1, CMT);
+
+  TEST_ALL_EXTRA_CHUNKS(4, 2);
+
+#undef CMT
+#define CMT " (chunk 2)"
+  CHECK(TEST_MSG, poly, 64, 1, PRIx64, expected_vld_st4_2, CMT);
+
+  TEST_ALL_EXTRA_CHUNKS(4, 3);
+
+#undef CMT
+#define CMT " (chunk 3)"
+  CHECK(TEST_MSG, poly, 64, 1, PRIx64, expected_vld_st4_3, CMT);
+
+#endif /* __aarch64__.  */
+
   return 0;
 }
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vreinterpret_p128.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vreinterpret_p128.c
index 808641524c47b2c245ee2f10e74a784a7bccefc9..f192d4dda514287c8417e7fc922bc580b209b163 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vreinterpret_p128.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vreinterpret_p128.c
@@ -1,7 +1,8 @@
 /* This file contains tests for the vreinterpret *p128 intrinsics.  */
 
-/* { dg-require-effective-target arm_crypto_ok } */
+/* { dg-require-effective-target arm_crypto_ok { target { arm*-*-* } } } */
 /* { dg-add-options arm_crypto } */
+/* { dg-additional-options "-march=armv8-a+crypto" { target { aarch64*-*-* } } } */
 
 #include <arm_neon.h>
 #include "arm-neon-ref.h"
@@ -78,9 +79,7 @@ VECT_VAR_DECL(vreint_expected_q_f16_p128,hfloat,16,8) [] = { 0xfff0, 0xffff,
 int main (void)
 {
   DECL_VARIABLE_128BITS_VARIANTS(vreint_vector);
-  DECL_VARIABLE(vreint_vector, poly, 64, 2);
   DECL_VARIABLE_128BITS_VARIANTS(vreint_vector_res);
-  DECL_VARIABLE(vreint_vector_res, poly, 64, 2);
 
   clean_results ();
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vreinterpret_p64.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vreinterpret_p64.c
index 1d8cf9aa69f0b5b0717e98de613e3c350d6395d4..c915fd2fea6b4d8770c9a4aab88caad391105d89 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vreinterpret_p64.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vreinterpret_p64.c
@@ -1,7 +1,8 @@
 /* This file contains tests for the vreinterpret *p64 intrinsics.  */
 
-/* { dg-require-effective-target arm_crypto_ok } */
+/* { dg-require-effective-target arm_crypto_ok { target { arm*-*-* } } } */
 /* { dg-add-options arm_crypto } */
+/* { dg-additional-options "-march=armv8-a+crypto" { target { aarch64*-*-* } } } */
 
 #include <arm_neon.h>
 #include "arm-neon-ref.h"
@@ -121,11 +122,7 @@ int main (void)
   CHECK_FP(TEST_MSG, T1, W, N, PRIx##W, EXPECTED, "");
 
   DECL_VARIABLE_ALL_VARIANTS(vreint_vector);
-  DECL_VARIABLE(vreint_vector, poly, 64, 1);
-  DECL_VARIABLE(vreint_vector, poly, 64, 2);
   DECL_VARIABLE_ALL_VARIANTS(vreint_vector_res);
-  DECL_VARIABLE(vreint_vector_res, poly, 64, 1);
-  DECL_VARIABLE(vreint_vector_res, poly, 64, 2);
 
   clean_results ();