On Mon, 2023-08-07 at 17:18 +0800, Kewen.Lin wrote: > Hi Carl, > > Sorry for the late review. > > on 2023/8/2 02:29, Carl Love wrote: > > GCC maintainers: > > > > Ver 2: Re-worked the test vec-cmpne.c to create a compile only > > test > > verify the instruction generation and a runnable test to verify the > > built-in functionality. Retested the patch on Power 8 LE/BE, Power > > 9LE/BE and Power 10 LE with no regressions. > > > > The following patch cleans up the definition for the > > __builtin_altivec_vcmpne{b,h,w}. The current implementation > > implies > > that the built-in is only supported on Power 9 since it is defined > > under the Power 9 stanza. However the built-in has no ISA > > restrictions > > as stated in the Power Vector Intrinsic Programming Reference > > document. > > The current built-in works because the built-in gets replaced > > during > > GIMPLE folding by a simple not-equal operator so it doesn't get > > expanded and checked for Power 9 code generation. > > > > This patch moves the definition to the Altivec stanza in the built- > > in > > definition file to make it clear the built-ins are valid for Power > > 8, > > Power 9 and beyond. > > > > The patch has been tested on Power 8 LE/BE, Power 9 LE/BE and Power > > 10 > > LE with no regressions. > > > > Please let me know if the patch is acceptable for > > mainline. Thanks. > > > > Carl > > > > ------------------------------------------------ > > rs6000: Fix __builtin_altivec_vcmpne{b,h,w} implementation > > > > The current built-in definitions for vcmpneb, vcmpneh, vcmpnew are > > defined > > under the Power 9 section of r66000-builtins. This implies they > > are only > > supported on Power 9 and above when in fact they are defined and > > work with > > Altivec as well with the appropriate Altivec instruction > > generation. > > > > The vec_cmpne builtin should generate the vcmpequ{b,h,w} > > instruction with > > Altivec enabled and generate the vcmpne{b,h,w} on Power 9 and newer > > processors. > > > > This patch moves the definitions to the Altivec stanza to make it > > clear > > the built-ins are supported for all Altivec processors. The patch > > enables the vcmpequ{b,h,w} instruction to be generated on Altivec > > and > > the vcmpne{b,h,w} instruction to be generated on Power 9 and > > beyond. > > But as you noted above, the current built-ins work as expected, that > is > to generate with vcmpequ{b,h,w} on altivec but not Power9 while > generate > with vcmpne{b,h,w} on Power9. So I think we shouldn't say it's > enabled > by this patch. Instead it's to remove the confusion.
OK, changed. > > > There is existing test coverage for the vec_cmpne built-in for > > vector bool char, vector bool short, vector bool int, > > vector bool long long in builtins-3-p9.c and p8vector-builtin-2.c. > > Coverage for vector signed int, vector unsigned int is in > > p8vector-builtin-2.c. > > > > Test vec-cmpne.c is updated to check the generation of the > > vcmpequ{b,h,w} > > instructions for Altivec. A new test vec-cmpne-runnable.c is added > > to > > verify the built-ins work as expected. > > > > Patch has been tested on Power 8 LE/BE, Power 9 LE/BE and Power 10 > > LE > > with no regressions. > > > > gcc/ChangeLog: > > > > * config/rs6000/rs6000-builtins.def (vcmpneb, vcmpneh, > > vcmpnew): > > Move definitions to Altivec stanza. > > * config/rs6000/altivec.md (vcmpneb, vcmpneh, vcmpnew): New > > define_expand. > > > > gcc/testsuite/ChangeLog: > > > > * gcc.target/powerpc/vec-cmpne-runnable.c: New execution test. > > * gcc.target/powerpc/vec-cmpne.c (define_test_functions, > > execute_test_functions) moved to vec-cmpne.h. Added > > scan-assembler-times for vcmpequb, vcmpequh, vcmpequw. > > * gcc.target/powerpc/vec-cmpne.h: New include file for vec- > > cmpne.c > > and vec-cmpne-runnable.c. Split define_test_functions > > definition > > into define_test_functions and define_init_verify_functions. > > --- > > gcc/config/rs6000/altivec.md | 12 ++ > > gcc/config/rs6000/rs6000-builtins.def | 18 +-- > > .../gcc.target/powerpc/vec-cmpne-runnable.c | 36 ++++++ > > gcc/testsuite/gcc.target/powerpc/vec-cmpne.c | 110 ++---------- > > ------ > > gcc/testsuite/gcc.target/powerpc/vec-cmpne.h | 86 ++++++++++++++ > > 5 files changed, 151 insertions(+), 111 deletions(-) > > create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-cmpne- > > runnable.c > > create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-cmpne.h > > > > diff --git a/gcc/config/rs6000/altivec.md > > b/gcc/config/rs6000/altivec.md > > index ad1224e0b57..31f65aa1b7a 100644 > > --- a/gcc/config/rs6000/altivec.md > > +++ b/gcc/config/rs6000/altivec.md > > @@ -2631,6 +2631,18 @@ (define_insn "altivec_vcmpequt_p" > > "vcmpequq. %0,%1,%2" > > [(set_attr "type" "veccmpfx")]) > > > > +;; Expand for builtin vcmpne{b,h,w} > > +(define_expand "altivec_vcmpne_<mode>" > > + [(set (match_operand:VSX_EXTRACT_I 3 "altivec_register_operand" > > "=v") > > + (eq:VSX_EXTRACT_I (match_operand:VSX_EXTRACT_I 1 > > "altivec_register_operand" "v") > > + (match_operand:VSX_EXTRACT_I 2 > > "altivec_register_operand" "v"))) > > + (set (match_operand:VSX_EXTRACT_I 0 "altivec_register_operand" > > "=v") > > + (not:VSX_EXTRACT_I (match_dup 3)))] > > + "TARGET_ALTIVEC" > > + { > > + operands[3] = gen_reg_rtx (GET_MODE (operands[0])); > > + }); > > + > > (define_insn "*altivec_vcmpgts<VI_char>_p" > > [(set (reg:CC CR6_REGNO) > > (unspec:CC [(gt:CC (match_operand:VI2 1 "register_operand" "v") > > diff --git a/gcc/config/rs6000/rs6000-builtins.def > > b/gcc/config/rs6000/rs6000-builtins.def > > index 638d0bc72ca..6b06fa8b34d 100644 > > --- a/gcc/config/rs6000/rs6000-builtins.def > > +++ b/gcc/config/rs6000/rs6000-builtins.def > > @@ -641,6 +641,15 @@ > > const int __builtin_altivec_vcmpgtuw_p (int, vsi, vsi); > > VCMPGTUW_P vector_gtu_v4si_p {pred} > > > > + const vsc __builtin_altivec_vcmpneb (vsc, vsc); > > + VCMPNEB altivec_vcmpne_v16qi {} > > + > > + const vss __builtin_altivec_vcmpneh (vss, vss); > > + VCMPNEH altivec_vcmpne_v8hi {} > > + > > + const vsi __builtin_altivec_vcmpnew (vsi, vsi); > > + VCMPNEW altivec_vcmpne_v4si {} > > + > > const vsi __builtin_altivec_vctsxs (vf, const int<5>); > > VCTSXS altivec_vctsxs {} > > > > @@ -2599,9 +2608,6 @@ > > const signed int __builtin_altivec_vcmpaew_p (vsi, vsi); > > VCMPAEW_P vector_ae_v4si_p {} > > > > - const vsc __builtin_altivec_vcmpneb (vsc, vsc); > > - VCMPNEB vcmpneb {} > > - > > const signed int __builtin_altivec_vcmpneb_p (vsc, vsc); > > VCMPNEB_P vector_ne_v16qi_p {} > > > > @@ -2614,15 +2620,9 @@ > > const signed int __builtin_altivec_vcmpnefp_p (vf, vf); > > VCMPNEFP_P vector_ne_v4sf_p {} > > > > - const vss __builtin_altivec_vcmpneh (vss, vss); > > - VCMPNEH vcmpneh {} > > - > > const signed int __builtin_altivec_vcmpneh_p (vss, vss); > > VCMPNEH_P vector_ne_v8hi_p {} > > > > - const vsi __builtin_altivec_vcmpnew (vsi, vsi); > > - VCMPNEW vcmpnew {} > > - > > const signed int __builtin_altivec_vcmpnew_p (vsi, vsi); > > VCMPNEW_P vector_ne_v4si_p {} > > > > diff --git a/gcc/testsuite/gcc.target/powerpc/vec-cmpne-runnable.c > > b/gcc/testsuite/gcc.target/powerpc/vec-cmpne-runnable.c > > new file mode 100644 > > index 00000000000..be038639a6c > > --- /dev/null > > +++ b/gcc/testsuite/gcc.target/powerpc/vec-cmpne-runnable.c > > @@ -0,0 +1,36 @@ > > +/* { dg-do run } */ > > +/* { dg-require-effective-target powerpc_altivec_ok } */ > > This is for running, please check for vmx_hw. > > > +/* { dg-options "-mvsx -O3 " } */ > > Nit: s/-mvsx/-maltivec/, s/-O3/-O2/. OK, fixed both issues. > > > + > > +/* Test that the vec_cmpne builtin works as expected. */ > > + > > +#include "vec-cmpne.h" > > + > > +define_test_functions (int, signed int, signed int, si); > > +define_test_functions (int, unsigned int, unsigned int, ui); > > +define_test_functions (short, signed short, signed short, ss); > > +define_test_functions (short, unsigned short, unsigned short, us); > > +define_test_functions (char, signed char, signed char, sc); > > +define_test_functions (char, unsigned char, unsigned char, uc); > > +define_test_functions (int, signed int, float, ff); > > + > > +define_init_verify_functions (int, signed int, signed int, si); > > +define_init_verify_functions (int, unsigned int, unsigned int, > > ui); > > +define_init_verify_functions (short, signed short, signed short, > > ss); > > +define_init_verify_functions (short, unsigned short, unsigned > > short, us); > > +define_init_verify_functions (char, signed char, signed char, sc); > > +define_init_verify_functions (char, unsigned char, unsigned char, > > uc); > > +define_init_verify_functions (int, signed int, float, ff); > > + > > +int main () > > +{ > > + execute_test_functions (int, signed int, signed int, si); > > + execute_test_functions (int, unsigned int, unsigned int, ui); > > + execute_test_functions (short, signed short, signed short, ss); > > + execute_test_functions (short, unsigned short, unsigned short, > > us); > > + execute_test_functions (char, signed char, signed char, sc); > > + execute_test_functions (char, unsigned char, unsigned char, uc); > > + execute_test_functions (int, signed int, float, ff); > > + > > + return 0; > > +} > > diff --git a/gcc/testsuite/gcc.target/powerpc/vec-cmpne.c > > b/gcc/testsuite/gcc.target/powerpc/vec-cmpne.c > > index edba9dece66..8ceddd5cc3f 100644 > > --- a/gcc/testsuite/gcc.target/powerpc/vec-cmpne.c > > +++ b/gcc/testsuite/gcc.target/powerpc/vec-cmpne.c > > @@ -1,94 +1,11 @@ > > -/* { dg-do run } */ > > -/* { dg-require-effective-target powerpc_vsx_ok } */ > > +/* { dg-do compile } */ > > +/* { dg-require-effective-target powerpc_altivec_ok } */ > > /* { dg-options "-mvsx -O3" } */ > > Nit: s/-mvsx/-maltivec/, s/-O3/-O2/. Fixed. > > > > > -/* Test that the vec_cmpne builtin works as expected. */ > > - > > -#include "altivec.h" > > - > > -#define N 4096 > > - > > -void abort (); > > - > > -#define define_test_functions(VBTYPE, RTYPE, STYPE, NAME) \ > > -\ > > -RTYPE result_ne_##NAME[N] __attribute__((aligned(16))); \ > > -RTYPE result_eq_##NAME[N] __attribute__((aligned(16))); \ > > -STYPE operand1_##NAME[N] __attribute__((aligned(16))); \ > > -STYPE operand2_##NAME[N] __attribute__((aligned(16))); \ > > -RTYPE expected_##NAME[N] __attribute__((aligned(16))); \ > > -\ > > -__attribute__((noinline)) void vector_tests_##NAME () \ > > -{ \ > > - vector STYPE v1_##NAME, v2_##NAME; \ > > - vector bool VBTYPE tmp_##NAME; \ > > - int i; \ > > - for (i = 0; i < N; i+=16/sizeof (STYPE)) \ > > - { \ > > - /* result_ne = operand1!=operand2. */ \ > > - v1_##NAME = vec_vsx_ld (0, (const vector > > STYPE*)&operand1_##NAME[i]); \ > > - v2_##NAME = vec_vsx_ld (0, (const vector > > STYPE*)&operand2_##NAME[i]); \ > > -\ > > - tmp_##NAME = vec_cmpeq (v1_##NAME, v2_##NAME); \ > > - vec_vsx_st (tmp_##NAME, 0, &result_eq_##NAME[i]); \ > > -\ > > - tmp_##NAME = vec_cmpne (v1_##NAME, v2_##NAME); \ > > - vec_vsx_st (tmp_##NAME, 0, &result_ne_##NAME[i]); \ > > - } \ > > -} \ > > -\ > > -__attribute__((noinline)) void init_##NAME () \ > > -{ \ > > - int i; \ > > - for (i = 0; i < N; ++i) \ > > - { \ > > - result_ne_##NAME[i] = 7; \ > > - result_eq_##NAME[i] = 15; \ > > - if (i%3 == 0) \ > > - { \ > > - /* op1 < op2. */ \ > > - operand1_##NAME[i] = 1; \ > > - operand2_##NAME[i] = 2; \ > > - } \ > > - else if (i%3 == 1) \ > > - { \ > > - /* op1 > op2. */ \ > > - operand1_##NAME[i] = 2; \ > > - operand2_##NAME[i] = 1; \ > > - } \ > > - else if (i%3 == 2) \ > > - { \ > > - /* op1 == op2. */ \ > > - operand1_##NAME[i] = 3; \ > > - operand2_##NAME[i] = 3; \ > > - } \ > > - /* For vector comparisons: "For each element of the > > result_ne, the \ > > - value of each bit is 1 if the corresponding elements of ARG1 > > and \ > > - ARG2 are equal." {or whatever the comparison is} "Otherwise, > > the \ > > - value of each bit is 0." */ \ > > - expected_##NAME[i] = -1 * (RTYPE)(operand1_##NAME[i] != > > operand2_##NAME[i]); \ > > - } \ > > -} \ > > -\ > > -__attribute__((noinline)) void verify_results_##NAME () \ > > -{ \ > > - int i; \ > > - for (i = 0; i < N; ++i) \ > > - { \ > > - if ( ((result_ne_##NAME[i] != expected_##NAME[i]) || \ > > - (result_ne_##NAME[i] == result_eq_##NAME[i]))) \ > > - abort (); \ > > - } \ > > -} > > - > > - > > -#define execute_test_functions(VBTYPE, RTYPE, STYPE, NAME) \ > > -{ \ > > - init_##NAME (); \ > > - vector_tests_##NAME (); \ > > - verify_results_##NAME (); \ > > -} > > +/* Test that the vec_cmpne builtin generates the expected Altive > > + instructions. */ > > > > +#include "vec-cmpne.h" > > > > define_test_functions (int, signed int, signed int, si); > > define_test_functions (int, unsigned int, unsigned int, ui); > > @@ -98,17 +15,6 @@ define_test_functions (char, signed char, signed > > char, sc); > > define_test_functions (char, unsigned char, unsigned char, uc); > > define_test_functions (int, signed int, float, ff); > > > > -int main () > > -{ > > - execute_test_functions (int, signed int, signed int, si); > > - execute_test_functions (int, unsigned int, unsigned int, ui); > > - execute_test_functions (short, signed short, signed short, ss); > > - execute_test_functions (short, unsigned short, unsigned short, > > us); > > - execute_test_functions (char, signed char, signed char, sc); > > - execute_test_functions (char, unsigned char, unsigned char, uc); > > - execute_test_functions (int, signed int, float, ff); > > - > > - return 0; > > -} > > - > > - > > +/* { dg-final { scan-assembler-times {\mvcmpequb\M} 4 } } */ > > +/* { dg-final { scan-assembler-times {\mvcmpequh\M} 4 } } */ > > +/* { dg-final { scan-assembler-times {\mvcmpequw\M} 4 } } */ > > Nit: There are two (signed and unsigned) for each int, short and > char, > but we get four for each, it's due to the related loop is unrolled > by two, I think we can disable unrolling for this to make it robust > with something such as: > > --- vec-cmpne.h.orig 2023-08-02 22:31:05.150798846 -0500 > +++ vec-cmpne.h 2023-08-02 22:29:44.921109862 -0500 > @@ -4,6 +4,9 @@ > > void abort (); > > +#define PRAGMA(X) _Pragma (#X) > +#define UNROLL0 PRAGMA (GCC unroll 0) > + > #define define_test_functions(VBTYPE, RTYPE, STYPE, NAME) \ > \ > RTYPE result_ne_##NAME[N] __attribute__((aligned(16))); \ > @@ -17,6 +20,7 @@ > vector STYPE v1_##NAME, v2_##NAME; \ > vector bool VBTYPE tmp_##NAME; \ > int i; \ > + UNROLL0 \ > for (i = 0; i < N; i+=16/sizeof (STYPE)) \ > { \ > /* result_ne = operand1!=operand2. */ \ > > ---------------- > > Then the above scan-assembler-times should be all with 2. Added the no unroll, updated the expected counts. Carl