Re: [PATCH v2] rs6000: Fix __builtin_altivec_vcmpne{b,h,w} implementation

Carl Love via Gcc-patches Mon, 07 Aug 2023 10:38:00 -0700

On Mon, 2023-08-07 at 17:18 +0800, Kewen.Lin wrote:
> Hi Carl,
> 
> Sorry for the late review.
> 
> on 2023/8/2 02:29, Carl Love wrote:
> > GCC maintainers:
> > 
> > Ver 2:  Re-worked the test vec-cmpne.c to create a compile only
> > test
> > verify the instruction generation and a runnable test to verify the
> > built-in functionality.  Retested the patch on Power 8 LE/BE, Power
> > 9LE/BE and Power 10 LE with no regressions.
> > 
> > The following patch cleans up the definition for the
> > __builtin_altivec_vcmpne{b,h,w}.  The current implementation
> > implies
> > that the built-in is only supported on Power 9 since it is defined
> > under the Power 9 stanza.  However the built-in has no ISA
> > restrictions
> > as stated in the Power Vector Intrinsic Programming Reference
> > document.
> > The current built-in works because the built-in gets replaced
> > during
> > GIMPLE folding by a simple not-equal operator so it doesn't get
> > expanded and checked for Power 9 code generation.
> > 
> > This patch moves the definition to the Altivec stanza in the built-
> > in
> > definition file to make it clear the built-ins are valid for Power
> > 8,
> > Power 9 and beyond.  
> > 
> > The patch has been tested on Power 8 LE/BE, Power 9 LE/BE and Power
> > 10
> > LE with no regressions.
> > 
> > Please let me know if the patch is acceptable for
> > mainline.  Thanks.
> > 
> >                       Carl 
> > 
> > ------------------------------------------------
> > rs6000: Fix __builtin_altivec_vcmpne{b,h,w} implementation
> > 
> > The current built-in definitions for vcmpneb, vcmpneh, vcmpnew are
> > defined
> > under the Power 9 section of r66000-builtins.  This implies they
> > are only
> > supported on Power 9 and above when in fact they are defined and
> > work with
> > Altivec as well with the appropriate Altivec instruction
> > generation.
> > 
> > The vec_cmpne builtin should generate the vcmpequ{b,h,w}
> > instruction with
> > Altivec enabled and generate the vcmpne{b,h,w} on Power 9 and newer
> > processors.
> > 
> > This patch moves the definitions to the Altivec stanza to make it
> > clear
> > the built-ins are supported for all Altivec processors.  The patch
> > enables the vcmpequ{b,h,w} instruction to be generated on Altivec
> > and
> > the vcmpne{b,h,w} instruction to be generated on Power 9 and
> > beyond.
> 
> But as you noted above, the current built-ins work as expected, that
> is
> to generate with vcmpequ{b,h,w} on altivec but not Power9 while
> generate
> with vcmpne{b,h,w} on Power9.  So I think we shouldn't say it's
> enabled
> by this patch.  Instead it's to remove the confusion.


OK, changed.
> 
> > There is existing test coverage for the vec_cmpne built-in for
> > vector bool char, vector bool short, vector bool int,
> > vector bool long long in builtins-3-p9.c and p8vector-builtin-2.c.
> > Coverage for vector signed int, vector unsigned int is in
> > p8vector-builtin-2.c.
> > 
> > Test vec-cmpne.c is updated to check the generation of the
> > vcmpequ{b,h,w}
> > instructions for Altivec.  A new test vec-cmpne-runnable.c is added
> > to
> > verify the built-ins work as expected.
> > 
> > Patch has been tested on Power 8 LE/BE, Power 9 LE/BE and Power 10
> > LE
> > with no regressions.
> > 
> > gcc/ChangeLog:
> > 
> >     * config/rs6000/rs6000-builtins.def (vcmpneb, vcmpneh,
> > vcmpnew):
> >     Move definitions to Altivec stanza.
> >     * config/rs6000/altivec.md (vcmpneb, vcmpneh, vcmpnew): New
> >     define_expand.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> >     * gcc.target/powerpc/vec-cmpne-runnable.c: New execution test.
> >     * gcc.target/powerpc/vec-cmpne.c (define_test_functions,
> >     execute_test_functions) moved to vec-cmpne.h.  Added
> >     scan-assembler-times for vcmpequb, vcmpequh, vcmpequw.
> >     * gcc.target/powerpc/vec-cmpne.h: New include file for vec-
> > cmpne.c
> >     and vec-cmpne-runnable.c. Split define_test_functions
> > definition
> >     into define_test_functions and define_init_verify_functions.
> > ---
> >  gcc/config/rs6000/altivec.md                  |  12 ++
> >  gcc/config/rs6000/rs6000-builtins.def         |  18 +--
> >  .../gcc.target/powerpc/vec-cmpne-runnable.c   |  36 ++++++
> >  gcc/testsuite/gcc.target/powerpc/vec-cmpne.c  | 110 ++----------
> > ------
> >  gcc/testsuite/gcc.target/powerpc/vec-cmpne.h  |  86 ++++++++++++++
> >  5 files changed, 151 insertions(+), 111 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-cmpne-
> > runnable.c
> >  create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-cmpne.h
> > 
> > diff --git a/gcc/config/rs6000/altivec.md
> > b/gcc/config/rs6000/altivec.md
> > index ad1224e0b57..31f65aa1b7a 100644
> > --- a/gcc/config/rs6000/altivec.md
> > +++ b/gcc/config/rs6000/altivec.md
> > @@ -2631,6 +2631,18 @@ (define_insn "altivec_vcmpequt_p"
> >    "vcmpequq. %0,%1,%2"
> >    [(set_attr "type" "veccmpfx")])
> >  
> > +;; Expand for builtin vcmpne{b,h,w}
> > +(define_expand "altivec_vcmpne_<mode>"
> > +  [(set (match_operand:VSX_EXTRACT_I 3 "altivec_register_operand"
> > "=v")
> > +   (eq:VSX_EXTRACT_I (match_operand:VSX_EXTRACT_I 1
> > "altivec_register_operand" "v")
> > +                     (match_operand:VSX_EXTRACT_I 2
> > "altivec_register_operand" "v")))
> > +   (set (match_operand:VSX_EXTRACT_I 0 "altivec_register_operand"
> > "=v")
> > +        (not:VSX_EXTRACT_I (match_dup 3)))]
> > +  "TARGET_ALTIVEC"
> > +  {
> > +    operands[3] = gen_reg_rtx (GET_MODE (operands[0]));
> > +  });
> > +
> >  (define_insn "*altivec_vcmpgts<VI_char>_p"
> >    [(set (reg:CC CR6_REGNO)
> >     (unspec:CC [(gt:CC (match_operand:VI2 1 "register_operand" "v")
> > diff --git a/gcc/config/rs6000/rs6000-builtins.def
> > b/gcc/config/rs6000/rs6000-builtins.def
> > index 638d0bc72ca..6b06fa8b34d 100644
> > --- a/gcc/config/rs6000/rs6000-builtins.def
> > +++ b/gcc/config/rs6000/rs6000-builtins.def
> > @@ -641,6 +641,15 @@
> >    const int __builtin_altivec_vcmpgtuw_p (int, vsi, vsi);
> >      VCMPGTUW_P vector_gtu_v4si_p {pred}
> >  
> > +  const vsc __builtin_altivec_vcmpneb (vsc, vsc);
> > +    VCMPNEB altivec_vcmpne_v16qi {}
> > +
> > +  const vss __builtin_altivec_vcmpneh (vss, vss);
> > +    VCMPNEH altivec_vcmpne_v8hi {}
> > +
> > +  const vsi __builtin_altivec_vcmpnew (vsi, vsi);
> > +    VCMPNEW altivec_vcmpne_v4si {}
> > +
> >    const vsi __builtin_altivec_vctsxs (vf, const int<5>);
> >      VCTSXS altivec_vctsxs {}
> >  
> > @@ -2599,9 +2608,6 @@
> >    const signed int __builtin_altivec_vcmpaew_p (vsi, vsi);
> >      VCMPAEW_P vector_ae_v4si_p {}
> >  
> > -  const vsc __builtin_altivec_vcmpneb (vsc, vsc);
> > -    VCMPNEB vcmpneb {}
> > -
> >    const signed int __builtin_altivec_vcmpneb_p (vsc, vsc);
> >      VCMPNEB_P vector_ne_v16qi_p {}
> >  
> > @@ -2614,15 +2620,9 @@
> >    const signed int __builtin_altivec_vcmpnefp_p (vf, vf);
> >      VCMPNEFP_P vector_ne_v4sf_p {}
> >  
> > -  const vss __builtin_altivec_vcmpneh (vss, vss);
> > -    VCMPNEH vcmpneh {}
> > -
> >    const signed int __builtin_altivec_vcmpneh_p (vss, vss);
> >      VCMPNEH_P vector_ne_v8hi_p {}
> >  
> > -  const vsi __builtin_altivec_vcmpnew (vsi, vsi);
> > -    VCMPNEW vcmpnew {}
> > -
> >    const signed int __builtin_altivec_vcmpnew_p (vsi, vsi);
> >      VCMPNEW_P vector_ne_v4si_p {}
> >  
> > diff --git a/gcc/testsuite/gcc.target/powerpc/vec-cmpne-runnable.c
> > b/gcc/testsuite/gcc.target/powerpc/vec-cmpne-runnable.c
> > new file mode 100644
> > index 00000000000..be038639a6c
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/powerpc/vec-cmpne-runnable.c
> > @@ -0,0 +1,36 @@
> > +/* { dg-do run } */
> > +/* { dg-require-effective-target powerpc_altivec_ok } */
> 
> This is for running, please check for vmx_hw.
> 
> > +/* { dg-options "-mvsx -O3 " } */
> 
> Nit: s/-mvsx/-maltivec/, s/-O3/-O2/.

OK, fixed both issues.
> 
> > +
> > +/* Test that the vec_cmpne builtin works as expected.  */
> > +
> > +#include "vec-cmpne.h"
> > +
> > +define_test_functions (int, signed int, signed int, si);
> > +define_test_functions (int, unsigned int, unsigned int, ui);
> > +define_test_functions (short, signed short, signed short, ss);
> > +define_test_functions (short, unsigned short, unsigned short, us);
> > +define_test_functions (char, signed char, signed char, sc);
> > +define_test_functions (char, unsigned char, unsigned char, uc);
> > +define_test_functions (int, signed int, float, ff);
> > +
> > +define_init_verify_functions (int, signed int, signed int, si);
> > +define_init_verify_functions (int, unsigned int, unsigned int,
> > ui);
> > +define_init_verify_functions (short, signed short, signed short,
> > ss);
> > +define_init_verify_functions (short, unsigned short, unsigned
> > short, us);
> > +define_init_verify_functions (char, signed char, signed char, sc);
> > +define_init_verify_functions (char, unsigned char, unsigned char,
> > uc);
> > +define_init_verify_functions (int, signed int, float, ff);
> > +
> > +int main ()
> > +{
> > +  execute_test_functions (int, signed int, signed int, si);
> > +  execute_test_functions (int, unsigned int, unsigned int, ui);
> > +  execute_test_functions (short, signed short, signed short, ss);
> > +  execute_test_functions (short, unsigned short, unsigned short,
> > us);
> > +  execute_test_functions (char, signed char, signed char, sc);
> > +  execute_test_functions (char, unsigned char, unsigned char, uc);
> > +  execute_test_functions (int, signed int, float, ff);
> > +
> > +  return 0;
> > +}
> > diff --git a/gcc/testsuite/gcc.target/powerpc/vec-cmpne.c
> > b/gcc/testsuite/gcc.target/powerpc/vec-cmpne.c
> > index edba9dece66..8ceddd5cc3f 100644
> > --- a/gcc/testsuite/gcc.target/powerpc/vec-cmpne.c
> > +++ b/gcc/testsuite/gcc.target/powerpc/vec-cmpne.c
> > @@ -1,94 +1,11 @@
> > -/* { dg-do run } */
> > -/* { dg-require-effective-target powerpc_vsx_ok } */
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target powerpc_altivec_ok } */
> >  /* { dg-options "-mvsx -O3" } */
> 
> Nit: s/-mvsx/-maltivec/, s/-O3/-O2/.

Fixed.
> 
> >  
> > -/* Test that the vec_cmpne builtin works as expected.  */
> > -
> > -#include "altivec.h"
> > -
> > -#define N 4096
> > -
> > -void abort ();
> > -
> > -#define define_test_functions(VBTYPE, RTYPE, STYPE, NAME) \
> > -\
> > -RTYPE result_ne_##NAME[N] __attribute__((aligned(16))); \
> > -RTYPE result_eq_##NAME[N] __attribute__((aligned(16))); \
> > -STYPE operand1_##NAME[N] __attribute__((aligned(16))); \
> > -STYPE operand2_##NAME[N] __attribute__((aligned(16))); \
> > -RTYPE expected_##NAME[N] __attribute__((aligned(16))); \
> > -\
> > -__attribute__((noinline)) void vector_tests_##NAME () \
> > -{ \
> > -  vector STYPE v1_##NAME, v2_##NAME; \
> > -  vector bool VBTYPE tmp_##NAME; \
> > -  int i; \
> > -  for (i = 0; i < N; i+=16/sizeof (STYPE)) \
> > -    { \
> > -      /* result_ne = operand1!=operand2.  */ \
> > -      v1_##NAME = vec_vsx_ld (0, (const vector
> > STYPE*)&operand1_##NAME[i]); \
> > -      v2_##NAME = vec_vsx_ld (0, (const vector
> > STYPE*)&operand2_##NAME[i]); \
> > -\
> > -      tmp_##NAME = vec_cmpeq (v1_##NAME, v2_##NAME); \
> > -      vec_vsx_st (tmp_##NAME, 0, &result_eq_##NAME[i]); \
> > -\
> > -      tmp_##NAME = vec_cmpne (v1_##NAME, v2_##NAME); \
> > -      vec_vsx_st (tmp_##NAME, 0, &result_ne_##NAME[i]); \
> > -    } \
> > -} \
> > -\
> > -__attribute__((noinline)) void init_##NAME () \
> > -{ \
> > -  int i; \
> > -  for (i = 0; i < N; ++i) \
> > -    { \
> > -      result_ne_##NAME[i] = 7; \
> > -      result_eq_##NAME[i] = 15; \
> > -      if (i%3 == 0) \
> > -   { \
> > -     /* op1 < op2.  */ \
> > -     operand1_##NAME[i] = 1; \
> > -     operand2_##NAME[i] = 2; \
> > -   } \
> > -      else if (i%3 == 1) \
> > -   { \
> > -     /* op1 > op2.  */ \
> > -     operand1_##NAME[i] = 2; \
> > -     operand2_##NAME[i] = 1; \
> > -   } \
> > -      else if (i%3 == 2) \
> > -   { \
> > -     /* op1 == op2.  */ \
> > -     operand1_##NAME[i] = 3; \
> > -     operand2_##NAME[i] = 3; \
> > -   } \
> > -      /* For vector comparisons: "For each element of the
> > result_ne, the \
> > -     value of each bit is 1 if the corresponding elements of ARG1
> > and \
> > -     ARG2 are equal." {or whatever the comparison is} "Otherwise,
> > the \
> > -     value of each bit is 0."  */ \
> > -    expected_##NAME[i] = -1 * (RTYPE)(operand1_##NAME[i] !=
> > operand2_##NAME[i]); \
> > -  } \
> > -} \
> > -\
> > -__attribute__((noinline)) void verify_results_##NAME () \
> > -{ \
> > -  int i; \
> > -  for (i = 0; i < N; ++i) \
> > -    { \
> > -      if ( ((result_ne_##NAME[i] != expected_##NAME[i]) || \
> > -       (result_ne_##NAME[i] == result_eq_##NAME[i]))) \
> > -   abort (); \
> > -    } \
> > -}
> > -
> > -
> > -#define execute_test_functions(VBTYPE, RTYPE, STYPE, NAME) \
> > -{ \
> > -  init_##NAME (); \
> > -  vector_tests_##NAME (); \
> > -  verify_results_##NAME (); \
> > -}
> > +/* Test that the vec_cmpne builtin generates the expected Altive
> > +   instructions.  */
> >  
> > +#include "vec-cmpne.h"
> >  
> >  define_test_functions (int, signed int, signed int, si);
> >  define_test_functions (int, unsigned int, unsigned int, ui);
> > @@ -98,17 +15,6 @@ define_test_functions (char, signed char, signed
> > char, sc);
> >  define_test_functions (char, unsigned char, unsigned char, uc);
> >  define_test_functions (int, signed int, float, ff);
> >  
> > -int main ()
> > -{
> > -  execute_test_functions (int, signed int, signed int, si);
> > -  execute_test_functions (int, unsigned int, unsigned int, ui);
> > -  execute_test_functions (short, signed short, signed short, ss);
> > -  execute_test_functions (short, unsigned short, unsigned short,
> > us);
> > -  execute_test_functions (char, signed char, signed char, sc);
> > -  execute_test_functions (char, unsigned char, unsigned char, uc);
> > -  execute_test_functions (int, signed int, float, ff);
> > -
> > -  return 0;
> > -}
> > -
> > -
> > +/* { dg-final { scan-assembler-times {\mvcmpequb\M}  4 } } */
> > +/* { dg-final { scan-assembler-times {\mvcmpequh\M}  4 } } */
> > +/* { dg-final { scan-assembler-times {\mvcmpequw\M}  4 } } */
> 
> Nit: There are two (signed and unsigned) for each int, short and
> char,
> but we get four for each, it's due to the related loop is unrolled
> by two, I think we can disable unrolling for this to make it robust
> with something such as:
> 
> --- vec-cmpne.h.orig        2023-08-02 22:31:05.150798846 -0500
> +++ vec-cmpne.h 2023-08-02 22:29:44.921109862 -0500
> @@ -4,6 +4,9 @@
> 
>  void abort ();
> 
> +#define PRAGMA(X) _Pragma (#X)
> +#define UNROLL0 PRAGMA (GCC unroll 0)
> +
>  #define define_test_functions(VBTYPE, RTYPE, STYPE, NAME) \
>  \
>  RTYPE result_ne_##NAME[N] __attribute__((aligned(16))); \
> @@ -17,6 +20,7 @@
>    vector STYPE v1_##NAME, v2_##NAME; \
>    vector bool VBTYPE tmp_##NAME; \
>    int i; \
> +  UNROLL0 \
>    for (i = 0; i < N; i+=16/sizeof (STYPE)) \
>      { \
>        /* result_ne = operand1!=operand2.  */ \
> 
> ----------------
> 
> Then the above scan-assembler-times should be all with 2.

Added the no unroll, updated the expected counts.

                  Carl

Re: [PATCH v2] rs6000: Fix __builtin_altivec_vcmpne{b,h,w} implementation

Reply via email to