Hi Carl, on 2023/5/11 02:06, Carl Love via Gcc-patches wrote: > GCC maintainers: > > The following patch fixes errors in the arguments in the > __builtin_altivec_tr_stxvrhx, __builtin_altivec_tr_stxvrwx builtin > definitions. Note, these builtins are used by the overloaded > __builtin_vec_xst_trunc builtin. > > The patch adds a new overloaded builtin definition for > __builtin_vec_xst_trunc for the third argument to be unsigned and > signed long int. > > A new testcase is added for the various overloaded versions of > __builtin_vec_xst_trunc. > > The patch has been tested on Power 10 with no new regressions. > > Please let me know if the patch is acceptable for mainline. Thanks. > > Carl > > ------------------------------------------- > rs6000: Fix __builtin_vec_xst_trunc definition > > Built-in __builtin_vec_xst_trunc calls __builtin_altivec_tr_stxvrhx > and __builtin_altivec_tr_stxvrwx to handle the short and word cases. The > arguments for these two builtins are wrong. This patch fixes the wrong > arguments for the builtins. > > Additionally, the patch adds a new __builtin_vec_xst_trunc overloaded > version for the destination being signed or unsigned long int. > > A runnable test case is added to test each of the overloaded definitions > of __builtin_vec_xst_tru > > gcc/ > * config/rs6000/builtins.def (__builtin_altivec_tr_stxvrhx, > __builtin_altivec_tr_stxvrwx): Fix type of second argument. > Add, definition for send argument to be signed long. > * config/rs6000/rs6000-overload.def (__builtin_vec_xst_trunc): > add definition with thrird arument signed and unsigned long. > * doc/extend.texi (__builtin_vec_xst_trunc): Add documentation for > new unsinged long and signed long versions. > > gcc/testsuite/ > * gcc.target/powerpc/vsx-builtin-vec_xst_trunc.c: New test case > for __builtin_vec_xst_trunc builtin. > --- > gcc/config/rs6000/rs6000-builtins.def | 7 +- > gcc/config/rs6000/rs6000-overload.def | 4 + > gcc/doc/extend.texi | 2 + > .../powerpc/vsx-builtin-vec_xst_trunc.c | 217 ++++++++++++++++++ > 4 files changed, 228 insertions(+), 2 deletions(-) > create mode 100644 > gcc/testsuite/gcc.target/powerpc/vsx-builtin-vec_xst_trunc.c > > diff --git a/gcc/config/rs6000/rs6000-builtins.def > b/gcc/config/rs6000/rs6000-builtins.def > index 638d0bc72ca..a378491b358 100644 > --- a/gcc/config/rs6000/rs6000-builtins.def > +++ b/gcc/config/rs6000/rs6000-builtins.def > @@ -3161,12 +3161,15 @@ > void __builtin_altivec_tr_stxvrbx (vsq, signed long, signed char *); > TR_STXVRBX vsx_stxvrbx {stvec} > > - void __builtin_altivec_tr_stxvrhx (vsq, signed long, signed int *); > + void __builtin_altivec_tr_stxvrhx (vsq, signed long, signed short *); > TR_STXVRHX vsx_stxvrhx {stvec} > > - void __builtin_altivec_tr_stxvrwx (vsq, signed long, signed short *); > + void __builtin_altivec_tr_stxvrwx (vsq, signed long, signed int *); > TR_STXVRWX vsx_stxvrwx {stvec}
Good catching! > > + void __builtin_altivec_tr_stxvrlx (vsq, signed long, signed long *); > + TR_STXVRLX vsx_stxvrdx {stvec} > + This is mapped to the one used for type long long, it's a hard mapping, IMHO it's wrong and not consistent with what the users expect, since on Power the size of type long int is 4 bytes at -m32 while 8 bytes at -m64, this implementation binding to 8 bytes can cause trouble in 32-bit. I wonder if it's a good idea to add one overloaded version for type long int, for now openxl also emits error message for long int type pointer (see its doc [1]), users can use casting to make it to the acceptable pointer types (long long or int as its size). [1] https://www.ibm.com/docs/en/openxl-c-and-cpp-lop/17.1.1?topic=functions-vec-xst-trunc > void __builtin_altivec_tr_stxvrdx (vsq, signed long, signed long long *); > TR_STXVRDX vsx_stxvrdx {stvec} > > diff --git a/gcc/config/rs6000/rs6000-overload.def > b/gcc/config/rs6000/rs6000-overload.def > index c582490c084..54b7ae5e51b 100644 > --- a/gcc/config/rs6000/rs6000-overload.def > +++ b/gcc/config/rs6000/rs6000-overload.def > @@ -4872,6 +4872,10 @@ > TR_STXVRWX TR_STXVRWX_S > void __builtin_vec_xst_trunc (vuq, signed long long, unsigned int *); > TR_STXVRWX TR_STXVRWX_U > + void __builtin_vec_xst_trunc (vsq, signed long long, signed long *); > + TR_STXVRLX TR_STXVRLX_S > + void __builtin_vec_xst_trunc (vuq, signed long long, unsigned long *); > + TR_STXVRLX TR_STXVRLX_U > void __builtin_vec_xst_trunc (vsq, signed long long, signed long long *); > TR_STXVRDX TR_STXVRDX_S > void __builtin_vec_xst_trunc (vuq, signed long long, unsigned long long *); > diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi > index e426a2eb7d8..7e2ae790ab3 100644 > --- a/gcc/doc/extend.texi > +++ b/gcc/doc/extend.texi > @@ -18570,10 +18570,12 @@ instructions. > @defbuiltin{{void} vec_xst_trunc (vector signed __int128, signed long long, > signed char *)} > @defbuiltinx{{void} vec_xst_trunc (vector signed __int128, signed long long, > signed short *)} > @defbuiltinx{{void} vec_xst_trunc (vector signed __int128, signed long long, > signed int *)} > +@defbuiltinx{{void} vec_xst_trunc (vector signed __int128, signed long long, > signed long *)} > @defbuiltinx{{void} vec_xst_trunc (vector signed __int128, signed long long, > signed long long *)} > @defbuiltinx{{void} vec_xst_trunc (vector unsigned __int128, signed long > long, unsigned char *)} > @defbuiltinx{{void} vec_xst_trunc (vector unsigned __int128, signed long > long, unsigned short *)} > @defbuiltinx{{void} vec_xst_trunc (vector unsigned __int128, signed long > long, unsigned int *)} > +@defbuiltinx{{void} vec_xst_trunc (vector unsigned __int128, signed long > long, unsigned long *)} > @defbuiltinx{{void} vec_xst_trunc (vector unsigned __int128, signed long > long, unsigned long long *)} > > Truncate and store the rightmost element of a vector, as if implemented by > the > diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-vec_xst_trunc.c > b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-vec_xst_trunc.c > new file mode 100644 > index 00000000000..7108109560d > --- /dev/null > +++ b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-vec_xst_trunc.c > @@ -0,0 +1,217 @@ > +/* Test of __builtin_vec_xst_trunc */ > + > +/* { dg-do run { target power10_hw } } */ > +/* { dg-require-effective-target power10_ok } */ Normally _hw is stricter than _ok, you already check power10_hw, I think power10_ok is useless here. > +/* { dg-require-effective-target int128 } */ > +/* { dg-options "-mdejagnu-cpu=power10 -save-temps" } */ > + > +#include <altivec.h> > +#include <stdio.h> > +#include <inttypes.h> > +#include <string.h> > +#include <stdlib.h> > + > +#define DEBUG 0 > +#define TRUE 1 > +#define FALSE 0 > +#define SIZE 4 > + > +vector signed __int128 zero_vsint128 = {0x0}; > + > +vector signed __int128 store_data[SIZE] = { > +{ (__int128) 0x79BD000000000000 << 64 | (__int128) 0x123456789abcdef8ULL}, > +{ (__int128) 0x8ACE000000000000 << 64 | (__int128) 0xfedcba9876543217ULL}, > +{ (__int128) 0x1357000000000000 << 64 | (__int128) 0xccccccccccccccccULL}, > +{ (__int128) 0xf000000000000000 << 64 | (__int128) 0xaaaaaaaaaaaaaaaaULL} > +}; > + > +signed char signed_char_expected[SIZE] = {0xF8ULL, 0x17, 0xCC, 0xAA}; > +signed short signed_short_expected[SIZE] = {0xDEF8, 0x3217, 0xcccc, 0xaaaa, > }; > +signed int signed_int_expected[SIZE] = {0x9ABCDEF8, 0x76543217, 0xCCCCCCCC, > + 0xAAAAAAAA}; > +signed long int signed_long_expected[SIZE] = {0x123456789ABCDEF8, > + 0xFEDCBA9876543217ULL, > + 0xCCCCCCCCCCCCCCCCULL, > + 0xAAAAAAAAAAAAAAAAULL}; > +signed long long int signed_long_long_expected[SIZE] = > {0x123456789ABCDEF8ULL, > + 0xFEDCBA9876543217ULL, > + 0xCCCCCCCCCCCCCCCCULL, > + 0xAAAAAAAAAAAAAAAAULL}; > + > +union conv_t { > + vector signed __int128 vsi128; > + unsigned long long ull[2]; > + signed char schar[16]; > + signed __int128 s128; > +} conv; > + > +int check_expected_byte (signed char expected, > + signed char actual) > +{ > + /* Return TRUE if expected and actual values all match. */ > + if (expected != actual) > + { > +#if DEBUG > + printf ("ERROR: Expected half values don't match. \n"); > + printf (" Expected 0x%x & 0xFFFF, actual 0x%x & 0xFFFF\n", > + expected & 0xFF, actual & 0xFF); > +#endif > + return FALSE; > + } > + return TRUE; > +} > + > +int check_expected_half (signed short int expected, > + signed short int actual) > +{ > + /* Return TRUE if expected and actual values all match. */ > + if (expected != actual) > + { > +#if DEBUG > + printf ("ERROR: Expected short values don't match. \n"); > + printf (" Expected 0x%x, actual 0x%x\n", > + expected & 0xFFFF, actual & 0xFFFF); > +#endif > + return FALSE; > + } > + return TRUE; > +} > + > +int check_expected_int (signed int expected, > + signed int actual) > +{ > + /* Return TRUE if expected and actual values all match. */ > + if (expected != actual) > + { > +#if DEBUG > + printf ("ERROR: Expected int values don't match. \n"); > + printf (" Expected 0x%x, actual 0x%x\n", > + expected, actual); > +#endif > + return FALSE; > + } > + return TRUE; > +} > + > +int check_expected_long (signed long int expected, > + signed long int actual) > +{ > + /* Return TRUE if expected and actual values all match. */ > + if (expected != actual) > + { > +#if DEBUG > + printf ("ERROR: Expected long values don't match. \n"); > + printf (" Expected 0x%x, actual 0x%x\n", > + expected, actual); > +#endif > + return FALSE; > + } > + return TRUE; > +} > + > +int check_expected_long_long (signed long long int expected, > + signed long long int actual) > +{ > + /* Return TRUE if expected and actual values all match. */ > + if (expected != actual) > + { > +#if DEBUG > + printf ("ERROR: Expected long long values don't match. \n"); > + printf (" Expected 0x%x, actual 0x%x\n", > + expected, actual); > +#endif > + return FALSE; > + } > + return TRUE; > +} > + > +void print_store_data (vector signed __int128 *store_data, int size) > +{ > +#if DEBUG > + union conv_t val; > + int i; > + > + for (i = 0; i < size; i++) > + { > + val.vsi128 = store_data[i]; > + printf("Data to store [%d] = 0x%llx %llx\n", i, val.ull[1], > val.ull[0]); > + } > +#endif > +} > + > + > +void print_raw_buffer (vector signed __int128 *rawbuffer, int size) > +{ > +#if DEBUG > + union conv_t val; > + int i; > + > + for (i = 0; i < size; i++) > + { > + val.vsi128 = rawbuffer[i]; > + printf ("rawbuffer[%d] = 0x%llx %llx\n", i, val.ull[1], val.ull[0]); > + } > +#endif > +} > + > +int > +main () { > + int i; > + > + vector signed __int128 rawbuffer[SIZE]; > + signed char * vsbuffer_char = (signed char *)rawbuffer; > + signed short int * vsbuffer_short = (signed short int *)rawbuffer; > + signed int * vsbuffer_int = (signed int *)rawbuffer; > + signed long int * vsbuffer_long = (signed long *)rawbuffer; > + signed long long int * vsbuffer_long_long = (signed long long *)rawbuffer; > + > + for (i = 0; i < SIZE; i++) > + rawbuffer[i] = zero_vsint128; > + > + print_store_data (store_data, SIZE); > + > + for (i = 0; i < SIZE; i++) > + { > + __builtin_vec_xst_trunc (store_data[i], i*sizeof(char), > + vsbuffer_char); > + check_expected_byte (signed_char_expected[i], vsbuffer_char[i]); > + } > + > + for (i = 0; i < SIZE; i++) > + { > + __builtin_vec_xst_trunc (store_data[i], i*sizeof(short int), > + vsbuffer_short); > + check_expected_half (signed_short_expected[i], vsbuffer_short[i]); > + } > + > + for (i = 0; i < SIZE; i++) > + { > + __builtin_vec_xst_trunc (store_data[i], i*sizeof(int), > + vsbuffer_int); > + check_expected_int (signed_int_expected[i], vsbuffer_int[i]); > + } > + > + for (i = 0; i < SIZE; i++) > + { > + __builtin_vec_xst_trunc (store_data[i], i*sizeof(long int), > + vsbuffer_long); > + check_expected_long (signed_long_long_expected[i], > + vsbuffer_long[i]); > + } > + > + for (i = 0; i < SIZE; i++) > + { > + __builtin_vec_xst_trunc (store_data[i], i*sizeof(long long int), > + vsbuffer_long_long); > + check_expected_long_long (signed_long_long_expected[i], > + vsbuffer_long_long[i]); > + } > + > + print_raw_buffer (rawbuffer, SIZE); > + return 0; > +} > + > +/* { dg-final { scan-assembler {\mstxvrbx\M} } } */ > +/* { dg-final { scan-assembler {\mstxvrhx\M} } } */ > +/* { dg-final { scan-assembler {\mstxvrwx\M} } } */ > +/* { dg-final { scan-assembler {\mstxvrdx\M} } } */ Could you check the exact times instead? BR, Kewen