> This doesn't look right, you need scan-tree-dump-times with tree dumps.
Oh, yes. Interesting that make check doesn't complain about it.

The valid vectorize cases are summarized below; is there anything missing?

X86     uint8_t, uint16_t
X86_32  uint8_t, uint16_t, uint32_t
X86_64  uint8_t, uint16_t, uint32_t, uint64_t

BTW, how can I tell x86, x86_32 and x86_64 apart from the test source code?

Pan

-----Original Message-----
From: Uros Bizjak <ubiz...@gmail.com>
Sent: Monday, November 25, 2024 4:40 PM
To: Li, Pan2 <pan2...@intel.com>
Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao <hongtao....@intel.com>
Subject: Re: [PATCH v2] I386: Add more testcases for unsigned SAT_ADD vector pattern

On Mon, Nov 25, 2024 at 3:27 AM <pan2...@intel.com> wrote:
>
> From: Pan Li <pan2...@intel.com>
>
> Update in v2:
>
> * Skip lto build as no such dump files.
> * scan dump check for optimized.
>
> Original log:
>
> There are some forms like below failed to recog the SAT_ADD
> pattern for target i386. It is related to some match pattern
> extraction but get fixed after the refactor of the SAT_ADD
> pattern. Thus, add testcases to ensure we may have similar
> issue in futrue.
>
> #define DEF_SAT_ADD(T) \
> T sat_add_##T (T x, T y) \
> { \
>   T res; \
>   res = x + y; \
>   res |= -(T)(res < x); \
>   return res; \
> }
>
> #define VEC_DEF_SAT_ADD(T) \
> void vec_sat_add(T * restrict a, T * restrict b) \
> { \
>   for (int i = 0; i < 8; i++) \
>     b[i] = sat_add_##T (a[i], b[i]); \
> }
>
> DEF_SAT_ADD (uint32_t)
> VEC_DEF_SAT_ADD (uint32_t)
>
> The below test suites are passed for this patch.
> * The x86 fully regression test.
>
> PR target/112600
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/pr112600-5a-u16.c: New test.
> * gcc.target/i386/pr112600-5a-u32.c: New test.
> * gcc.target/i386/pr112600-5a-u64.c: New test.
> * gcc.target/i386/pr112600-5a-u8.c: New test.
> * gcc.target/i386/pr112600-5a.h: New test.
>
> Signed-off-by: Pan Li <pan2...@intel.com>
> ---
>  .../gcc.target/i386/pr112600-5a-u16.c         | 10 +++++++++
>  .../gcc.target/i386/pr112600-5a-u32.c         | 10 +++++++++
>  .../gcc.target/i386/pr112600-5a-u64.c         | 10 +++++++++
>  .../gcc.target/i386/pr112600-5a-u8.c          | 10 +++++++++
>  gcc/testsuite/gcc.target/i386/pr112600-5a.h   | 22 +++++++++++++++++++
>  5 files changed, 62 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr112600-5a-u16.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr112600-5a-u32.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr112600-5a-u64.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr112600-5a-u8.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr112600-5a.h
>
> diff --git a/gcc/testsuite/gcc.target/i386/pr112600-5a-u16.c
> b/gcc/testsuite/gcc.target/i386/pr112600-5a-u16.c
> new file mode 100644
> index 00000000000..5f314d6b46a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr112600-5a-u16.c
> @@ -0,0 +1,10 @@
> +/* PR target/112600 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-optimized" } */
> +
> +#include "pr112600-5a.h"
> +
> +DEF_SAT_ADD (uint16_t)
> +VEC_DEF_SAT_ADD (uint16_t)
> +
> +/* { dg-final { scan-rtl-dump-times ".SAT_ADD " 4 "optimized" { target {
> no-opts { "-ffat-lto-objects" } } } } } */

This doesn't look right, you need scan-tree-dump-times with tree dumps.

> diff --git a/gcc/testsuite/gcc.target/i386/pr112600-5a-u32.c
> b/gcc/testsuite/gcc.target/i386/pr112600-5a-u32.c
> new file mode 100644
> index 00000000000..229a27c4c20
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr112600-5a-u32.c
> @@ -0,0 +1,10 @@
> +/* PR target/112600 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-optimized" } */
> +
> +#include "pr112600-5a.h"
> +
> +DEF_SAT_ADD (uint32_t)
> +VEC_DEF_SAT_ADD (uint32_t)
> +
> +/* { dg-final { scan-rtl-dump-times ".SAT_ADD " 4 "optimized" { target {
> no-opts { "-ffat-lto-objects" } } } } } */

Please note that x86 can't vectorize uint32_t ...

> diff --git a/gcc/testsuite/gcc.target/i386/pr112600-5a-u64.c
> b/gcc/testsuite/gcc.target/i386/pr112600-5a-u64.c
> new file mode 100644
> index 00000000000..2c9e4d09fe0
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr112600-5a-u64.c
> @@ -0,0 +1,10 @@
> +/* PR target/112600 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-optimized" } */
> +
> +#include "pr112600-5a.h"
> +
> +DEF_SAT_ADD (uint64_t)
> +VEC_DEF_SAT_ADD (uint64_t)
> +
> +/* { dg-final { scan-rtl-dump-times ".SAT_ADD " 4 "optimized" { target {
> no-opts { "-ffat-lto-objects" } } } } } */

... and can't vectorize uint64_t, because it has no 32bit and 64bit
vector SAT_ADD instruction.

Other than that, x86_32 can't optimize scalar uint64_t.

Please also add -msse2 to compile flags so x86_32 will be able to
vectorize uint8_t and uint16_t.

You can test both targets in the testsuite run by using:

make -k check-gcc RUNTESTFLAGS="--target_board=unix\{,-m32\} i386.exp=pr112600-5a-*.c"

Uros.
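
For reference, a minimal sketch of how the uint16_t test might look with the two review comments applied: scan-tree-dump-times in place of scan-rtl-dump-times, and -msse2 added to dg-options. Some assumptions: the contents of pr112600-5a.h are not shown in the thread, so the macros from the commit log are inlined here together with <stdint.h>; the expected count of 4 and the no-opts selector are carried over unchanged from the patch and have not been re-verified.

```c
/* PR target/112600 */
/* { dg-do compile } */
/* { dg-options "-O2 -msse2 -fdump-tree-optimized" } */

/* Assumption: stands in for #include "pr112600-5a.h"; the macro bodies
   are copied from the commit log, the real header is not shown.  */
#include <stdint.h>

/* Branchless unsigned saturating add: if x + y wraps around, res < x
   holds, -(T)1 is all ones, and the OR clamps res to the maximum.  */
#define DEF_SAT_ADD(T)        \
T sat_add_##T (T x, T y)      \
{                             \
  T res;                      \
  res = x + y;                \
  res |= -(T)(res < x);       \
  return res;                 \
}

/* Fixed trip count of 8 so the loop is a vectorization candidate.  */
#define VEC_DEF_SAT_ADD(T)                            \
void vec_sat_add (T * restrict a, T * restrict b)     \
{                                                     \
  for (int i = 0; i < 8; i++)                         \
    b[i] = sat_add_##T (a[i], b[i]);                  \
}

DEF_SAT_ADD (uint16_t)
VEC_DEF_SAT_ADD (uint16_t)

/* "optimized" is a tree dump, so the tree scanner is used here.
   The count of 4 is taken from the patch as posted.  */
/* { dg-final { scan-tree-dump-times ".SAT_ADD " 4 "optimized" { target { no-opts { "-ffat-lto-objects" } } } } } */
```

The dg directives are ordinary C comments, so the file also builds outside the testsuite if the saturation behaviour needs a quick check. On the question of telling the targets apart from the test source: the testsuite has effective-target selectors such as ia32 and lp64 that can be used in a target clause, though whether they suit these particular tests is left open here.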