> This doesn't look right, you need scan-tree-dump-times with tree dumps.

Oh, yes. Interesting that make check doesn't complain about it.

Here is my summary of the valid vectorization cases; is anything missing?

X86     uint8_t, uint16_t
X86_32  uint8_t, uint16_t, uint32_t
X86_64  uint8_t, uint16_t, uint32_t, uint64_t

BTW, how can I distinguish x86, x86_32, and x86_64 from within the test source code?

Pan

-----Original Message-----
From: Uros Bizjak <ubiz...@gmail.com> 
Sent: Monday, November 25, 2024 4:40 PM
To: Li, Pan2 <pan2...@intel.com>
Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao <hongtao....@intel.com>
Subject: Re: [PATCH v2] I386: Add more testcases for unsigned SAT_ADD vector pattern

On Mon, Nov 25, 2024 at 3:27 AM <pan2...@intel.com> wrote:
>
> From: Pan Li <pan2...@intel.com>
>
> Update in v2:
>
> * Skip lto build as no such dump files.
> * scan dump check for optimized.
>
> Original log:
>
> Some forms like the one below failed to be recognized as the SAT_ADD
> pattern for target i386.  This was related to match pattern
> extraction and was fixed by the refactor of the SAT_ADD
> pattern.  Thus, add testcases to ensure we do not hit a similar
> issue in the future.
>
>   #define DEF_SAT_ADD(T)   \
>   T sat_add_##T (T x, T y) \
>   {                        \
>     T res;                 \
>     res = x + y;           \
>     res |= -(T)(res < x);  \
>     return res;            \
>   }
>
>   #define VEC_DEF_SAT_ADD(T)                       \
>   void vec_sat_add(T * restrict a, T * restrict b) \
>   {                                                \
>     for (int i = 0; i < 8; i++)                    \
>       b[i] = sat_add_##T (a[i], b[i]);             \
>   }
>
>   DEF_SAT_ADD (uint32_t)
>   VEC_DEF_SAT_ADD (uint32_t)
>
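
For reference, the branchless idiom in DEF_SAT_ADD can be exercised on its own. The following is a minimal sketch, not part of the patch: the macro body is copied from the commit log above, while `sat_add_selfcheck` is a hypothetical helper added only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Same branchless form as DEF_SAT_ADD in the commit log: when x + y
   wraps around, (res < x) is 1, -(T)1 sets every bit of T, and the
   OR forces res to the maximum value of T.  Otherwise the OR is a
   no-op.  */
#define DEF_SAT_ADD(T)     \
  T sat_add_##T (T x, T y) \
  {                        \
    T res;                 \
    res = x + y;           \
    res |= -(T)(res < x);  \
    return res;            \
  }

DEF_SAT_ADD (uint8_t)
DEF_SAT_ADD (uint32_t)

/* Hypothetical self-check helper (an assumption, not in the patch).  */
void
sat_add_selfcheck (void)
{
  assert (sat_add_uint8_t (200, 100) == UINT8_MAX);        /* overflow saturates */
  assert (sat_add_uint8_t (1, 2) == 3);                    /* no overflow */
  assert (sat_add_uint32_t (UINT32_MAX, 1) == UINT32_MAX); /* overflow saturates */
  assert (sat_add_uint32_t (7, 8) == 15);                  /* no overflow */
}
```

Calling `sat_add_selfcheck ()` from a small driver is enough to confirm the scalar semantics that the .SAT_ADD pattern is expected to preserve.
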
> The following test suites passed for this patch.
> * The x86 full regression test.
>
>         PR target/112600
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/i386/pr112600-5a-u16.c: New test.
>         * gcc.target/i386/pr112600-5a-u32.c: New test.
>         * gcc.target/i386/pr112600-5a-u64.c: New test.
>         * gcc.target/i386/pr112600-5a-u8.c: New test.
>         * gcc.target/i386/pr112600-5a.h: New test.
>
> Signed-off-by: Pan Li <pan2...@intel.com>
> ---
>  .../gcc.target/i386/pr112600-5a-u16.c         | 10 +++++++++
>  .../gcc.target/i386/pr112600-5a-u32.c         | 10 +++++++++
>  .../gcc.target/i386/pr112600-5a-u64.c         | 10 +++++++++
>  .../gcc.target/i386/pr112600-5a-u8.c          | 10 +++++++++
>  gcc/testsuite/gcc.target/i386/pr112600-5a.h   | 22 +++++++++++++++++++
>  5 files changed, 62 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr112600-5a-u16.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr112600-5a-u32.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr112600-5a-u64.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr112600-5a-u8.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr112600-5a.h
>
> diff --git a/gcc/testsuite/gcc.target/i386/pr112600-5a-u16.c b/gcc/testsuite/gcc.target/i386/pr112600-5a-u16.c
> new file mode 100644
> index 00000000000..5f314d6b46a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr112600-5a-u16.c
> @@ -0,0 +1,10 @@
> +/* PR target/112600 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-optimized" } */
> +
> +#include "pr112600-5a.h"
> +
> +DEF_SAT_ADD (uint16_t)
> +VEC_DEF_SAT_ADD (uint16_t)
> +
> +/* { dg-final { scan-rtl-dump-times ".SAT_ADD " 4 "optimized" { target { no-opts { "-ffat-lto-objects" } } } } } */

This doesn't look right, you need scan-tree-dump-times with tree dumps.
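
A rough sketch of how the u16 test might look with the tree scanner is shown here. It is an illustration under stated assumptions, not the committed test: the macros are inlined from the commit log as a stand-in for pr112600-5a.h (whose full contents are not shown in this mail), and the expected count of 4 and the target selector are carried over unchanged from the patch without being re-verified.

```c
/* PR target/112600 */
/* { dg-do compile } */
/* { dg-options "-O2 -fdump-tree-optimized" } */

#include <stdint.h>

/* Stand-in for pr112600-5a.h (assumption): both macros are copied
   from the commit log of the patch.  */
#define DEF_SAT_ADD(T)     \
  T sat_add_##T (T x, T y) \
  {                        \
    T res;                 \
    res = x + y;           \
    res |= -(T)(res < x);  \
    return res;            \
  }

#define VEC_DEF_SAT_ADD(T)                         \
  void vec_sat_add(T * restrict a, T * restrict b) \
  {                                                \
    for (int i = 0; i < 8; i++)                    \
      b[i] = sat_add_##T (a[i], b[i]);             \
  }

DEF_SAT_ADD (uint16_t)
VEC_DEF_SAT_ADD (uint16_t)

/* "optimized" is a tree dump, so it is scanned with
   scan-tree-dump-times rather than scan-rtl-dump-times.  The count
   and the selector below are unchanged from the patch.  */
/* { dg-final { scan-tree-dump-times ".SAT_ADD " 4 "optimized" { target { no-opts { "-ffat-lto-objects" } } } } } */
```

The only intended change relative to the patch is the scanner name; the options, count, and selector would still need to be checked against an actual testsuite run.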

> diff --git a/gcc/testsuite/gcc.target/i386/pr112600-5a-u32.c b/gcc/testsuite/gcc.target/i386/pr112600-5a-u32.c
> new file mode 100644
> index 00000000000..229a27c4c20
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr112600-5a-u32.c
> @@ -0,0 +1,10 @@
> +/* PR target/112600 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-optimized" } */
> +
> +#include "pr112600-5a.h"
> +
> +DEF_SAT_ADD (uint32_t)
> +VEC_DEF_SAT_ADD (uint32_t)
> +
> +/* { dg-final { scan-rtl-dump-times ".SAT_ADD " 4 "optimized" { target { no-opts { "-ffat-lto-objects" } } } } } */

Please note that x86 can't vectorize uint32_t ...

> diff --git a/gcc/testsuite/gcc.target/i386/pr112600-5a-u64.c b/gcc/testsuite/gcc.target/i386/pr112600-5a-u64.c
> new file mode 100644
> index 00000000000..2c9e4d09fe0
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr112600-5a-u64.c
> @@ -0,0 +1,10 @@
> +/* PR target/112600 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-optimized" } */
> +
> +#include "pr112600-5a.h"
> +
> +DEF_SAT_ADD (uint64_t)
> +VEC_DEF_SAT_ADD (uint64_t)
> +
> +/* { dg-final { scan-rtl-dump-times ".SAT_ADD " 4 "optimized" { target { no-opts { "-ffat-lto-objects" } } } } } */

... and can't vectorize uint64_t, because it has no 32-bit or 64-bit
vector SAT_ADD instruction.

Other than that, x86_32 can't optimize scalar uint64_t.

Please also add -msse2 to compile flags so x86_32 will be able to
vectorize uint8_t and uint16_t.

You can test both targets in the testsuite run by using:

make -k check-gcc RUNTESTFLAGS="--target_board=unix\{,-m32\} i386.exp=pr112600-5a-*.c"

Uros.
