On Fri, Jul 14, 2023 at 10:53 AM Richard Biener <rguent...@suse.de> wrote:
>
> On Fri, 14 Jul 2023, Uros Bizjak wrote:
>
> > On Fri, Jul 14, 2023 at 10:31?AM Richard Biener <rguent...@suse.de> wrote:
> > >
> > > On Fri, 14 Jul 2023, Uros Bizjak wrote:
> > >
> > > > cprop1 pass does not consider paradoxical subreg and for (insn 22) 
> > > > claims
> > > > that it equals 8 elements of HImodeby setting REG_EQUAL note:
> > > >
> > > > (insn 21 19 22 4 (set (reg:V4QI 98)
> > > >         (mem/u/c:V4QI (symbol_ref/u:DI ("*.LC1") [flags 0x2]) [0  S4
> > > > A32])) "pr110206.c":12:42 1530 {*movv4qi_internal}
> > > >      (expr_list:REG_EQUAL (const_vector:V4QI [
> > > >                 (const_int -52 [0xffffffffffffffcc]) repeated x4
> > > >             ])
> > > >         (nil)))
> > > > (insn 22 21 23 4 (set (reg:V8HI 100)
> > > >         (zero_extend:V8HI (vec_select:V8QI (subreg:V16QI (reg:V4QI 98) 
> > > > 0)
> > > >                 (parallel [
> > > >                         (const_int 0 [0])
> > > >                         (const_int 1 [0x1])
> > > >                         (const_int 2 [0x2])
> > > >                         (const_int 3 [0x3])
> > > >                         (const_int 4 [0x4])
> > > >                         (const_int 5 [0x5])
> > > >                         (const_int 6 [0x6])
> > > >                         (const_int 7 [0x7])
> > > >                     ])))) "pr110206.c":12:42 7471 
> > > > {sse4_1_zero_extendv8qiv8hi2}
> > > >      (expr_list:REG_EQUAL (const_vector:V8HI [
> > > >                 (const_int 204 [0xcc]) repeated x8
> > > >             ])
> > > >         (expr_list:REG_DEAD (reg:V4QI 98)
> > > >             (nil))))
> > > >
> > > > We rely on the "undefined" vals to have a specific value (from the 
> > > > earlier
> > > > REG_EQUAL note) but actual code generation doesn't ensure this (it 
> > > > doesn't
> > > > need to).  That said, the issue isn't the constant folding per-se but 
> > > > that
> > > > we do not actually constant fold but register an equality that doesn't 
> > > > hold.
> > > >
> > > >     PR target/110206
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > >     * fwprop.cc (contains_paradoxical_subreg_p): Move to ...
> > > >     * rtlanal.cc (contains_paradoxical_subreg_p): ... here.
> > > >     * rtlanal.h (contains_paradoxical_subreg_p): Add prototype.
> > > >     * cprop.cc (try_replace_reg): Do not set REG_EQUAL note
> > > >     when the original source contains a paradoxical subreg.
> > > >
> > > > gcc/testsuite/ChangeLog:
> > > >
> > > >     * gcc.dg/torture/pr110206.c: New test.
> > > >
> > > > Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.
> > > >
> > > > OK for mainline and backports?
> > >
> > > OK.
> > >
> > > I think the testcase can also run on other targets if you add
> > > dg-additional-options "-w -Wno-psabi", all generic vector ops
> > > should be lowered if not supported.
> >
> > True, but with lowered vector ops, the test would not even come close
> > to the problem. The problem is specific to generic vector ops, and can
> > be triggered only when paradoxical subregs are used to implement
> > (partial) vector modes. This is the case on x86, where partial vectors
> > are now heavily used, and even there we need the latest vector ISA
> > enabled to trip the condition.
> >
> > The above is the reason that dg-torture is used, with the hope that
> > the runtime failure will trip when testsuite is run with specific
> > target options.
>
> I see.  I'm fine with this then though moving to gcc.target/i386
> with appropriate triggering options and a dg-require for runtime
> support would also work.

You are right. I'll add the attached testcase to gcc.target/i386 instead.

Uros.
/* PR target/110206 */
/* { dg-do run } */
/* { dg-options "-Os -mavx512bw -mavx512vl" } */
/* { dg-require-effective-target avx512bw } */
/* { dg-require-effective-target avx512vl } */

#define AVX512BW
#define AVX512VL

#include "avx512f-check.h"

typedef unsigned char __attribute__((__vector_size__ (4))) U;
typedef unsigned char __attribute__((__vector_size__ (8))) V;
typedef unsigned short u16;

V g;

void
__attribute__((noinline))
foo (U u, u16 c, V *r)
{
  if (!c)
    abort ();
  V x = __builtin_shufflevector (u, (204 >> u), 7, 0, 5, 1, 3, 5, 0, 2);
  V y = __builtin_shufflevector (g, (V) { }, 7, 6, 6, 7, 2, 6, 3, 5);
  V z = __builtin_shufflevector (y, 204 * x, 3, 9, 8, 1, 4, 6, 14, 5);
  *r = z;
}

static void test_256 (void) { };

static void
test_128 (void)
{
  V r;
  foo ((U){4}, 5, &r);
  if (r[6] != 0x30)
    abort();
}

Reply via email to