On Fri, Jul 14, 2023 at 10:53 AM Richard Biener <rguent...@suse.de> wrote:
>
> On Fri, 14 Jul 2023, Uros Bizjak wrote:
>
> > On Fri, Jul 14, 2023 at 10:31 AM Richard Biener <rguent...@suse.de> wrote:
> > >
> > > On Fri, 14 Jul 2023, Uros Bizjak wrote:
> > >
> > > > cprop1 pass does not consider paradoxical subreg and for (insn 22) claims
> > > > that it equals 8 elements of HImode by setting REG_EQUAL note:
> > > >
> > > > (insn 21 19 22 4 (set (reg:V4QI 98)
> > > >         (mem/u/c:V4QI (symbol_ref/u:DI ("*.LC1") [flags 0x2]) [0 S4 A32])) "pr110206.c":12:42 1530 {*movv4qi_internal}
> > > >      (expr_list:REG_EQUAL (const_vector:V4QI [
> > > >                 (const_int -52 [0xffffffffffffffcc]) repeated x4
> > > >             ])
> > > >         (nil)))
> > > > (insn 22 21 23 4 (set (reg:V8HI 100)
> > > >         (zero_extend:V8HI (vec_select:V8QI (subreg:V16QI (reg:V4QI 98) 0)
> > > >                 (parallel [
> > > >                         (const_int 0 [0])
> > > >                         (const_int 1 [0x1])
> > > >                         (const_int 2 [0x2])
> > > >                         (const_int 3 [0x3])
> > > >                         (const_int 4 [0x4])
> > > >                         (const_int 5 [0x5])
> > > >                         (const_int 6 [0x6])
> > > >                         (const_int 7 [0x7])
> > > >                     ])))) "pr110206.c":12:42 7471 {sse4_1_zero_extendv8qiv8hi2}
> > > >      (expr_list:REG_EQUAL (const_vector:V8HI [
> > > >                 (const_int 204 [0xcc]) repeated x8
> > > >             ])
> > > >         (expr_list:REG_DEAD (reg:V4QI 98)
> > > >             (nil))))
> > > >
> > > > We rely on the "undefined" vals to have a specific value (from the earlier
> > > > REG_EQUAL note) but actual code generation doesn't ensure this (it doesn't
> > > > need to). That said, the issue isn't the constant folding per-se but that
> > > > we do not actually constant fold but register an equality that doesn't hold.
> > > >
> > > >     PR target/110206
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > >     * fwprop.cc (contains_paradoxical_subreg_p): Move to ...
> > > >     * rtlanal.cc (contains_paradoxical_subreg_p): ... here.
> > > >     * rtlanal.h (contains_paradoxical_subreg_p): Add prototype.
> > > >     * cprop.cc (try_replace_reg): Do not set REG_EQUAL note
> > > >     when the original source contains a paradoxical subreg.
> > > >
> > > > gcc/testsuite/ChangeLog:
> > > >
> > > >     * gcc.dg/torture/pr110206.c: New test.
> > > >
> > > > Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.
> > > >
> > > > OK for mainline and backports?
> > >
> > > OK.
> > >
> > > I think the testcase can also run on other targets if you add
> > > dg-additional-options "-w -Wno-psabi", all generic vector ops
> > > should be lowered if not supported.
> >
> > True, but with lowered vector ops, the test would not even come close
> > to the problem. The problem is specific to generic vector ops, and can
> > be triggered only when paradoxical subregs are used to implement
> > (partial) vector modes. This is the case on x86, where partial vectors
> > are now heavily used, and even there we need the latest vector ISA
> > enabled to trip the condition.
> >
> > The above is the reason that dg-torture is used, with the hope that
> > the runtime failure will trip when testsuite is run with specific
> > target options.
>
> I see. I'm fine with this then though moving to gcc.target/i386
> with appropriate triggering options and a dg-require for runtime
> support would also work.
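As an aside on the analysis quoted above, here is a minimal sketch in plain C (not GCC code) of why the REG_EQUAL note on (insn 22) does not hold. The helper names widen_v4qi_to_v16qi, zero_extend_low8 and count_lanes_equal are made up for illustration, and the fill byte is only an assumption standing in for whatever ends up in the undefined upper bytes of the paradoxical subreg.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of (subreg:V16QI (reg:V4QI 98) 0): only bytes 0..3
   come from the 4-byte source.  Bytes 4..15 are undefined in the real
   register; here they are modelled by an arbitrary fill byte.  */
void
widen_v4qi_to_v16qi (const uint8_t src[4], uint8_t fill, uint8_t dst[16])
{
  memset (dst, fill, 16);
  memcpy (dst, src, 4);
}

/* Model of zero_extend:V8HI (vec_select:V8QI ... [0..7]): take the low
   8 bytes and zero-extend each one to a 16-bit lane.  */
void
zero_extend_low8 (const uint8_t src[16], uint16_t dst[8])
{
  for (int i = 0; i < 8; i++)
    dst[i] = src[i];
}

/* Count how many of the 8 lanes equal VALUE.  */
int
count_lanes_equal (const uint16_t v[8], uint16_t value)
{
  int n = 0;

  for (int i = 0; i < 8; i++)
    if (v[i] == value)
      n++;
  return n;
}
```

In this model, with a source of four 0xcc bytes and a fill byte of 0x00, only lanes 0..3 come out as 204. All eight lanes match the claimed constant only if the fill byte happens to be 0xcc, which nothing in the generated code guarantees.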

You are right. I'll add the attached testcase to gcc.target/i386 instead.

Uros.

/* PR target/110206 */
/* { dg-do run } */
/* { dg-options "-Os -mavx512bw -mavx512vl" } */
/* { dg-require-effective-target avx512bw } */
/* { dg-require-effective-target avx512vl } */

#define AVX512BW
#define AVX512VL

#include "avx512f-check.h"

typedef unsigned char __attribute__((__vector_size__ (4))) U;
typedef unsigned char __attribute__((__vector_size__ (8))) V;
typedef unsigned short u16;

V g;

void
__attribute__((noinline))
foo (U u, u16 c, V *r)
{
  if (!c)
    abort ();

  V x = __builtin_shufflevector (u, (204 >> u), 7, 0, 5, 1, 3, 5, 0, 2);
  V y = __builtin_shufflevector (g, (V) { }, 7, 6, 6, 7, 2, 6, 3, 5);
  V z = __builtin_shufflevector (y, 204 * x, 3, 9, 8, 1, 4, 6, 14, 5);
  *r = z;
}

static void test_256 (void) { };

static void
test_128 (void)
{
  V r;
  foo ((U){4}, 5, &r);
  if (r[6] != 0x30)
    abort();
}
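
For reference, the expected value 0x30 in test_128 can be cross-checked with a scalar walk-through of foo. This is only a sketch in portable C, not part of the testcase: shuffle8 and foo_scalar are hypothetical names, shuffle8 assumes the usual two-operand selection rule (an index below the operand length picks from the first operand, otherwise from the second), and the shift assumes every element of u is below 8.

```c
#include <stdint.h>

/* Hypothetical scalar model of a two-operand shuffle with an 8-lane
   result: index i < n selects a[i], otherwise b[i - n], where n is the
   number of lanes in each operand.  */
static void
shuffle8 (const uint8_t *a, const uint8_t *b, int n, const int idx[8],
          uint8_t out[8])
{
  for (int i = 0; i < 8; i++)
    out[i] = idx[i] < n ? a[idx[i]] : b[idx[i] - n];
}

/* Scalar walk-through of foo from the testcase, without the check on c.
   Assumes u[i] < 8 so that the shift is well defined.  */
void
foo_scalar (const uint8_t u[4], uint8_t r[8])
{
  static const int ix[8] = { 7, 0, 5, 1, 3, 5, 0, 2 };
  static const int iy[8] = { 7, 6, 6, 7, 2, 6, 3, 5 };
  static const int iz[8] = { 3, 9, 8, 1, 4, 6, 14, 5 };
  uint8_t g[8] = { 0 };     /* the global g is zero-initialized */
  uint8_t zero[8] = { 0 };  /* (V) { } */
  uint8_t s[4], x[8], y[8], m[8];

  for (int i = 0; i < 4; i++)
    s[i] = (uint8_t) (204 >> u[i]);   /* 204 >> u */
  shuffle8 (u, s, 4, ix, x);          /* V x */
  shuffle8 (g, zero, 8, iy, y);       /* V y, all zero */
  for (int i = 0; i < 8; i++)
    m[i] = (uint8_t) (204 * x[i]);    /* 204 * x, wraps modulo 256 */
  shuffle8 (y, m, 8, iz, r);          /* V z */
}
```

With u = {4, 0, 0, 0}, x[6] is u[0] = 4, so lane 14 of the last shuffle is 204 * 4 = 816, which wraps to 0x30 in an unsigned char. That is the value test_128 expects in r[6].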