Richard Biener <richard.guent...@gmail.com> writes: > On Wed, Jul 12, 2023 at 1:05 PM Uros Bizjak <ubiz...@gmail.com> wrote: >> >> On Wed, Jul 12, 2023 at 12:58 PM Uros Bizjak <ubiz...@gmail.com> wrote: >> > >> > On Wed, Jul 12, 2023 at 12:23 PM Richard Sandiford >> > <richard.sandif...@arm.com> wrote: >> > > >> > > Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes: >> > > > On Mon, Jul 10, 2023 at 1:01 PM Uros Bizjak <ubiz...@gmail.com> wrote: >> > > >> >> > > >> On Mon, Jul 10, 2023 at 11:47 AM Richard Biener >> > > >> <richard.guent...@gmail.com> wrote: >> > > >> > >> > > >> > On Mon, Jul 10, 2023 at 11:26 AM Uros Bizjak <ubiz...@gmail.com> >> > > >> > wrote: >> > > >> > > >> > > >> > > On Mon, Jul 10, 2023 at 11:17 AM Richard Biener >> > > >> > > <richard.guent...@gmail.com> wrote: >> > > >> > > > >> > > >> > > > On Sun, Jul 9, 2023 at 10:53 AM Uros Bizjak via Gcc-patches >> > > >> > > > <gcc-patches@gcc.gnu.org> wrote: >> > > >> > > > > >> > > >> > > > > As shown in the PR, simplify_gen_subreg call in >> > > >> > > > > simplify_replace_fn_rtx: >> > > >> > > > > >> > > >> > > > > (gdb) list >> > > >> > > > > 469 if (code == SUBREG) >> > > >> > > > > 470 { >> > > >> > > > > 471 op0 = simplify_replace_fn_rtx (SUBREG_REG >> > > >> > > > > (x), >> > > >> > > > > old_rtx, fn, data); >> > > >> > > > > 472 if (op0 == SUBREG_REG (x)) >> > > >> > > > > 473 return x; >> > > >> > > > > 474 op0 = simplify_gen_subreg (GET_MODE (x), >> > > >> > > > > op0, >> > > >> > > > > 475 GET_MODE >> > > >> > > > > (SUBREG_REG (x)), >> > > >> > > > > 476 SUBREG_BYTE (x)); >> > > >> > > > > 477 return op0 ? op0 : x; >> > > >> > > > > 478 } >> > > >> > > > > >> > > >> > > > > simplifies with following arguments: >> > > >> > > > > >> > > >> > > > > (gdb) p debug_rtx (op0) >> > > >> > > > > (const_vector:V4QI [ >> > > >> > > > > (const_int -52 [0xffffffffffffffcc]) repeated x4 >> > > >> > > > > ]) >> > > >> > > > > (gdb) p debug_rtx (x) >> > > >> > > > > (subreg:V16QI (reg:V4QI 98) 0) >> > > >> > > > > >> > > >> > > > > to: >> > > >> > > > > >> > > >> > > > > (gdb) p debug_rtx (op0) >> > > >> > > > > (const_vector:V16QI [ >> > > >> > > > > (const_int -52 [0xffffffffffffffcc]) repeated x16 >> > > >> > > > > ]) >> > > >> > > > > >> > > >> > > > > This simplification is invalid, it is not possible to get >> > > >> > > > > V16QImode vector >> > > >> > > > > from V4QImode vector, even when all elements are duplicates. >> > > >> > >> > > >> > ^^^ >> > > >> > >> > > >> > I think this simplification is valid. A simplification to >> > > >> > >> > > >> > (const_vector:V16QI [ >> > > >> > (const_int -52 [0xffffffffffffffcc]) repeated x4 >> > > >> > (const_int 0 [0]) repeated x12 >> > > >> > ]) >> > > >> > >> > > >> > would be valid as well. >> > > >> > >> > > >> > > > > The simplification happens in >> > > >> > > > > simplify_context::simplify_subreg: >> > > >> > > > > >> > > >> > > > > (gdb) list >> > > >> > > > > 7558 if (VECTOR_MODE_P (outermode) >> > > >> > > > > 7559 && GET_MODE_INNER (outermode) == >> > > >> > > > > GET_MODE_INNER (innermode) >> > > >> > > > > 7560 && vec_duplicate_p (op, &elt)) >> > > >> > > > > 7561 return gen_vec_duplicate (outermode, elt); >> > > >> > > > > >> > > >> > > > > but the above simplification is valid only for >> > > >> > > > > non-paradoxical registers, >> > > >> > > > > where outermode <= innermode. We should not assume that >> > > >> > > > > elements outside >> > > >> > > > > the original register are valid, let alone all duplicates. >> > > >> > > > >> > > >> > > > Hmm, but looking at the audit trail the x86 backend expects >> > > >> > > > them to be zero? >> > > >> > > > Isn't that wrong as well? >> > > >> > > >> > > >> > > If you mean Comment #10, it is just an observation that >> > > >> > > simplify_replace_rtx simplifies arguments from Comment #9 to: >> > > >> > > >> > > >> > > (gdb) p debug_rtx (src) >> > > >> > > (const_vector:V8HI [ >> > > >> > > (const_int 204 [0xcc]) repeated x4 >> > > >> > > (const_int 0 [0]) repeated x4 >> > > >> > > ]) >> > > >> > > >> > > >> > > instead of: >> > > >> > > >> > > >> > > (gdb) p debug_rtx (src) >> > > >> > > (const_vector:V8HI [ >> > > >> > > (const_int 204 [0xcc]) repeated x8 >> > > >> > > ]) >> > > >> > > >> > > >> > > which is in line with the statement below. >> > > >> > > > >> > > >> > > > That is, I think putting any random value into the upper lanes >> > > >> > > > when >> > > >> > > > constant folding >> > > >> > > > a paradoxical subreg sounds OK to me, no? >> > > >> > > >> > > >> > > The compiler is putting zero there as can be seen from the above >> > > >> > > new RTX. >> > > >> > > >> > > >> > > > Of course we might choose to not do such constant propagation >> > > >> > > > for >> > > >> > > > efficiency reason - at least >> > > >> > > > when the resulting CONST_* would require a larger constant pool >> > > >> > > > entry >> > > >> > > > or more costly >> > > >> > > > construction. >> > > >> > > >> > > >> > > This is probably a follow-up improvement, where this patch tries >> > > >> > > to >> > > >> > > fix a specific invalid simplification of simplify_replace_rtx >> > > >> > > that is >> > > >> > > invalid universally. >> > > >> > >> > > >> > How so? What specifies the values of the paradoxical subreg for the >> > > >> > bytes not covered by the subreg operand? >> > > >> >> > > >> I don't know why 0 is generated here (and if it is valid) for >> > > >> paradoxical bytes, but 0xcc is not correct, since it sets REG_EQUAL to >> > > >> the wrong constant and triggers unwanted propagation later on. >> > > > >> > > > Quoting what I wrote in the PR below. I think pragmatically the fix is >> > > > good - we might miss some opportunistic folding this way but we for >> > > > sure may not optimistically register an equality via REG_EQUAL without >> > > > enforcing it (removing the producer and replacing it with the >> > > > optimistic >> > > > constant). >> > > > >> > > > So consider the patch approved if no other RTL maintainer chimes in >> > > > within 48h. >> > > >> > > Sorry, can you hold off a bit longer? Wanted to have a look but the >> > > deadline is about to expire. >> > >> > No problem, I will wait for you. >> >> Please also note Comment #14 in the PR. With the patch, the compiler sets >> >> (const_vector:V8HI [ >> (const_int 204 [0xcc]) repeated x4 >> (const_int 0 [0]) repeated x4 >> ]) >> >> as a new REG_EQUAL note. This also does not look OK to me. IMO, the >> compiler should not emit REG_EQUAL note when some of the elements are >> derived from undefined values outside the original register. > > Yes. The reverse would also be true, so when we'd actually constant propagate > (which would be OK) we also may not put an REG_EQUAL note with the original > expression which produced the undefined values.
Yeah. Good, it sounds like we're all in agreement. :) For: (insn 22 21 23 4 (set (reg:V8HI 100) (zero_extend:V8HI (vec_select:V8QI (subreg:V16QI (reg:V4QI 98) 0) (parallel [ (const_int 0 [0]) (const_int 1 [0x1]) (const_int 2 [0x2]) (const_int 3 [0x3]) (const_int 4 [0x4]) (const_int 5 [0x5]) (const_int 6 [0x6]) (const_int 7 [0x7]) ])))) "pr110206.c":12:42 7471 {sse4_1_zero_extendv8qiv8hi2} (expr_list:REG_EQUAL (const_vector:V8HI [ (const_int 204 [0xcc]) repeated x8 ]) (expr_list:REG_DEAD (reg:V4QI 98) (nil)))) folding to: (set (reg:V8HI 100) (const_vector:V8HI [(const_int 204 [0xcc]) ...])) would be OK, but adding the REG_EQUAL is not. > Within the current framework it looks difficult to guarantee either so the > patch > removing the possibility of simplifying alltogether looks most reasonable to > me. But like Uros says, the alternative REG_EQUAL is equally wrong in principle. It just happens to work out OK for this testcase. fwprop.c has: /* If there are any paradoxical SUBREGs, drop the REG_EQUAL note, because the bits in there can be anything and so might not match the REG_EQUAL note content. See PR70574. */ if (contains_paradoxical_subreg_p (SET_SRC (set))) return false; expand_mult_const has a similar guard. IMO the bug is that cprop is lacking the same test. Thanks, Richard