On Wed, Jul 12, 2023 at 1:05 PM Uros Bizjak <ubiz...@gmail.com> wrote: > > On Wed, Jul 12, 2023 at 12:58 PM Uros Bizjak <ubiz...@gmail.com> wrote: > > > > On Wed, Jul 12, 2023 at 12:23 PM Richard Sandiford > > <richard.sandif...@arm.com> wrote: > > > > > > Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes: > > > > On Mon, Jul 10, 2023 at 1:01 PM Uros Bizjak <ubiz...@gmail.com> wrote: > > > >> > > > >> On Mon, Jul 10, 2023 at 11:47 AM Richard Biener > > > >> <richard.guent...@gmail.com> wrote: > > > >> > > > > >> > On Mon, Jul 10, 2023 at 11:26 AM Uros Bizjak <ubiz...@gmail.com> > > > >> > wrote: > > > >> > > > > > >> > > On Mon, Jul 10, 2023 at 11:17 AM Richard Biener > > > >> > > <richard.guent...@gmail.com> wrote: > > > >> > > > > > > >> > > > On Sun, Jul 9, 2023 at 10:53 AM Uros Bizjak via Gcc-patches > > > >> > > > <gcc-patches@gcc.gnu.org> wrote: > > > >> > > > > > > > >> > > > > As shown in the PR, simplify_gen_subreg call in > > > >> > > > > simplify_replace_fn_rtx: > > > >> > > > > > > > >> > > > > (gdb) list > > > >> > > > > 469 if (code == SUBREG) > > > >> > > > > 470 { > > > >> > > > > 471 op0 = simplify_replace_fn_rtx (SUBREG_REG > > > >> > > > > (x), > > > >> > > > > old_rtx, fn, data); > > > >> > > > > 472 if (op0 == SUBREG_REG (x)) > > > >> > > > > 473 return x; > > > >> > > > > 474 op0 = simplify_gen_subreg (GET_MODE (x), op0, > > > >> > > > > 475 GET_MODE > > > >> > > > > (SUBREG_REG (x)), > > > >> > > > > 476 SUBREG_BYTE (x)); > > > >> > > > > 477 return op0 ? op0 : x; > > > >> > > > > 478 } > > > >> > > > > > > > >> > > > > simplifies with following arguments: > > > >> > > > > > > > >> > > > > (gdb) p debug_rtx (op0) > > > >> > > > > (const_vector:V4QI [ > > > >> > > > > (const_int -52 [0xffffffffffffffcc]) repeated x4 > > > >> > > > > ]) > > > >> > > > > (gdb) p debug_rtx (x) > > > >> > > > > (subreg:V16QI (reg:V4QI 98) 0) > > > >> > > > > > > > >> > > > > to: > > > >> > > > > > > > >> > > > > (gdb) p debug_rtx (op0) > > > >> > > > > (const_vector:V16QI [ > > > >> > > > > (const_int -52 [0xffffffffffffffcc]) repeated x16 > > > >> > > > > ]) > > > >> > > > > > > > >> > > > > This simplification is invalid, it is not possible to get > > > >> > > > > V16QImode vector > > > >> > > > > from V4QImode vector, even when all elements are duplicates. > > > >> > > > > >> > ^^^ > > > >> > > > > >> > I think this simplification is valid. A simplification to > > > >> > > > > >> > (const_vector:V16QI [ > > > >> > (const_int -52 [0xffffffffffffffcc]) repeated x4 > > > >> > (const_int 0 [0]) repeated x12 > > > >> > ]) > > > >> > > > > >> > would be valid as well. > > > >> > > > > >> > > > > The simplification happens in > > > >> > > > > simplify_context::simplify_subreg: > > > >> > > > > > > > >> > > > > (gdb) list > > > >> > > > > 7558 if (VECTOR_MODE_P (outermode) > > > >> > > > > 7559 && GET_MODE_INNER (outermode) == > > > >> > > > > GET_MODE_INNER (innermode) > > > >> > > > > 7560 && vec_duplicate_p (op, &elt)) > > > >> > > > > 7561 return gen_vec_duplicate (outermode, elt); > > > >> > > > > > > > >> > > > > but the above simplification is valid only for non-paradoxical > > > >> > > > > registers, > > > >> > > > > where outermode <= innermode. We should not assume that > > > >> > > > > elements outside > > > >> > > > > the original register are valid, let alone all duplicates. > > > >> > > > > > > >> > > > Hmm, but looking at the audit trail the x86 backend expects them > > > >> > > > to be zero? > > > >> > > > Isn't that wrong as well? > > > >> > > > > > >> > > If you mean Comment #10, it is just an observation that > > > >> > > simplify_replace_rtx simplifies arguments from Comment #9 to: > > > >> > > > > > >> > > (gdb) p debug_rtx (src) > > > >> > > (const_vector:V8HI [ > > > >> > > (const_int 204 [0xcc]) repeated x4 > > > >> > > (const_int 0 [0]) repeated x4 > > > >> > > ]) > > > >> > > > > > >> > > instead of: > > > >> > > > > > >> > > (gdb) p debug_rtx (src) > > > >> > > (const_vector:V8HI [ > > > >> > > (const_int 204 [0xcc]) repeated x8 > > > >> > > ]) > > > >> > > > > > >> > > which is in line with the statement below. > > > >> > > > > > > >> > > > That is, I think putting any random value into the upper lanes > > > >> > > > when > > > >> > > > constant folding > > > >> > > > a paradoxical subreg sounds OK to me, no? > > > >> > > > > > >> > > The compiler is putting zero there as can be seen from the above > > > >> > > new RTX. > > > >> > > > > > >> > > > Of course we might choose to not do such constant propagation for > > > >> > > > efficiency reason - at least > > > >> > > > when the resulting CONST_* would require a larger constant pool > > > >> > > > entry > > > >> > > > or more costly > > > >> > > > construction. > > > >> > > > > > >> > > This is probably a follow-up improvement, where this patch tries to > > > >> > > fix a specific invalid simplification of simplify_replace_rtx that > > > >> > > is > > > >> > > invalid universally. > > > >> > > > > >> > How so? What specifies the values of the paradoxical subreg for the > > > >> > bytes not covered by the subreg operand? > > > >> > > > >> I don't know why 0 is generated here (and if it is valid) for > > > >> paradoxical bytes, but 0xcc is not correct, since it sets REG_EQUAL to > > > >> the wrong constant and triggers unwanted propagation later on. > > > > > > > > Quoting what I wrote in the PR below. I think pragmatically the fix is > > > > good - we might miss some opportunistic folding this way but we for > > > > sure may not optimistically register an equality via REG_EQUAL without > > > > enforcing it (removing the producer and replacing it with the optimistic > > > > constant). > > > > > > > > So consider the patch approved if no other RTL maintainer chimes in > > > > within 48h. > > > > > > Sorry, can you hold off a bit longer? Wanted to have a look but the > > > deadline is about to expire. > > > > No problem, I will wait for you. > > Please also note Comment #14 in the PR. With the patch, the compiler sets > > (const_vector:V8HI [ > (const_int 204 [0xcc]) repeated x4 > (const_int 0 [0]) repeated x4 > ]) > > as a new REG_EQUAL note. This also does not look OK to me. IMO, the > compiler should not emit REG_EQUAL note when some of the elements are > derived from undefined values outside the original register.
Yes. The reverse would also be true, so when we'd actually constant propagate (which would be OK) we also may not put an REG_EQUAL note with the original expression which produced the undefined values. Within the current framework it looks difficult to guarantee either so the patch removing the possibility of simplifying alltogether looks most reasonable to me. Richard. > > Uros.