On Mon, Jul 10, 2023 at 11:26 AM Uros Bizjak <ubiz...@gmail.com> wrote: > > On Mon, Jul 10, 2023 at 11:17 AM Richard Biener > <richard.guent...@gmail.com> wrote: > > > > On Sun, Jul 9, 2023 at 10:53 AM Uros Bizjak via Gcc-patches > > <gcc-patches@gcc.gnu.org> wrote: > > > > > > As shown in the PR, simplify_gen_subreg call in simplify_replace_fn_rtx: > > > > > > (gdb) list > > > 469 if (code == SUBREG) > > > 470 { > > > 471 op0 = simplify_replace_fn_rtx (SUBREG_REG (x), > > > old_rtx, fn, data); > > > 472 if (op0 == SUBREG_REG (x)) > > > 473 return x; > > > 474 op0 = simplify_gen_subreg (GET_MODE (x), op0, > > > 475 GET_MODE (SUBREG_REG (x)), > > > 476 SUBREG_BYTE (x)); > > > 477 return op0 ? op0 : x; > > > 478 } > > > > > > simplifies with following arguments: > > > > > > (gdb) p debug_rtx (op0) > > > (const_vector:V4QI [ > > > (const_int -52 [0xffffffffffffffcc]) repeated x4 > > > ]) > > > (gdb) p debug_rtx (x) > > > (subreg:V16QI (reg:V4QI 98) 0) > > > > > > to: > > > > > > (gdb) p debug_rtx (op0) > > > (const_vector:V16QI [ > > > (const_int -52 [0xffffffffffffffcc]) repeated x16 > > > ]) > > > > > > This simplification is invalid, it is not possible to get V16QImode vector > > > from V4QImode vector, even when all elements are duplicates.
^^^ I think this simplification is valid. A simplification to (const_vector:V16QI [ (const_int -52 [0xffffffffffffffcc]) repeated x4 (const_int 0 [0]) repeated x12 ]) would be valid as well. > > > The simplification happens in simplify_context::simplify_subreg: > > > > > > (gdb) list > > > 7558 if (VECTOR_MODE_P (outermode) > > > 7559 && GET_MODE_INNER (outermode) == GET_MODE_INNER > > > (innermode) > > > 7560 && vec_duplicate_p (op, &elt)) > > > 7561 return gen_vec_duplicate (outermode, elt); > > > > > > but the above simplification is valid only for non-paradoxical registers, > > > where outermode <= innermode. We should not assume that elements outside > > > the original register are valid, let alone all duplicates. > > > > Hmm, but looking at the audit trail the x86 backend expects them to be zero? > > Isn't that wrong as well? > > If you mean Comment #10, it is just an observation that > simplify_replace_rtx simplifies arguments from Comment #9 to: > > (gdb) p debug_rtx (src) > (const_vector:V8HI [ > (const_int 204 [0xcc]) repeated x4 > (const_int 0 [0]) repeated x4 > ]) > > instead of: > > (gdb) p debug_rtx (src) > (const_vector:V8HI [ > (const_int 204 [0xcc]) repeated x8 > ]) > > which is in line with the statement below. > > > > That is, I think putting any random value into the upper lanes when > > constant folding > > a paradoxical subreg sounds OK to me, no? > > The compiler is putting zero there as can be seen from the above new RTX. > > > Of course we might choose to not do such constant propagation for > > efficiency reason - at least > > when the resulting CONST_* would require a larger constant pool entry > > or more costly > > construction. > > This is probably a follow-up improvement, where this patch tries to > fix a specific invalid simplification of simplify_replace_rtx that is > invalid universally. How so? What specifies the values of the paradoxical subreg for the bytes not covered by the subreg operand? > > Uros.