On Mon, Jul 10, 2023 at 11:17 AM Richard Biener
<richard.guent...@gmail.com> wrote:
>
> On Sun, Jul 9, 2023 at 10:53 AM Uros Bizjak via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> >
> > As shown in the PR, simplify_gen_subreg call in simplify_replace_fn_rtx:
> >
> > (gdb) list
> > 469           if (code == SUBREG)
> > 470             {
> > 471               op0 = simplify_replace_fn_rtx (SUBREG_REG (x),
> > old_rtx, fn, data);
> > 472               if (op0 == SUBREG_REG (x))
> > 473                 return x;
> > 474               op0 = simplify_gen_subreg (GET_MODE (x), op0,
> > 475                                          GET_MODE (SUBREG_REG (x)),
> > 476                                          SUBREG_BYTE (x));
> > 477               return op0 ? op0 : x;
> > 478             }
> >
> > simplifies with following arguments:
> >
> > (gdb) p debug_rtx (op0)
> > (const_vector:V4QI [
> >         (const_int -52 [0xffffffffffffffcc]) repeated x4
> >     ])
> > (gdb) p debug_rtx (x)
> > (subreg:V16QI (reg:V4QI 98) 0)
> >
> > to:
> >
> > (gdb) p debug_rtx (op0)
> > (const_vector:V16QI [
> >         (const_int -52 [0xffffffffffffffcc]) repeated x16
> >     ])
> >
> > This simplification is invalid, it is not possible to get V16QImode vector
> > from V4QImode vector, even when all elements are duplicates.
> >
> > The simplification happens in simplify_context::simplify_subreg:
> >
> > (gdb) list
> > 7558          if (VECTOR_MODE_P (outermode)
> > 7559              && GET_MODE_INNER (outermode) == GET_MODE_INNER 
> > (innermode)
> > 7560              && vec_duplicate_p (op, &elt))
> > 7561            return gen_vec_duplicate (outermode, elt);
> >
> > but the above simplification is valid only for non-paradoxical registers,
> > where outermode <= innermode.  We should not assume that elements outside
> > the original register are valid, let alone all duplicates.
>
> Hmm, but looking at the audit trail the x86 backend expects them to be zero?
> Isn't that wrong as well?

If you mean Comment #10, it is just an observation that
simplify_replace_rtx simplifies arguments from Comment #9 to:

(gdb) p debug_rtx (src)
(const_vector:V8HI [
        (const_int 204 [0xcc]) repeated x4
        (const_int 0 [0]) repeated x4
    ])

instead of:

(gdb) p debug_rtx (src)
(const_vector:V8HI [
        (const_int 204 [0xcc]) repeated x8
    ])

which is in line with the statement below.
>
> That is, I think putting any random value into the upper lanes when
> constant folding
> a paradoxical subreg sounds OK to me, no?

The compiler is putting zero there as can be seen from the above new RTX.

> Of course we might choose to not do such constant propagation for
> efficiency reason - at least
> when the resulting CONST_* would require a larger constant pool entry
> or more costly
> construction.

This is probably a follow-up improvement, where this patch tries to
fix a specific invalid simplification of simplify_replace_rtx that is
invalid universally.

Uros.

Reply via email to