Richard Biener <richard.guent...@gmail.com> writes:
> On Wed, Jul 12, 2023 at 1:05 PM Uros Bizjak <ubiz...@gmail.com> wrote:
>>
>> On Wed, Jul 12, 2023 at 12:58 PM Uros Bizjak <ubiz...@gmail.com> wrote:
>> >
>> > On Wed, Jul 12, 2023 at 12:23 PM Richard Sandiford
>> > <richard.sandif...@arm.com> wrote:
>> > >
>> > > Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
>> > > > On Mon, Jul 10, 2023 at 1:01 PM Uros Bizjak <ubiz...@gmail.com> wrote:
>> > > >>
>> > > >> On Mon, Jul 10, 2023 at 11:47 AM Richard Biener
>> > > >> <richard.guent...@gmail.com> wrote:
>> > > >> >
>> > > >> > On Mon, Jul 10, 2023 at 11:26 AM Uros Bizjak <ubiz...@gmail.com> 
>> > > >> > wrote:
>> > > >> > >
>> > > >> > > On Mon, Jul 10, 2023 at 11:17 AM Richard Biener
>> > > >> > > <richard.guent...@gmail.com> wrote:
>> > > >> > > >
>> > > >> > > > On Sun, Jul 9, 2023 at 10:53 AM Uros Bizjak via Gcc-patches
>> > > >> > > > <gcc-patches@gcc.gnu.org> wrote:
>> > > >> > > > >
>> > > >> > > > > As shown in the PR, simplify_gen_subreg call in 
>> > > >> > > > > simplify_replace_fn_rtx:
>> > > >> > > > >
>> > > >> > > > > (gdb) list
>> > > >> > > > > 469           if (code == SUBREG)
>> > > >> > > > > 470             {
>> > > >> > > > > 471               op0 = simplify_replace_fn_rtx (SUBREG_REG 
>> > > >> > > > > (x),
>> > > >> > > > > old_rtx, fn, data);
>> > > >> > > > > 472               if (op0 == SUBREG_REG (x))
>> > > >> > > > > 473                 return x;
>> > > >> > > > > 474               op0 = simplify_gen_subreg (GET_MODE (x), 
>> > > >> > > > > op0,
>> > > >> > > > > 475                                          GET_MODE 
>> > > >> > > > > (SUBREG_REG (x)),
>> > > >> > > > > 476                                          SUBREG_BYTE (x));
>> > > >> > > > > 477               return op0 ? op0 : x;
>> > > >> > > > > 478             }
>> > > >> > > > >
>> > > >> > > > > simplifies with following arguments:
>> > > >> > > > >
>> > > >> > > > > (gdb) p debug_rtx (op0)
>> > > >> > > > > (const_vector:V4QI [
>> > > >> > > > >         (const_int -52 [0xffffffffffffffcc]) repeated x4
>> > > >> > > > >     ])
>> > > >> > > > > (gdb) p debug_rtx (x)
>> > > >> > > > > (subreg:V16QI (reg:V4QI 98) 0)
>> > > >> > > > >
>> > > >> > > > > to:
>> > > >> > > > >
>> > > >> > > > > (gdb) p debug_rtx (op0)
>> > > >> > > > > (const_vector:V16QI [
>> > > >> > > > >         (const_int -52 [0xffffffffffffffcc]) repeated x16
>> > > >> > > > >     ])
>> > > >> > > > >
>> > > >> > > > > This simplification is invalid, it is not possible to get 
>> > > >> > > > > V16QImode vector
>> > > >> > > > > from V4QImode vector, even when all elements are duplicates.
>> > > >> >
>> > > >> > ^^^
>> > > >> >
>> > > >> > I think this simplification is valid.  A simplification to
>> > > >> >
>> > > >> > (const_vector:V16QI [
>> > > >> >          (const_int -52 [0xffffffffffffffcc]) repeated x4
>> > > >> >          (const_int 0 [0]) repeated x12
>> > > >> >      ])
>> > > >> >
>> > > >> > would be valid as well.
>> > > >> >
>> > > >> > > > > The simplification happens in 
>> > > >> > > > > simplify_context::simplify_subreg:
>> > > >> > > > >
>> > > >> > > > > (gdb) list
>> > > >> > > > > 7558          if (VECTOR_MODE_P (outermode)
>> > > >> > > > > 7559              && GET_MODE_INNER (outermode) == 
>> > > >> > > > > GET_MODE_INNER (innermode)
>> > > >> > > > > 7560              && vec_duplicate_p (op, &elt))
>> > > >> > > > > 7561            return gen_vec_duplicate (outermode, elt);
>> > > >> > > > >
>> > > >> > > > > but the above simplification is valid only for 
>> > > >> > > > > non-paradoxical registers,
>> > > >> > > > > where outermode <= innermode.  We should not assume that 
>> > > >> > > > > elements outside
>> > > >> > > > > the original register are valid, let alone all duplicates.
>> > > >> > > >
>> > > >> > > > Hmm, but looking at the audit trail the x86 backend expects 
>> > > >> > > > them to be zero?
>> > > >> > > > Isn't that wrong as well?
>> > > >> > >
>> > > >> > > If you mean Comment #10, it is just an observation that
>> > > >> > > simplify_replace_rtx simplifies arguments from Comment #9 to:
>> > > >> > >
>> > > >> > > (gdb) p debug_rtx (src)
>> > > >> > > (const_vector:V8HI [
>> > > >> > >         (const_int 204 [0xcc]) repeated x4
>> > > >> > >         (const_int 0 [0]) repeated x4
>> > > >> > >     ])
>> > > >> > >
>> > > >> > > instead of:
>> > > >> > >
>> > > >> > > (gdb) p debug_rtx (src)
>> > > >> > > (const_vector:V8HI [
>> > > >> > >         (const_int 204 [0xcc]) repeated x8
>> > > >> > >     ])
>> > > >> > >
>> > > >> > > which is in line with the statement below.
>> > > >> > > >
>> > > >> > > > That is, I think putting any random value into the upper lanes 
>> > > >> > > > when
>> > > >> > > > constant folding
>> > > >> > > > a paradoxical subreg sounds OK to me, no?
>> > > >> > >
>> > > >> > > The compiler is putting zero there as can be seen from the above 
>> > > >> > > new RTX.
>> > > >> > >
>> > > >> > > > Of course we might choose to not do such constant propagation 
>> > > >> > > > for
>> > > >> > > > efficiency reason - at least
>> > > >> > > > when the resulting CONST_* would require a larger constant pool 
>> > > >> > > > entry
>> > > >> > > > or more costly
>> > > >> > > > construction.
>> > > >> > >
>> > > >> > > This is probably a follow-up improvement, where this patch tries 
>> > > >> > > to
>> > > >> > > fix a specific invalid simplification of simplify_replace_rtx 
>> > > >> > > that is
>> > > >> > > invalid universally.
>> > > >> >
>> > > >> > How so?  What specifies the values of the paradoxical subreg for the
>> > > >> > bytes not covered by the subreg operand?
>> > > >>
>> > > >> I don't know why 0 is generated here (and if it is valid) for
>> > > >> paradoxical bytes, but 0xcc is not correct, since it sets REG_EQUAL to
>> > > >> the wrong constant and triggers unwanted propagation later on.
>> > > >
>> > > > Quoting what I wrote in the PR below.  I think pragmatically the fix is
>> > > > good - we might miss some opportunistic folding this way but we for
>> > > > sure may not optimistically register an equality via REG_EQUAL without
>> > > > enforcing it (removing the producer and replacing it with the 
>> > > > optimistic
>> > > > constant).
>> > > >
>> > > > So consider the patch approved if no other RTL maintainer chimes in
>> > > > within 48h.
>> > >
>> > > Sorry, can you hold off a bit longer?  Wanted to have a look but the
>> > > deadline is about to expire.
>> >
>> > No problem, I will wait for you.
>>
>> Please also note Comment #14 in the PR. With the patch, the compiler sets
>>
>>  (const_vector:V8HI [
>>          (const_int 204 [0xcc]) repeated x4
>>          (const_int 0 [0]) repeated x4
>>      ])
>>
>> as a new REG_EQUAL note. This also does not look OK to me. IMO, the
>> compiler should not emit REG_EQUAL note when some of the elements are
>> derived from undefined values outside the original register.
>
> Yes.  The reverse would also be true, so when we'd actually constant propagate
> (which would be OK) we also may not put an REG_EQUAL note with the original
> expression which produced the undefined values.

Yeah.  Good, it sounds like we're all in agreement. :)  For:

(insn 22 21 23 4 (set (reg:V8HI 100)
        (zero_extend:V8HI (vec_select:V8QI (subreg:V16QI (reg:V4QI 98) 0)
                (parallel [
                        (const_int 0 [0])
                        (const_int 1 [0x1])
                        (const_int 2 [0x2])
                        (const_int 3 [0x3])
                        (const_int 4 [0x4])
                        (const_int 5 [0x5])
                        (const_int 6 [0x6])
                        (const_int 7 [0x7])
                    ])))) "pr110206.c":12:42 7471 {sse4_1_zero_extendv8qiv8hi2}
     (expr_list:REG_EQUAL (const_vector:V8HI [
                (const_int 204 [0xcc]) repeated x8
            ])
        (expr_list:REG_DEAD (reg:V4QI 98)
            (nil))))

folding to:

  (set (reg:V8HI 100)
       (const_vector:V8HI [(const_int 204 [0xcc]) ...]))

would be OK, but adding the REG_EQUAL is not.

> Within the current framework it looks difficult to guarantee either so the 
> patch
> removing the possibility of simplifying alltogether looks most reasonable to 
> me.

But like Uros says, the alternative REG_EQUAL is equally wrong
in principle.  It just happens to work out OK for this testcase.

fwprop.c has:

      /* If there are any paradoxical SUBREGs, drop the REG_EQUAL note,
         because the bits in there can be anything and so might not
         match the REG_EQUAL note content.  See PR70574.  */
      if (contains_paradoxical_subreg_p (SET_SRC (set)))
        return false;

expand_mult_const has a similar guard.  IMO the bug is that cprop is
lacking the same test.

Thanks,
Richard

Reply via email to