Am 20.11.24 um 11:33 schrieb Richard Biener:
On Wed, Nov 20, 2024 at 11:29 AM Georg-Johann Lay via Gcc
<gcc@gcc.gnu.org> wrote:

Consider the following RTL peephole from avr.md:

(define_peephole2               ; avr.md:5387
    [(match_scratch:QI 3 "d")
     (parallel [(set (match_operand:ALL4 0 "register_operand" "")
                     (ashift:ALL4 (match_operand:ALL4 1
"register_operand" "")
                                  (match_operand:QI 2 "const_int_operand"
"")))
                (clobber (reg:CC REG_CC))])]
    ""
    [(parallel [(set (match_dup 0)
                     (ashift:ALL4 (match_dup 1)
                                  (match_dup 2)))
                (clobber (match_dup 3))
                (clobber (reg:CC REG_CC))])])

Finally, Andrew Pinski pointed out the correct solution:

In order for the scratch from the peephole2 to not overlap
with the destination of the matched insn, a top-level
match_dup is required according to

https://gcc.gnu.org/onlinedocs/gccint/define_005fpeephole2.html

So the fixed peephole2 reads:

(define_peephole2
    [(match_scratch:QI 3 "d")
     (parallel [(set (match_operand:ALL4 0 "register_operand")
                     (ashift:ALL4 (match_operand:ALL4 1 "register_operand")
                                  (match_operand:QI 2 "const_int_operand")))
                (clobber (reg:CC REG_CC))])
     (match_dup 3)]

This is required due to my changes which turned the shift insn from a
2-operand insn to a 3-operand one.  The 2-operand case was not an
issue because AFAIU peephole2 makes sure that the scratch won't
overlap with any input.


Johann


As far as I understand, its purpose is to provide a QImode
scratch register provided such a scratch is available.

However, in the .peephole2 RTL dump with -da I see the following:

Splitting with gen_peephole2_100 (avr.md:5387)
...
(insn 24 8 15 2 (parallel [
              (set (reg:SI 22 r22 [orig:47 _3 ] [47])
                   (ashift:SI (reg:SI 20 r20 [orig:48 x ] [48])
                              (const_int 7 [0x7])))
              (clobber (reg:QI 24 r24))
              (clobber (reg:CC 36 cc))
          ])
       (nil))

That is, the scratch r24:QI is overlapping the output in
r22:SI.  All hard registers are 8-bit regs and hence r22:SI
extends from r22...r25.

A scratch that overlaps the operands is pretty much useless
or even plain wrong.  recog.cc::peep2_find_free_register()
has this comment:  /* Don't use registers set or clobbered by the insn.  */

    from = peep2_buf_position (peep2_current + from);
    to = peep2_buf_position (peep2_current + to);

    gcc_assert (peep2_insn_data[from].insn != NULL_RTX);
    REG_SET_TO_HARD_REG_SET (live, peep2_insn_data[from].live_before);

    while (from != to)
      {
        gcc_assert (peep2_insn_data[from].insn != NULL_RTX);

        /* Don't use registers set or clobbered by the insn.  */
        FOR_EACH_INSN_DEF (def, peep2_insn_data[from].insn)
         SET_HARD_REG_BIT (live, DF_REF_REGNO (def));

        from = peep2_buf_position (from + 1);
      }

So it this bogus in that it assumes all registers extend only
over one hard reg?

Yes, looks like a bug to me.


FYI, the purpose is to provide a scratch without increasing the register
pressure (which "match_scratch" would do).  Therefore, the RTL peephole
is used instead of forcing reload to come up with a scratch.

More specifically, I see this with

$ avr-gcc bogus-peep2.c -S -Os -da

long ashl32_7 (int i, long x)
{
      return x << 7;
}

with the attached WIP patch atop trunk b222ee10045d.

Johann

Target: avr
Configured with: ../../source/gcc-master/configure --target=avr
--disable-nls --with-dwarf2 --with-gnu-as --with-gnu-ld --disable-shared
   --enable-languages=c,c++
Thread model: single
Supported LTO compression algorithms: zlib
gcc version 15.0.0 20241119 (experimental) (GCC)

Reply via email to