LRA for avr: Clear mem rtx when resizing stack slots?

SenthilKumar.Selvaraj--- via Gcc Fri, 01 Sep 2023 03:55:38 -0700

Hi,

  The below (slightly) reduced test-case from gcc.c-torture/execute/simd-2.c
  produces an execution failure for avr.


typedef short __attribute__((vector_size (16))) vecint;

vecint i = { 150, 100, 150, 200, 0, 0, 0, 0 };
vecint j = { 10, 13, 20, 30, 1, 1, 1, 1 };
vecint k;

union {
  vecint v;
  short i[8];
} res;

void
verify (int a1, int a2, int a3, int a4,
        int b1, int b2, int b3, int b4)
{
  if (a1 != b1
      || a2 != b2
      || a3 != b3
      || a4 != b4)
    abort ();
}

int
main ()
{
  k = i + j;
  res.v = k;

  verify (res.i[0], res.i[1], res.i[2], res.i[3], 160, 113, 170, 230);

  k = i * j;
  res.v = k;

  verify (res.i[0], res.i[1], res.i[2], res.i[3], 1500, 1300, 3000, 6000);

  k = i / j;
  res.v = k;

  verify (res.i[0], res.i[1], res.i[2], res.i[3], 15, 7, 7, 6);

  k = i & j;
  res.v = k;

  verify (res.i[0], res.i[1], res.i[2], res.i[3], 2, 4, 20, 8);

  k = i | j;
  res.v = k;

  verify (res.i[0], res.i[1], res.i[2], res.i[3], 158, 109, 150, 222);

  k = i ^ j;
  res.v = k;

  verify (res.i[0], res.i[1], res.i[2], res.i[3], 156, 105, 130, 214);

  k = -i;
  res.v = k;
  verify (res.i[0], res.i[1], res.i[2], res.i[3],
          -150, -100, -150, -200);

  exit (0);
}

  To reproduce, enable LRA in avr.c by removing TARGET_LRA_P hook, build with
  make all-host && make install-host, and compile

$ avr-gcc -mmcu=avr51 
~/code/personal/gcc/gcc/testsuite/gcc.c-torture/execute/simd-2.c -Os  -S -o 
out.s -fdump-rtl-all-raw

  The pass before dse2 has the below instructions to store the lowest two bytes
  of i (i.e 150) at stack offsets 23 and 21.

(insn 2403 2402 2404 2 (parallel [
            (set (reg:QI 24 r24 [orig:102 _131 ] [102])
                (mem/c:QI (const:HI (plus:HI (symbol_ref:HI ("i") [flags 0x2]  
<var_decl 0x7ff45a778090 i>)
                            (const_int 1 [0x1]))) [1 i+1 S1 A8]))
            (clobber (reg:CC 36 cc))
        ]) 89 {movqi_insn}
     (nil))
(insn 2404 2403 2405 2 (parallel [
            (set (mem/c:QI (plus:HI (reg/f:HI 28 r28)
                        (const_int 23 [0x17])) [4 %sfp+23 S1 A8])
                (reg:QI 24 r24 [orig:102 _131 ] [102]))
            (clobber (reg:CC 36 cc))
        ]) 89 {movqi_insn}
     (nil))
(insn 2405 2404 2406 2 (parallel [
            (set (reg:QI 24 r24 [orig:99 _128 ] [99])
                (mem/c:QI (symbol_ref:HI ("i") [flags 0x2]  <var_decl 
0x7ff45a778090 i>) [1 i+0 S1 A128]))
            (clobber (reg:CC 36 cc))
        ]) 89 {movqi_insn}
     (nil))
(insn 2406 2405 2407 2 (parallel [
            (set (mem/c:QI (plus:HI (reg/f:HI 28 r28)
                        (const_int 21 [0x15])) [4 %sfp+21 S1 A8])
                (reg:QI 24 r24 [orig:99 _128 ] [99]))
            (clobber (reg:CC 36 cc))
        ]) 89 {movqi_insn}
     (nil))

  dse2 deletes these insns, as it sees the below clobber before
  the values in the stack offsets are used.

(insn 448 2083 2084 15 (clobber (mem/c:TI (plus:HI (reg/f:HI 28 r28)
                (const_int 9 [0x9])) [4 %sfp+9 S16 A8])) -1
     (nil))
  
   DSE figures everything from stack offset 9 to 9 + 16 = 25 
   (16 is GET_MODE_SIZE (TImode)) is clobbered, so it deletes the dead stores.

   The reason for the dead stores is overlapping stack offsets for two spilled 
pseudos.

   IRA has
<snip>
(insn 34 32 36 2 (set (reg:QI 102 [ _131 ])
        (mem/c:QI (const:HI (plus:HI (symbol_ref:HI ("i") [flags 0x2]  
<var_decl 0x7ff45a778090 i>)
                    (const_int 1 [0x1]))) [1 i+1 S1 A8])) 86 {movqi_insn_split}
     (nil))
(insn 36 34 38 2 (set (reg:QI 99 [ _128 ])
        (mem/c:QI (symbol_ref:HI ("i") [flags 0x2]  <var_decl 0x7ff45a778090 
i>) [1 i+0 S1 A128])) 86 {movqi_insn_split}
     (nil))
<snip>
(insn 448 430 432 15 (clobber (reg:TI 147 [ _176 ])) -1
     (nil))

   and LRA spills 102 and 99 to distinct stack slots, like so

Slot 10 regnos (width = 0):      99
Slot 12 regnos (width = 0):      102

   However, when spilling 147,

Breakpoint 1, add_pseudo_to_slot (regno=147, slot_num=4) at 
/home/i41766/code/personal/gcc/gcc/lra-spills.cc:331
331                                              
lra_reg_info[regno].biggest_mode);
(gdb) p slots[slot_num]
$1 = {regno = 51, hard_regno = -1, align = 8, size = {<poly_int_pod<1, long>> = 
{coeffs = {2}}, <No data fields>}, 
  mem = 0x7fffea5eb1c8, live_ranges = 0x33b29c0}
(gdb) p debug_rtx(slots[slot_num].mem)
(mem/c:BLK (plus:HI (reg/f:HI 28 r28)
        (const_int 9 [0x9])) [0  A8])

  it reuses slot 4, originally 2 bytes wide and with stack offset 9. Live 
ranges for 
  r51 and r147 do not conflict, so this ok, and it resizes the stack slot to be 
as
  wide as TImode. However, it does not reset slots[slot_num].mem, and 
assign_mem_slot
  when called for r147 sets the same stack offset rtx as pseudo_slots[147].mem.

  This appears wrong to me, as the bigger slot may conflict with stack offsets
  assigned to other slots. The below patch resets slots[slot_num].mem if the
  resized slot is bigger than before. assign_mem_slot later gets a new stack 
offset
  from function.cc:assign_stack_local and sets it as the slot's MEM rtx.

diff --git a/gcc/lra-spills.cc b/gcc/lra-spills.cc
index a663a1931e3..82d5fbef66a 100644
--- a/gcc/lra-spills.cc
+++ b/gcc/lra-spills.cc
@@ -331,9 +331,14 @@ add_pseudo_to_slot (int regno, int slot_num)
                                         lra_reg_info[regno].biggest_mode);
   unsigned int align = spill_slot_alignment (mode);
   slots[slot_num].align = MAX (slots[slot_num].align, align);
-  slots[slot_num].size = upper_bound (slots[slot_num].size,
+
+  poly_int64 origsize = slots[slot_num].size;
+  slots[slot_num].size = upper_bound (origsize,
                                      GET_MODE_SIZE (mode));
 
+  if (maybe_gt(slots[slot_num].size, origsize))
+    slots[slot_num].mem = NULL_RTX;
+
   if (slots[slot_num].regno < 0)
     {
       /* It is the first pseudo in the slot.  */

  After the patch, the slot for r147 gets a different non-overlapping
  stack offset, and DSE no longer removes the storing of the lowest 2
  bytes of i.

(insn 448 430 1436 43 (clobber (mem/c:TI (plus:HI (reg/f:HI 28 r28)
                (const_int 55 [0x37])) [4 %sfp+55 S16 A8])) -1
     (nil))

  However, I'm not sure what should be done for the other pseudo(s)
  sharing the slot (r51 in this example). Their pseudo_slot.mem will
  still be the original stack offset rtx. It appears to work ok for this
  test case, but I'm not sure if that is right thing to do.

  Is my understanding correct, and is there a better way to fix this problem?

Regards
Senthil

LRA for avr: Clear mem rtx when resizing stack slots?

Reply via email to