https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105395

            Bug ID: 105395
           Summary: Invalid reload of atomic operation
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Keywords: wrong-code
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rsandifo at gcc dot gnu.org
  Target Milestone: ---
            Target: aarch64*-*-*

There is an element of “doctor it hurts if I do this” here, but:
if the result of an atomic operation becomes a hard register
before reload, and if that hard register conflicts with the
allocated address of the atomic operation, we can end up reloading
the whole atomic memory instead of the address.  I saw this
“in the wild” with some later changes, because a combine opportunity
on the input triggered a three-way combine opportunity with
a following hard-register move.  But it could happen in other
cases too.

Brute-force test case that doesn't rely on RA:

int __RTL (startwith ("vregs")) f1 (int *ptr, int val)
{
(function "f1"
  (param "ptr"
    (DECL_RTL (reg/v:DI <1> [ ptr ]))
    (DECL_RTL_INCOMING (reg:DI x0 [ ptr ]))
  )
  (param "val"
    (DECL_RTL (reg/v:SI <2> [ val ]))
    (DECL_RTL_INCOMING (reg:SI x1 [ val ]))
  )
  (insn-chain
    (block 2
      (edge-from entry (flags "FALLTHRU"))
      (cnote 1 [bb 2] NOTE_INSN_BASIC_BLOCK)
      (cnote 2 NOTE_INSN_FUNCTION_BEG)
      (cinsn 3
        (parallel
          [(set (reg:SI x0) (mem/v:SI (reg:DI x0) [-1 S4 A32]))
           (set (mem/v:SI (reg:DI x0) [-1 S4 A32])
                (unspec_volatile:SI
                  [(plus:SI (mem/v:SI (reg:DI x0) [-1 S4 A32])
                            (reg:SI <2>))
                   (const_int 32773)]
                  UNSPECV_ATOMIC_OP))
           (clobber (reg:CC cc))
           (clobber (scratch:SI))
           (clobber (scratch:SI))]
        )
      )
      (cinsn 4 (use (reg:SI x0)))
      (edge-to exit (flags "FALLTHRU"))
    )
  )
  (crtl (return_rtx (reg:SI x0)))
)
}

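which with -O2 gives the following, where the atomic loop operates on
a stack copy at [sp, 12] instead of on *ptr, and the final store uses
x0 after it has been overwritten with the result: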

f1:
        sub     sp, sp, #16
        ldr     w0, [x0]
        add     x4, sp, 12
        mov     w1, 0
        str     w0, [sp, 12]
.L3:
        ldxr    w0, [x4]
        add     w2, w0, w1
        stlxr   w3, w2, [x4]
        cbnz    w3, .L3
        dmb     ish
        ldr     w1, [sp, 12]
        str     w1, [x0]
        add     sp, sp, 16
        ret
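
For comparison, here is a rough C sketch of what the insn is meant to do. It assumes that the memory-model constant 32773 denotes the __sync-style sequentially consistent model, which is an inference from the ldxr/stlxr/dmb ish sequence and is not stated in the report. The function name fetch_add_reference is hypothetical.

```c
/* Hypothetical reference function, not part of the original test case.
   Assumes the RTL pattern corresponds to a __sync-style fetch-and-add.  */
int
fetch_add_reference (int *ptr, int val)
{
  /* Atomically performs *ptr += val and returns the value that *ptr
     held before the addition.  */
  return __sync_fetch_and_add (ptr, val);
}
```

A correct expansion performs the exclusive load/store loop directly on the address in ptr, with no copy through the stack.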

I think the problem is that lra-constraints.c only sets offmemok
if the memory doesn't “win” (i.e. if the memory doesn't satisfy
the constraints in its original form).  The memory here is OK
in its original form.  Later on the function detects the conflict
and marks the memory as no longer winning:

          if (HARD_REGISTER_P (operand_reg[i])
              || (first_conflict_j == last_conflict_j
                  && operand_reg[last_conflict_j] != NULL_RTX
                  && !curr_alt_match_win[last_conflict_j]
                  && !HARD_REGISTER_P (operand_reg[last_conflict_j])))
            {
              curr_alt_win[last_conflict_j] = false;

But because offmemok is false, this leads to a reload of the full memory
rather than its address.
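
The interaction described above can be sketched with a toy model. This is not GCC code: the struct, the enum and all function names are hypothetical, and the model keeps only the two flags discussed here (win and offmemok).

```c
#include <stdbool.h>

/* Hypothetical, heavily simplified model of the decision; none of
   these names exist in lra-constraints.c.  */
enum reload_kind { NO_RELOAD, RELOAD_ADDRESS, RELOAD_WHOLE_OPERAND };

struct operand_state
{
  bool is_mem;    /* operand is a memory reference */
  bool win;       /* operand satisfies its constraint as-is */
  bool offmemok;  /* reloading just the address is known to suffice */
};

/* Model of the first step: offmemok is only considered when the
   memory does not already win.  */
static struct operand_state
initial_state (bool is_mem, bool satisfies, bool address_reload_fixes)
{
  struct operand_state op = { is_mem, satisfies, false };
  if (is_mem && !satisfies)
    op.offmemok = address_reload_fixes;
  return op;
}

/* Model of the later conflict check: the operand stops winning, but
   offmemok is left as it was.  */
static void
mark_conflict (struct operand_state *op)
{
  op->win = false;
}

/* Model of the final choice of reload.  */
static enum reload_kind
choose_reload (const struct operand_state *op)
{
  if (op->win)
    return NO_RELOAD;
  if (op->is_mem && op->offmemok)
    return RELOAD_ADDRESS;
  return RELOAD_WHOLE_OPERAND;
}
```

In this model, a memory that initially wins and is then marked as conflicting ends up with RELOAD_WHOLE_OPERAND, even if an address reload would have been enough.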
