https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105395
Bug ID: 105395 Summary: Invalid reload of atomic operation Product: gcc Version: 12.0 Status: UNCONFIRMED Keywords: wrong-code Severity: normal Priority: P3 Component: rtl-optimization Assignee: unassigned at gcc dot gnu.org Reporter: rsandifo at gcc dot gnu.org Target Milestone: --- Target: aarch64*-*-* There is an element of “doctor it hurts if I do this” here, but: if the result of an atomic operation becomes a hard register before reload, and if that hard register conflicts with the allocated address of the atomic operation, we can end up reloading the whole atomic memory instead of the address. I saw this “in the wild” with some later changes, because a combine opportunity on the input triggered a three-way combine opportunity with a following hard-register move. But it could happen in other cases too. Brute-force test case that doesn't rely on RA: int __RTL (startwith ("vregs")) f1 (int *ptr, int val) { (function "f1" (param "ptr" (DECL_RTL (reg/v:DI <1> [ ptr ])) (DECL_RTL_INCOMING (reg:DI x0 [ ptr ])) ) (param "val" (DECL_RTL (reg/v:SI <2> [ val ])) (DECL_RTL_INCOMING (reg:SI x1 [ val ])) ) (insn-chain (block 2 (edge-from entry (flags "FALLTHRU")) (cnote 1 [bb 2] NOTE_INSN_BASIC_BLOCK) (cnote 2 NOTE_INSN_FUNCTION_BEG) (cinsn 3 (parallel [(set (reg:SI x0) (mem/v:SI (reg:DI x0) [-1 S4 A32])) (set (mem/v:SI (reg:DI x0) [-1 S4 A32]) (unspec_volatile:SI [(plus:SI (mem/v:SI (reg:DI x0) [-1 S4 A32]) (reg:SI <2>)) (const_int 32773)] UNSPECV_ATOMIC_OP)) (clobber (reg:CC cc)) (clobber (scratch:SI)) (clobber (scratch:SI))] ) ) (cinsn 4 (use (reg:SI x0))) (edge-to exit (flags "FALLTHRU")) ) ) (crtl (return_rtx (reg:SI x0))) ) } which with -O2 gives: f1: sub sp, sp, #16 ldr w0, [x0] add x4, sp, 12 mov w1, 0 str w0, [sp, 12] .L3: ldxr w0, [x4] add w2, w0, w1 stlxr w3, w2, [x4] cbnz w3, .L3 dmb ish ldr w1, [sp, 12] str w1, [x0] add sp, sp, 16 ret I think the problem is that lra-constraints.c only sets offmemok if the memory doesn't “win” (i.e. if the memory doesn't satisfy the constraints in its original form). The memory here is OK in its original form. Later on the function detects the conflict and marks the memory as no longer winning: if (HARD_REGISTER_P (operand_reg[i]) || (first_conflict_j == last_conflict_j && operand_reg[last_conflict_j] != NULL_RTX && !curr_alt_match_win[last_conflict_j] && !HARD_REGISTER_P (operand_reg[last_conflict_j]))) { curr_alt_win[last_conflict_j] = false; But because offmemok is false, this leads to a reload of the full memory rather than its address.