https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116028

Surya Kumari Jangala <jskumari at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|                            |aarch64-*-*

--- Comment #2 from Surya Kumari Jangala <jskumari at gcc dot gnu.org> ---
Testcase pr10474.c:

void f(int *i)
{
        if (!i)
                return;
        else
        {
                __builtin_printf("Hi");
                *i=0;
        }
}

----------
On aarch64:

Assembly w/o patch at r15-1619-g3b9b8d6cfdf593:
        cbz     x0, .L7
        stp     x29, x30, [sp, -32]!
        mov     x29, sp
        str     x19, [sp, 16]
        mov     x19, x0
        adrp    x0, .LC0
        add     x0, x0, :lo12:.LC0
        bl      printf
        str     wzr, [x19]
        ldr     x19, [sp, 16]
        ldp     x29, x30, [sp], 32
        ret
.L7:
        ret

-----------

Assembly w/ patch:
        stp     x29, x30, [sp, -32]!
        mov     x29, sp
        str     x0, [sp, 24]
        cbz     x0, .L1
        adrp    x0, .LC0
        add     x0, x0, :lo12:.LC0
        bl      printf
        ldr     x1, [sp, 24]
        str     wzr, [x1]
.L1:
        ldp     x29, x30, [sp], 32
        ret


As we can see above, w/o patch the test case gets shrink wrapped.

Input RTL to the LRA pass (the RTL is same both w/ and w/o patch):

BB2:
  set r95, x0
  set r92, r95
  if (r92 eq 0) jump BB4
BB3:
  set x0, symbol-ref("Hi")
  x0 = call printf
  set mem(r92), 0
BB4:
  ret


Register assignment by IRA:
w/o patch:
  r92-->x19
  r95-->x0
  r94-->x0

w/ patch:
  r92-->x1
  r95-->x0
  r94-->x0


RTL after LRA:

w/o patch:
BB2:
  set x19, x0
  if (x19 eq 0) jump BB4
BB3:
  set x0, symbol-ref("Hi")
  x0 = call printf
  set mem(x19), 0
BB4:
  ret


w/ patch:
BB2:
  set x1, x0
  set mem(sp+24), x1
  if (x1 eq 0) jump BB4
BB3:
  set x0, symbol-ref("Hi")
  x0 = call printf
  set x1, mem(sp+24)
  set mem(x1), 0
BB4:
  ret


The difference between w/o patch and w/ patch is that w/o patch, a callee-save
register (x19) is chosen to hold the value of x0 (input parameter register),
while w/ patch, a caller-save register (x1) is chosen.

W/o patch, during the shrink wrap pass, first copy propagation is done and
the 'if' insn in BB2 is changed as follows:
  set x19, x0
  if (x19 eq 0) jump BB4

changed to:
  set x19, x0
  if (x0 eq 0) jump BB4   

Next, the insn "set x19, x0" is moved down the cfg to BB3. Since x19 is a
callee-save register, prolog gets generated in BB3 thereby resulting in
successful shrink wrapping.

W/ patch, during the shrink wrap pass, copy propagation changes BB2 as follows:
  set x1, x0
  set mem(sp+24), x1
  if (x1 eq 0) jump BB4

changed to:
  set x1, x0
  set mem(sp+24), x0
  if (x0 eq 0) jump BB4

However, the store insn (set mem(sp+24), x0) cannot be moved down to BB3.
Hence the prolog gets generated in BB2 itself due to the use of 'sp', and
shrink wrapping fails.
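
The effect of that one store on prolog placement can be illustrated with a toy
model in C. This is only a sketch under strong assumptions: the struct and
function names are hypothetical (they are not GCC data structures), the blocks
are treated as a simple list starting at the entry block, and the real shrink
wrap pass does a proper CFG analysis that this ignores.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-block summary; not a GCC data structure. */
struct bb_info {
    int index;         /* basic block number, e.g. 2 for BB2 */
    bool needs_frame;  /* uses sp, a callee-save register, or has a call */
};

/* Return the number of the first block that needs the frame set up,
   or -1 if no block does.  bbs[0] is assumed to be the entry block. */
int first_block_needing_frame(const struct bb_info *bbs, size_t n)
{
    assert(bbs != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (bbs[i].needs_frame)
            return bbs[i].index;
    return -1;
}

/* In this toy model the function counts as shrink wrapped when the prolog
   can be placed in some block other than the entry block. */
bool is_shrink_wrapped(const struct bb_info *bbs, size_t n)
{
    int first = first_block_needing_frame(bbs, n);
    return first != -1 && first != bbs[0].index;
}
```

With the w/o patch layout (only BB3 needs the frame once "set x19, x0" has been
sunk) the model reports the function as shrink wrapped. With the w/ patch
layout, the store to mem(sp+24) makes BB2 need the frame, so it does not.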

The store insn (which basically saves x1 to stack) is generated by the
LRA pass. This insn is needed because x1 is a caller-save register and we
have a call insn that will clobber this register. However, the store insn is
generated in the entry BB (BB2) instead of in BB3 which has the call insn. If
the store is generated in BB3, then the testcase will be shrink wrapped
successfully. In fact, it is more efficient if the store occurs only in the
path containing the printf call instead of occurring in the entry bb.

The reason why LRA generates the store insn in the entry bb is as follows:
LRA emits insns to save caller-save registers in the inheritance/splitting
pass. In this pass, LRA builds EBBs (Extended Basic Blocks) and traverses the
insns in the EBBs in reverse order from the last insn to the first insn. When
LRA sees a write to a pseudo (that has been assigned a caller-save register),
and there is a read following the write, with an intervening call insn between
the write and read, then LRA generates a spill immediately after the write and
a restore immediately before the read. The spill is needed because the call
insn will clobber the caller-save register.
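
The placement rule described above can be sketched in C as a toy model. This is
not LRA's code: the enum, struct and function names are hypothetical, only a
single caller-save register is tracked, and an EBB is reduced to an array of
insn kinds.

```c
#include <assert.h>
#include <stddef.h>

/* Toy insn kinds for one tracked register; hypothetical, not RTL codes. */
enum insn_kind { INSN_WRITE, INSN_READ, INSN_CALL, INSN_OTHER };

struct save_restore {
    int spill_after;     /* index of the write the spill follows, or -1 */
    int restore_before;  /* index of the read the restore precedes, or -1 */
};

/* Walk the EBB from the last insn to the first.  Remember the nearest read
   seen so far; when a call is passed, that read becomes a read that sits
   after a call.  On reaching the write, emit a spill/restore pair if such
   a read exists. */
struct save_restore place_save_restore(const enum insn_kind *ebb, size_t n)
{
    struct save_restore result = { -1, -1 };
    int last_read = -1;         /* nearest read below the current insn */
    int read_across_call = -1;  /* read that has a call above it */

    assert(ebb != NULL || n == 0);

    for (size_t i = n; i-- > 0; ) {
        switch (ebb[i]) {
        case INSN_READ:
            last_read = (int)i;
            break;
        case INSN_CALL:
            if (last_read != -1)
                read_across_call = last_read;
            break;
        case INSN_WRITE:
            if (read_across_call != -1) {
                result.spill_after = (int)i;
                result.restore_before = read_across_call;
                return result;
            }
            last_read = -1;  /* earlier reads see an older value */
            break;
        case INSN_OTHER:
            break;
        }
    }
    return result;
}
```

For an EBB modelled as write, read (the compare), other, call, read, the sketch
puts the spill right after insn 0 and the restore right before insn 4. This
matches the spill in BB2 and the reload in BB3 in the w/ patch RTL.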

In the above testcase, LRA forms two EBBs: the first EBB contains BB2 & BB3
while the second EBB contains BB4.

In BB2, there is a write to x1 in the insn:
  set r92, r95      // r92 is assigned x1 and r95 is assigned x0

In BB3, there is a read of x1 after the call insn:
  set mem(r92), 0   // r92 is assigned x1

So LRA generates a spill in BB2 after the write to x1.

The fix to this issue would involve making changes in LRA to save caller-save
registers before a call instead of after the write to the caller-save register.
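
To make the two placement policies concrete, here is a small C sketch that
reports which basic block the save would land in under each one. It is an
illustration under assumptions, not a proposed patch: all names are
hypothetical, a single register is tracked, and the EBB is scanned forward for
brevity rather than in LRA's reverse order.

```c
#include <assert.h>
#include <stddef.h>

/* Toy insn kinds for one tracked register; hypothetical, not RTL codes. */
enum insn_kind { INSN_WRITE, INSN_READ, INSN_CALL, INSN_OTHER };

/* Where to put the save of the caller-save register. */
enum save_policy { SAVE_AFTER_WRITE, SAVE_BEFORE_CALL };

struct insn {
    enum insn_kind kind;
    int bb;  /* basic block number the insn belongs to */
};

/* Return the basic block in which the save lands, or -1 if the value is
   never read after a call and so needs no save. */
int save_block(const struct insn *ebb, size_t n, enum save_policy policy)
{
    int write_bb = -1;  /* block of the most recent write */
    int call_bb = -1;   /* block of the first call after that write */

    assert(ebb != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        switch (ebb[i].kind) {
        case INSN_WRITE:
            write_bb = ebb[i].bb;
            call_bb = -1;
            break;
        case INSN_CALL:
            if (write_bb != -1 && call_bb == -1)
                call_bb = ebb[i].bb;
            break;
        case INSN_READ:
            if (call_bb != -1)
                return policy == SAVE_AFTER_WRITE ? write_bb : call_bb;
            break;
        case INSN_OTHER:
            break;
        }
    }
    return -1;
}
```

For the testcase's first EBB (write and compare in BB2; call and read in BB3),
SAVE_AFTER_WRITE yields block 2 and SAVE_BEFORE_CALL yields block 3. Under the
second policy the entry block no longer touches the stack.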
