https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108315
Kewen Lin <linkw at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |linkw at gcc dot gnu.org
--- Comment #2 from Kewen Lin <linkw at gcc dot gnu.org> ---
(In reply to Alexander Monakov from comment #0)
> Created attachment 54202 [details]
> testcase
>
> At least the documentation should mention that if intentional.
>
> In the attached example, the function bar is compiled to
>
> bar:
> .localentry bar,1
> mtctr 3
> mr 12,3
> bctr
> .long 0
> .byte 0,0,0,0,0,0,0,0
>
> i.e. it does not preserve r2 (it's compiled with -mcpu=power10). If the
> caller is not compiled with -mcpu=power10, it needs r2 preserved (bar has a
> localentry, so the nop in the caller stays a nop after linking).
My local 64bit-elfv2-abi spec v1.5 has the following description:
3.4.1. Symbol Values
"The values of these three most significant bits of the st_other field have the
following meanings:
...
1 The local and global entry points are the same, and r2 should be treated as
caller-saved for local and global callers. "
...
"The value of st_other is determined from the .localentry directive as follows:
If the .localentry value is 0, the value of st_other is 0. If the .localentry
value is 1, the value of st_other is 1. Otherwise, the value of st_other is the
logarithm (base 2) of the .localentry value."
The function bar has st_other value 1, which means r2 is to be treated as
caller-saved, so bar itself takes no action to preserve r2.
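As a rough illustration of the mapping quoted above, here is a minimal C
sketch. The function names are hypothetical (they are not binutils or GCC
APIs), and rejecting operands other than 0, 1 and the powers of two from 4 to
64 is an assumption of the sketch, not something the quoted text states.

```c
#include <stdint.h>

/* Hypothetical helper, not a binutils API: map a .localentry operand to the
   value stored in the three most significant bits of st_other, following
   the rule quoted from section 3.4.1.  Returns -1 for operands this sketch
   does not handle. */
int localentry_to_sto_field(uint32_t localentry)
{
    /* 0 and 1 map to themselves. */
    if (localentry <= 1)
        return (int)localentry;

    /* Otherwise the field is log2 of the operand, so the operand must be
       a power of two. */
    if ((localentry & (localentry - 1)) != 0)
        return -1;

    int log2 = 0;
    while ((localentry >> log2) != 1)
        log2++;

    /* Assumption of this sketch: only results 2..6 are accepted.  A result
       of 1 would collide with the special value 1, and anything above 6 is
       treated here as out of range. */
    if (log2 < 2 || log2 > 6)
        return -1;
    return log2;
}

/* Per the quoted text, field value 1 means the local and global entry
   points coincide and r2 is caller-saved for local and global callers. */
int r2_is_caller_saved(int sto_field)
{
    return sto_field == 1;
}
```

With `.localentry bar,1` as in the testcase, localentry_to_sto_field(1)
yields 1 and r2_is_caller_saved(1) is true, which matches bar not preserving
r2.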
>
> I verified the testcase misbehaves on Compile Farm's gcc135: as it does not
> use any power10-specific instructions, it's runnable there.
I tried the attachment on a local machine (also ppc64le, P9) and noticed that
the linker had already applied a fix-up using a long_branch.bar stub:
Dump of assembler code for function main:
0x0000000010000540 <+0>: lis r2,4098
0x0000000010000544 <+4>: addi r2,r2,32512
0x0000000010000548 <+8>: mflr r0
0x000000001000054c <+12>: nop
0x0000000010000550 <+16>: ld r3,-32728(r2)
0x0000000010000554 <+20>: std r0,16(r1)
0x0000000010000558 <+24>: stdu r1,-32(r1)
0x000000001000055c <+28>: bl 0x10000510 <00000038.long_branch.bar>
=> 0x0000000010000560 <+32>: ld r2,24(r1)
0x0000000010000564 <+36>: addis r3,r2,-2
0x0000000010000568 <+40>: addi r3,r3,-30328
Dump of assembler code for function 00000038.long_branch.bar:
=> 0x0000000010000510 <+0>: std r2,24(r1)
0x0000000010000514 <+4>: b 0x10000710 <bar>
The stub saves r2 to its stack slot (24(r1)) before branching to bar, and main
reloads it after the call, so the testcase runs as expected here. I am not sure
why it doesn't work on your side; maybe this interoperation requires support
from a newer binutils? Locally I used GNU ld 2.34 for the final link (and
2.35, which has power10 support, for generating bar.o).