Thank you for your answer!

> We generally don't do optimizations like that directly on assembly.

I definitely agree. But this is also a pattern that shows up in generated code.
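To make "a pattern in generated code" concrete, here is a minimal sketch (my own illustration, not taken from the runtime; the function names are made up). Depending on the architecture, block layout, and compiler version, the early return below may be lowered to a branch whose Prog-level target is the RET of the shared epilogue, the same shape as the runtime.strequal listing in the Ps at the end of this message:

```
// jmp_to_ret_example.go — hypothetical example, names are made up.
package main

//go:noinline
func expensive(n int) int {
	// A real call forces maybeCompute to keep a stack frame, so its epilogue
	// on arm64 is more than a bare RET (LDP + ADD + RET).
	s := 0
	for i := 0; i < n; i++ {
		s += i
	}
	return s
}

//go:noinline
func maybeCompute(n int) int {
	if n <= 0 {
		return 0 // this early return may become a jump to the shared epilogue
	}
	return expensive(n)
}

func main() {
	println(maybeCompute(10))
}
```

Whether a JMP actually appears can be checked with `go build -gcflags=-S` or `go tool objdump -s maybeCompute` on the resulting binary; I am not claiming this exact function always reproduces it, it is only a sketch of the kind of source I have in mind.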
> and concerns about debuggability (can you set a breakpoint on each return
> in the source?) also matter

This is an interesting problem that I hadn't thought about, thank you!

> That is a JMP to the LDP instruction, not directly to the RET.

Yes, but in the Prog representation it is a JMP to a RET. I mentioned this
when I pointed out the problem with the increase in code size (a single RET
Prog translates to multiple machine instructions).

> There's some discussion here https://github.com/golang/go/issues/24936

I am grateful for the link to the discussion. In it you mentioned your
abandoned CL
<https://github.com/golang/go/issues/24936#issuecomment-383253003>, which
actually does the opposite of my optimization =).

> It would need benchmarks demonstrating it is worth it

Can you please suggest some benchmarks? I tried bent, but I would like to
test on some other benchmarks as well. Thank you in advance!

On Wednesday 14 August 2024 at 03:59:55 UTC+3 Keith Randall wrote:

> We generally don't do optimizations like that directly on assembly. In
> fact, we used to do some like that but they have been removed.
> We want the generated machine code to faithfully mirror the assembly
> input. People writing assembly have all kinds of reasons for laying out
> instructions in particular ways (better for various caches, etc.) that we
> don't want to disrupt.
>
> If the Go compiler is generating such a pattern, we can optimize that.
> There's some discussion here https://github.com/golang/go/issues/24936
> but nothing substantive came of it. It would need benchmarks demonstrating
> it is worth it, and concerns about debuggability (can you set a breakpoint
> on each return in the source?) also matter.
>
> > Ps: example of JMP to RET from runtime:
>
> That is a JMP to the LDP instruction, not directly to the RET.
>
> On Tuesday, August 13, 2024 at 10:10:58 AM UTC-7 Arseny Samoylov wrote:
>
>> Hello community, recently I found that gc generates a lot of JMP-to-RET
>> instructions and there is no optimization for that. Consider this example:
>>
>> ```
>> // asm_arm64.s
>> #include "textflag.h"
>>
>> TEXT ·jmp_to_ret(SB), NOSPLIT, $0-0
>> 	JMP ret
>> ret:
>> 	RET
>> ```
>>
>> This compiles to:
>>
>> ```
>> TEXT main.jmp_to_ret.abi0(SB) asm_arm64.s
>>   asm_arm64.s:4	0x77530	14000001	JMP 1(PC)
>>   asm_arm64.s:6	0x77534	d65f03c0	RET
>> ```
>>
>> Obviously, it can be optimized to just a RET instruction.
>>
>> So I made a patch that replaces a JMP to RET with a RET instruction (on
>> the Prog representation):
>>
>> ```
>> diff --git a/src/cmd/internal/obj/pass.go b/src/cmd/internal/obj/pass.go
>> index 066b779539..87f1121641 100644
>> --- a/src/cmd/internal/obj/pass.go
>> +++ b/src/cmd/internal/obj/pass.go
>> @@ -174,8 +174,16 @@ func linkpatch(ctxt *Link, sym *LSym, newprog ProgAlloc) {
>>  			continue
>>  		}
>>  		p.To.SetTarget(brloop(p.To.Target()))
>> -		if p.To.Target() != nil && p.To.Type == TYPE_BRANCH {
>> -			p.To.Offset = p.To.Target().Pc
>> +		if p.To.Target() != nil {
>> +			if p.As == AJMP && p.To.Target().As == ARET {
>> +				p.As = ARET
>> +				p.To = p.To.Target().To
>> +				continue
>> +			}
>> +
>> +			if p.To.Type == TYPE_BRANCH {
>> +				p.To.Offset = p.To.Target().Pc
>> +			}
>>  		}
>>  	}
>>  }
>> ```
>>
>> You can find this patch on my GH
>> <https://github.com/ArsenySamoylov/go/tree/obj-linkpatch-jmp-to-ret>.
>>
>> I encountered a few problems:
>>
>> * Increase in code size: a RET Prog can translate into multiple machine
>> instructions (LDP, ADD, and RET on arm64, for example). The .text section
>> of a simple Go program that calls the function above grows by 0x3D0
>> bytes; the .text section of the go binary itself grows by 0x2570 bytes
>> (almost 10KB). (These numbers are for arm64 binaries.)
>>
>> * Optimization on the Prog representation happens too late, so the
>> example above translates to:
>>
>> ```
>> TEXT main.jmp_to_ret.abi0(SB) asm_arm64.s
>>   asm_arm64.s:4	0x77900	d65f03c0	RET
>>   asm_arm64.s:6	0x77904	d65f03c0	RET
>> ```
>>
>> (no dead-code elimination was done =( )
>>
>> So I am looking for ideas. Maybe this optimization should be done on the
>> SSA form, with some heuristics to avoid the increase in code size.
>>
>> I would also like suggestions on where to benchmark my optimization. The
>> bent benchmark suite takes too long =(.
>>
>> Ps: example of a JMP to RET from the runtime:
>>
>> ```
>> TEXT runtime.strequal(SB) a/go/src/runtime/alg.go
>> …
>>   alg.go:378	0x12eac	14000004	JMP 4(PC)	// JMP to RET in Prog
>>   alg.go:378	0x12eb0	f9400000	MOVD (R0), R0
>>   alg.go:378	0x12eb4	f9400021	MOVD (R1), R1
>>   alg.go:378	0x12eb8	97fffc72	CALL runtime.memequal(SB)
>>   alg.go:378	0x12ebc	a97ffbfd	LDP -8(RSP), (R29, R30)
>>   alg.go:378	0x12ec0	9100c3ff	ADD $48, RSP, RSP
>>   alg.go:378	0x12ec4	d65f03c0	RET
>>   ...
>> ```