We generally don't do optimizations like that directly on assembly. In fact, we used to have a few such optimizations, but they have been removed. We want the generated machine code to faithfully mirror the assembly input: people writing assembly have all kinds of reasons for laying out instructions in particular ways (better behavior for various caches, etc.) that we don't want to disrupt.
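For instance (a made-up sketch, not code from the Go tree), hand-written assembly routinely funnels several exits through one shared RET on purpose, and rewriting the JMP into a duplicated RET would change the layout the author chose:

```
#include "textflag.h"

// Hypothetical arm64 routine: both exits deliberately share one RET,
// keeping the body compact and the return in a single predictable spot.
TEXT ·find(SB), NOSPLIT, $0-0
	CBZ	R0, done	// early exit branches to the shared return
	// ... main work ...
	JMP	done		// deliberate JMP to RET; the assembler must not rewrite it
done:
	RET
```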
If the Go compiler is generating such a pattern, we can optimize that. There's some discussion at https://github.com/golang/go/issues/24936, but nothing substantive came of it. It would need benchmarks demonstrating that it is worth it, and concerns about debuggability (can you set a breakpoint on each return in the source?) also matter.

> Ps: example of JMP to RET from the runtime:

That is a JMP to the LDP instruction, not directly to the RET.

On Tuesday, August 13, 2024 at 10:10:58 AM UTC-7 Arseny Samoylov wrote:
> Hello community, I recently found that gc generates a lot of JMP-to-RET
> instructions and there is no optimization for that. Consider this example:
>
> ```
> // asm_arm64.s
> #include "textflag.h"
>
> TEXT ·jmp_to_ret(SB), NOSPLIT, $0-0
> 	JMP ret
> ret:
> 	RET
> ```
>
> This compiles to:
>
> ```
> TEXT main.jmp_to_ret.abi0(SB) asm_arm64.s
>   asm_arm64.s:4	0x77530	14000001	JMP 1(PC)
>   asm_arm64.s:6	0x77534	d65f03c0	RET
> ```
>
> Obviously, it can be optimized to just a RET instruction. So I made a
> patch that replaces a JMP to RET with a RET instruction (on the Prog
> representation):
>
> ```
> diff --git a/src/cmd/internal/obj/pass.go b/src/cmd/internal/obj/pass.go
> index 066b779539..87f1121641 100644
> --- a/src/cmd/internal/obj/pass.go
> +++ b/src/cmd/internal/obj/pass.go
> @@ -174,8 +174,16 @@ func linkpatch(ctxt *Link, sym *LSym, newprog ProgAlloc) {
>  			continue
>  		}
>  		p.To.SetTarget(brloop(p.To.Target()))
> -		if p.To.Target() != nil && p.To.Type == TYPE_BRANCH {
> -			p.To.Offset = p.To.Target().Pc
> +		if p.To.Target() != nil {
> +			if p.As == AJMP && p.To.Target().As == ARET {
> +				p.As = ARET
> +				p.To = p.To.Target().To
> +				continue
> +			}
> +
> +			if p.To.Type == TYPE_BRANCH {
> +				p.To.Offset = p.To.Target().Pc
> +			}
>  		}
>  	}
>  }
> ```
>
> You can find this patch on my GH:
> https://github.com/ArsenySamoylov/go/tree/obj-linkpatch-jmp-to-ret
>
> I encountered a few problems:
>
> * Increase in code size, because a RET can expand into multiple
>   instructions (LDP, ADD, and RET on arm64, for example). The .text
>   section of a simple Go program that calls the function above grows by
>   0x3D0 bytes, and the .text section of the go binary itself grows by
>   0x2570 bytes (almost 10KB). (These numbers are for arm64 binaries.)
>
> * Optimizing on the Prog representation is too late, and the example
>   above translates to:
>
> ```
> TEXT main.jmp_to_ret.abi0(SB) asm_arm64.s
>   asm_arm64.s:4	0x77900	d65f03c0	RET
>   asm_arm64.s:6	0x77904	d65f03c0	RET
> ```
>
> (no dead-code elimination was done =( )
>
> So I am looking for ideas. Maybe this optimization should be done on the
> SSA form, with some heuristics to avoid the increase in code size.
>
> I would also like suggestions on where to benchmark my optimization. The
> bent benchmark suite is too long =(.
>
> Ps: example of JMP to RET from the runtime:
>
> ```
> TEXT runtime.strequal(SB) a/go/src/runtime/alg.go
> …
> alg.go:378	0x12eac	14000004	JMP 4(PC)	// JMP to RET in Prog
> alg.go:378	0x12eb0	f9400000	MOVD (R0), R0
> alg.go:378	0x12eb4	f9400021	MOVD (R1), R1
> alg.go:378	0x12eb8	97fffc72	CALL runtime.memequal(SB)
> alg.go:378	0x12ebc	a97ffbfd	LDP -8(RSP), (R29, R30)
> alg.go:378	0x12ec0	9100c3ff	ADD $48, RSP, RSP
> alg.go:378	0x12ec4	d65f03c0	RET
> ...
> ```
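As for where to benchmark: before a full bent run, a plain `testing` micro-benchmark against an affected function gives a quick signal. Here is a minimal sketch; the Go-side name `jmpToRet` is hypothetical and assumes a matching `TEXT ·jmpToRet(SB)` in the package's asm_arm64.s:

```
// jmp_test.go
package main

import "testing"

// jmpToRet is implemented in asm_arm64.s; the body there ends in a
// JMP to a RET, so the benchmark exercises the pattern in question.
func jmpToRet()

func BenchmarkJmpToRet(b *testing.B) {
	for i := 0; i < b.N; i++ {
		jmpToRet() // measures call + branch-to-epilogue + return
	}
}
```

Running `go test -bench=JmpToRet -count=10` before and after the patch and comparing with benchstat shows whether the call path gets measurably faster; the code-size cost still has to be checked separately on whole binaries.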