Re: [go-nuts] gc: optimize JMP to RET instructions

robert engels Wed, 14 Aug 2024 09:39:50 -0700

Won’t the speculative/parallel execution by most processors make the JMP 
essentially a no-op?


See 
https://stackoverflow.com/questions/5127833/meaningful-cost-of-the-jump-instruction

> On Aug 14, 2024, at 11:31 AM, Arseny Samoylov <samoylov.ars...@gmail.com> 
> wrote:
> 
> Thank you for your answer!
> 
> > We generally don't do optimizations like that directly on assembly.
> I definitely agree. But this is also a pattern for generated code.
> 
> > and concerns about debuggability (can you set a breakpoint on each return 
> > in the source?) also matter
> This is an interesting problem that I haven't thought about, thank you!
> 
> > That is a JMP to the LDP instruction, not directly to the RET.
> Yes, but in the Prog representation it is. I mentioned this when I pointed out 
> the problem with increased code size (RET translates to multiple instructions).
> 
> > There's some discussion here https://github.com/golang/go/issues/24936
> I am grateful for the link to the discussion. In it, you mentioned your 
> abandoned CL 
> <https://github.com/golang/go/issues/24936#issuecomment-383253003>, which 
> actually does the opposite of my optimization =).
> 
> >  It would need benchmarks demonstrating it is worth it
> Can you please provide some suggestions for benchmarks? I tried bent, but I 
> would like to test on some other benchmarks. 
> 
> Thank you in advance!
> On Wednesday 14 August 2024 at 03:59:55 UTC+3 Keith Randall wrote:
> We generally don't do optimizations like that directly on assembly. In fact, 
> we used to do some like that but they have been removed.
> We want the generated machine code to faithfully mirror the assembly input. 
> People writing assembly have all kinds of reasons for laying out instructions 
> in particular ways (better for various caches, etc) that we don't want to 
> disrupt.
> 
> If the Go compiler is generating such a pattern, we can optimize that. 
> There's some discussion here https://github.com/golang/go/issues/24936 
> but nothing substantive came of 
> it. It would need benchmarks demonstrating it is worth it, and concerns about 
> debuggability (can you set a breakpoint on each return in the source?) also 
> matter.
> 
> > Ps: example of JMP to RET from runtime:
> 
> That is a JMP to the LDP instruction, not directly to the RET.
> On Tuesday, August 13, 2024 at 10:10:58 AM UTC-7 Arseny Samoylov wrote:
> Hello community, recently I found that gc generates a lot of JMP to RET 
> instructions and there is no optimization for that. Consider this example:
> 
> ```
> // asm_arm64.s
> #include "textflag.h"
>  
> TEXT ·jmp_to_ret(SB), NOSPLIT, $0-0
>     JMP ret
> ret:
>     RET
> ```
> This compiles to:
> ```
> TEXT main.jmp_to_ret.abi0(SB) asm_arm64.s
>   asm_arm64.s:4         0x77530                 14000001                JMP 1(PC)
>   asm_arm64.s:6         0x77534                 d65f03c0                RET
> ```
> 
> Obviously, this can be optimized to a single RET instruction.
> So I made a patch that replaces a JMP to RET with a RET instruction (on the Prog 
> representation):
> ```
> diff --git a/src/cmd/internal/obj/pass.go b/src/cmd/internal/obj/pass.go
> index 066b779539..87f1121641 100644
> --- a/src/cmd/internal/obj/pass.go
> +++ b/src/cmd/internal/obj/pass.go
> @@ -174,8 +174,16 @@ func linkpatch(ctxt *Link, sym *LSym, newprog ProgAlloc) {
>                         continue
>                 }
>                 p.To.SetTarget(brloop(p.To.Target()))
> -               if p.To.Target() != nil && p.To.Type == TYPE_BRANCH {
> -                       p.To.Offset = p.To.Target().Pc
> +               if p.To.Target() != nil {
> +                       if p.As == AJMP && p.To.Target().As == ARET {
> +                               p.As = ARET
> +                               p.To = p.To.Target().To
> +                               continue
> +                       }
> +
> +                       if p.To.Type == TYPE_BRANCH {
> +                               p.To.Offset = p.To.Target().Pc
> +                       }
>                 }
>         }
>  }
> ```
> You can find this patch on my GH 
> <https://github.com/ArsenySamoylov/go/tree/obj-linkpatch-jmp-to-ret>.
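
For readers who do not have the cmd/internal/obj sources at hand, the rewrite in the patch above can be sketched on a toy instruction list. This is only an illustration under stated assumptions: `Op`, `Prog`, and `foldJmpToRet` are hypothetical stand-ins, not the real `obj.As`, `obj.Prog`, or `linkpatch`, and the sketch ignores the operand copying (`p.To = p.To.Target().To`) that the real patch performs.

```go
package main

import "fmt"

// Op is a toy opcode. It stands in for obj.As and is not the real type.
type Op int

const (
	JMP Op = iota // unconditional branch
	RET           // return
	MOV           // any ordinary instruction
)

// Prog is a simplified, hypothetical stand-in for obj.Prog.
type Prog struct {
	As     Op
	Target *Prog // branch target; nil for non-branches
	Link   *Prog // next instruction in program order
}

// foldJmpToRet rewrites every unconditional JMP whose target is a RET
// into a RET of its own and reports how many instructions it rewrote.
func foldJmpToRet(first *Prog) int {
	n := 0
	for p := first; p != nil; p = p.Link {
		if p.As == JMP && p.Target != nil && p.Target.As == RET {
			p.As = RET
			p.Target = nil
			n++
		}
	}
	return n
}

func main() {
	// Mirrors the assembly example: JMP ret; ret: RET.
	ret := &Prog{As: RET}
	jmp := &Prog{As: JMP, Target: ret, Link: ret}
	fmt.Println(foldJmpToRet(jmp), jmp.As == RET)
}
```

Note that the original RET stays in the list after the rewrite, which is the same effect as the doubled RET reported later in this message.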
> 
> I encountered a few problems:
> * Increase in code size, because a RET instruction can translate into multiple 
> instructions (ldp, add, and ret on arm64, for example):
> the .text section of a simple Go program that calls the function above grows 
> by 0x3D0 bytes; the go binary itself grows by 0x2570 bytes (almost 10KB) in .text 
> section size 
> (this is for arm64 binaries)
> * Optimization on the Prog representation is too late, and the example above 
> translates to:
> ```
> TEXT main.jmp_to_ret.abi0(SB) asm_arm64.s
>   asm_arm64.s:4         0x77900                 d65f03c0                RET
>   asm_arm64.s:6         0x77904                 d65f03c0                RET
> ```
> (no dead code elimination was done =( )
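
The leftover second RET is unreachable: nothing branches to it and the instruction before it does not fall through. A minimal sketch of the kind of cleanup that would remove it, again on a toy list, is below. `Op`, `Prog`, `removeUnreachable`, and `count` are hypothetical names for illustration; the real toolchain would also have to account for labels, PC-relative data, and other references to instructions, which this sketch does not model.

```go
package main

import "fmt"

// Op is a toy opcode, not the real obj.As.
type Op int

const (
	JMP Op = iota // unconditional branch, never falls through
	RET           // return, never falls through
	MOV           // any instruction that falls through to the next one
)

// Prog is a simplified, hypothetical stand-in for obj.Prog.
type Prog struct {
	As     Op
	Target *Prog // branch target; nil for non-branches
	Link   *Prog // next instruction in program order
}

// removeUnreachable unlinks every instruction that cannot be reached
// from first by following fall-through edges and branch targets.
func removeUnreachable(first *Prog) {
	reachable := map[*Prog]bool{}
	work := []*Prog{first}
	for len(work) > 0 {
		p := work[len(work)-1]
		work = work[:len(work)-1]
		if p == nil || reachable[p] {
			continue
		}
		reachable[p] = true
		if p.Target != nil {
			work = append(work, p.Target)
		}
		if p.As != JMP && p.As != RET {
			work = append(work, p.Link) // fall through
		}
	}
	for p := first; p != nil; p = p.Link {
		for p.Link != nil && !reachable[p.Link] {
			p.Link = p.Link.Link
		}
	}
}

// count returns the number of instructions in the list.
func count(first *Prog) int {
	n := 0
	for p := first; p != nil; p = p.Link {
		n++
	}
	return n
}

func main() {
	// The state after the rewrite: RET followed by the original RET.
	second := &Prog{As: RET}
	first := &Prog{As: RET, Link: second}
	removeUnreachable(first)
	fmt.Println(count(first))
}
```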
> 
> So I am looking for some ideas. Maybe this optimization should be done on the SSA 
> form and needs some heuristics (to avoid the increase in code size).
> I would also like suggestions on where to benchmark my optimization. 
> The bent benchmark is tooooo long =(.
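
As a quicker first check than a full bent run, a tiny microbenchmark can be built with both toolchains and compared. The sketch below is an assumption-laden starting point, not a recommendation from the thread: `classify` is a hypothetical function with several return statements, and whether gc actually emits a JMP to a shared RET for it depends on the compiler version and architecture, so the output of `go tool objdump` should be checked first.

```go
package main

import (
	"fmt"
	"testing"
)

// sink keeps the compiler from discarding the benchmarked calls.
var sink int

// classify is a hypothetical function with several return statements.
// noinline keeps it as a real call so its return paths are exercised.
//
//go:noinline
func classify(x int) int {
	if x < 0 {
		return -1
	}
	if x == 0 {
		return 0
	}
	return 1
}

func main() {
	// testing.Benchmark runs the function outside of "go test"
	// and chooses b.N automatically.
	res := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			sink += classify(i%3 - 1) // cycles through -1, 0, 1
		}
	})
	fmt.Println(res)
}
```

A microbenchmark like this can only show whether the extra JMP is measurable at all in a tight loop; it says nothing about the code-size effect on larger programs.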
> 
> Ps: example of JMP to RET from runtime:
> ```
> TEXT runtime.strequal(SB) a/go/src/runtime/alg.go
> …
>   alg.go:378            0x12eac                 14000004                JMP 4(PC) // JMP to RET in Prog
>   alg.go:378            0x12eb0                 f9400000                MOVD (R0), R0
>   alg.go:378            0x12eb4                 f9400021                MOVD (R1), R1
>   alg.go:378            0x12eb8                 97fffc72                CALL runtime.memequal(SB)
>   alg.go:378            0x12ebc                 a97ffbfd                LDP -8(RSP), (R29, R30)
>   alg.go:378            0x12ec0                 9100c3ff                ADD $48, RSP, RSP
>   alg.go:378            0x12ec4                 d65f03c0                RET
> ...
> ```
> 
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "golang-nuts" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to golang-nuts+unsubscr...@googlegroups.com 
> <mailto:golang-nuts+unsubscr...@googlegroups.com>.
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/golang-nuts/00b5127d-0027-4db0-93db-11f7fe21fb4an%40googlegroups.com
>  
> <https://groups.google.com/d/msgid/golang-nuts/00b5127d-0027-4db0-93db-11f7fe21fb4an%40googlegroups.com?utm_medium=email&utm_source=footer>.

