Won’t the speculative/parallel execution by most processors make the JMP essentially a no-op?
See https://stackoverflow.com/questions/5127833/meaningful-cost-of-the-jump-instruction

> On Aug 14, 2024, at 11:31 AM, Arseny Samoylov <samoylov.ars...@gmail.com> wrote:
>
> Thank you for your answer!
>
> > We generally don't do optimizations like that directly on assembly.
> I definitely agree. But this is also a pattern for generated code.
>
> > and concerns about debuggability (can you set a breakpoint on each return in the source?) also matter
> This is an interesting problem that I haven't thought about, thank you!
>
> > That is a JMP to the LDP instruction, not directly to the RET.
> Yes, but on Prog representation it is. I mentioned it when pointed out problem with increasing code size (RET translates to multiple instructions).
>
> > There's some discussion here https://github.com/golang/go/issues/24936
> I am grateful for the link to the discussion. In this discussion, you mentioned yours abandoned CL <https://github.com/golang/go/issues/24936#issuecomment-383253003> that actually does the contrary of my optimization =).
>
> > It would need benchmarks demonstrating it is worth it
> Can you please provide some suggestions for benchmarks? I tried bent, but I would like to test on some other benchmarks.
>
> Thank you in advance!
> On Wednesday 14 August 2024 at 03:59:55 UTC+3 Keith Randall wrote:
> We generally don't do optimizations like that directly on assembly. In fact, we used to do some like that but they have been removed.
> We want the generated machine code to faithfully mirror the assembly input. People writing assembly have all kind of reasons for laying out instructions in particular ways (better for various caches, etc) that we don't want to disrupt.
>
> If the Go compiler is generating such a pattern, we can optimize that.
> There's some discussion here https://github.com/golang/go/issues/24936 but nothing substantive came of it. It would need benchmarks demonstrating it is worth it, and concerns about debuggability (can you set a breakpoint on each return in the source?) also matter.
>
> > Ps: example of JMP to RET from runtime:
>
> That is a JMP to the LDP instruction, not directly to the RET.
> On Tuesday, August 13, 2024 at 10:10:58 AM UTC-7 Arseny Samoylov wrote:
> Hello community, recently I found that gc generates a lot of JMP to RET instructions and there is no optimization for that. Consider this example:
>
> ```
> // asm_arm64.s
> #include "textflag.h"
>
> TEXT ·jmp_to_ret(SB), NOSPLIT, $0-0
>     JMP ret
> ret:
>     RET
> ```
> This compiles to:
> ```
> TEXT main.jmp_to_ret.abi0(SB) asm_arm64.s
>   asm_arm64.s:4    0x77530    14000001    JMP 1(PC)
>   asm_arm64.s:6    0x77534    d65f03c0    RET
> ```
>
> Obviously, it can be optimized just to RET instruction.
> So I made a patch that replaces JMP to RET with RET instruction (on Prog representation):
> ```
> diff --git a/src/cmd/internal/obj/pass.go b/src/cmd/internal/obj/pass.go
> index 066b779539..87f1121641 100644
> --- a/src/cmd/internal/obj/pass.go
> +++ b/src/cmd/internal/obj/pass.go
> @@ -174,8 +174,16 @@ func linkpatch(ctxt *Link, sym *LSym, newprog ProgAlloc) {
>                  continue
>              }
>              p.To.SetTarget(brloop(p.To.Target()))
> -            if p.To.Target() != nil && p.To.Type == TYPE_BRANCH {
> -                p.To.Offset = p.To.Target().Pc
> +            if p.To.Target() != nil {
> +                if p.As == AJMP && p.To.Target().As == ARET {
> +                    p.As = ARET
> +                    p.To = p.To.Target().To
> +                    continue
> +                }
> +
> +                if p.To.Type == TYPE_BRANCH {
> +                    p.To.Offset = p.To.Target().Pc
> +                }
>              }
>          }
>      }
> ```
> You can find this patch on my GH <https://github.com/ArsenySamoylov/go/tree/obj-linkpatch-jmp-to-ret>.
>
> I encountered few problems:
> * Increase in code size - because RET instruction can translate in multiple instructions (ldp, add, and ret - on arm64 for example):
>   .text section of simple go program that calls function from above increases in 0x3D0 bytes; go binary itself increases in 0x2570 (almost 10KB) in .text section size
>   (this is for arm64 binaries)
> * Optimization on Prog representation is too late, and example above translates to:
> ```
> TEXT main.jmp_to_ret.abi0(SB) asm_arm64.s
>   asm_arm64.s:4    0x77900    d65f03c0    RET
>   asm_arm64.s:6    0x77904    d65f03c0    RET
> ```
> (no dead code elimination was done =( )
>
> So I am looking for some ideas. Maybe this optimization should be done on SSA form and needs some heuristics (to avoid increase in code size).
> And also I would like to have suggestion where to benchmark my optimization. Bent benchmark is tooooo long =(.
>
> Ps: example of JMP to RET from runtime:
> ```
> TEXT runtime.strequal(SB) a/go/src/runtime/alg.go
>   …
>   alg.go:378    0x12eac    14000004    JMP 4(PC) // JMP to RET in Prog
>   alg.go:378    0x12eb0    f9400000    MOVD (R0), R0
>   alg.go:378    0x12eb4    f9400021    MOVD (R1), R1
>   alg.go:378    0x12eb8    97fffc72    CALL runtime.memequal(SB)
>   alg.go:378    0x12ebc    a97ffbfd    LDP -8(RSP), (R29, R30)
>   alg.go:378    0x12ec0    9100c3ff    ADD $48, RSP, RSP
>   alg.go:378    0x12ec4    d65f03c0    RET
>   ...
> ```
>
> --
> You received this message because you are subscribed to the Google Groups "golang-nuts" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscr...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/golang-nuts/00b5127d-0027-4db0-93db-11f7fe21fb4an%40googlegroups.com.

--
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/golang-nuts/723C04E0-F81C-4258-B497-91C8BBE406DD%40ix.netcom.com.
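
For readers following the thread, the code-size concern from the original post can be illustrated with a toy model. This is only a sketch under stated assumptions: `Prog`, `Op`, `foldJmpToRet`, and `retWords` are hypothetical names invented here and are not the types in `cmd/internal/obj`. It assumes each RET expands to three machine words (LDP, ADD, RET, as in the arm64 `runtime.strequal` listing quoted above) and every other instruction to one word. It shows why rewriting a JMP-to-RET into a RET can grow the text when the original RET is not removed.

```go
package main

import "fmt"

// Op is a toy opcode; it only loosely stands in for the assembler's opcode type.
type Op int

const (
	OpOther Op = iota
	OpJMP
	OpRET
)

// Prog is a hypothetical, simplified instruction node (not cmd/internal/obj.Prog).
type Prog struct {
	As     Op
	Target *Prog // branch target; nil for non-branches
	Link   *Prog // next instruction in program order
}

// retWords is an assumption: in a function with a frame, RET expands to
// LDP + ADD + RET on arm64, i.e. three 4-byte words.
const retWords = 3

// words returns the assumed number of machine words p expands to.
func words(p *Prog) int {
	if p.As == OpRET {
		return retWords
	}
	return 1
}

// size sums the assumed words over the list starting at head.
func size(head *Prog) int {
	n := 0
	for p := head; p != nil; p = p.Link {
		n += words(p)
	}
	return n
}

// foldJmpToRet rewrites every JMP whose target is a RET into a RET.
// It does no dead-code elimination, so the original RET stays in the list.
func foldJmpToRet(head *Prog) int {
	n := 0
	for p := head; p != nil; p = p.Link {
		if p.As == OpJMP && p.Target != nil && p.Target.As == OpRET {
			p.As = OpRET
			p.Target = nil
			n++
		}
	}
	return n
}

// build constructs the list: JMP ret; OTHER; OTHER; ret: RET
func build() *Prog {
	ret := &Prog{As: OpRET}
	b := &Prog{As: OpOther, Link: ret}
	a := &Prog{As: OpOther, Link: b}
	return &Prog{As: OpJMP, Target: ret, Link: a}
}

func main() {
	head := build()
	before := size(head)
	n := foldJmpToRet(head)
	after := size(head)
	fmt.Printf("rewrites=%d words before=%d after=%d\n", n, before, after)
}
```

Under these assumptions the rewritten list is two words longer per folded JMP, which matches the direction of the .text growth reported in the post. Whether the saved branch is worth that growth is the question the requested benchmarks would have to answer.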