We generally don't do optimizations like that directly on assembly. In fact, we used to have a few such optimizations, but they have been removed. We want the generated machine code to faithfully mirror the assembly input: people writing assembly have all kinds of reasons for laying out instructions in particular ways (better behavior for various caches, etc.) that we don't want to disrupt.
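For instance (a made-up sketch, not code from the Go tree), hand-written assembly routinely funnels several exits through one shared RET on purpose, and rewriting the JMP into a duplicated RET would change the layout the author chose:

```
#include "textflag.h"

// Hypothetical arm64 routine: both exits deliberately share one RET,
// keeping the body compact and the return in a single predictable spot.
TEXT ·find(SB), NOSPLIT, $0-0
	CBZ	R0, done	// early exit branches to the shared return
	// ... main work ...
	JMP	done		// deliberate JMP to RET; the assembler must not rewrite it
done:
	RET
```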
If the Go compiler is generating such a pattern, we can optimize that. There's some discussion at https://github.com/golang/go/issues/24936, but nothing substantive came of it. It would need benchmarks demonstrating that it is worth it, and concerns about debuggability (can you set a breakpoint on each return in the source?) also matter.

> Ps: example of JMP to RET from the runtime:

That is a JMP to the LDP instruction, not directly to the RET.

On Tuesday, August 13, 2024 at 10:10:58 AM UTC-7 Arseny Samoylov wrote:
> Hello community, I recently found that gc generates a lot of JMP-to-RET
> instructions and there is no optimization for that. Consider this example:
>
> ```
> // asm_arm64.s
> #include "textflag.h"
>
> TEXT ·jmp_to_ret(SB), NOSPLIT, $0-0
> 	JMP ret
> ret:
> 	RET
> ```
>
> This compiles to:
>
> ```
> TEXT main.jmp_to_ret.abi0(SB) asm_arm64.s
>   asm_arm64.s:4	0x77530	14000001	JMP 1(PC)
>   asm_arm64.s:6	0x77534	d65f03c0	RET
> ```
>
> Obviously, it can be optimized to just a RET instruction. So I made a
> patch that replaces a JMP to RET with a RET instruction (on the Prog
> representation):
>
> ```
> diff --git a/src/cmd/internal/obj/pass.go b/src/cmd/internal/obj/pass.go
> index 066b779539..87f1121641 100644
> --- a/src/cmd/internal/obj/pass.go
> +++ b/src/cmd/internal/obj/pass.go
> @@ -174,8 +174,16 @@ func linkpatch(ctxt *Link, sym *LSym, newprog ProgAlloc) {
>  			continue
>  		}
>  		p.To.SetTarget(brloop(p.To.Target()))
> -		if p.To.Target() != nil && p.To.Type == TYPE_BRANCH {
> -			p.To.Offset = p.To.Target().Pc
> +		if p.To.Target() != nil {
> +			if p.As == AJMP && p.To.Target().As == ARET {
> +				p.As = ARET
> +				p.To = p.To.Target().To
> +				continue
> +			}
> +
> +			if p.To.Type == TYPE_BRANCH {
> +				p.To.Offset = p.To.Target().Pc
> +			}
>  		}
>  	}
>  }
> ```
>
> You can find this patch on my GH:
> https://github.com/ArsenySamoylov/go/tree/obj-linkpatch-jmp-to-ret
>
> I encountered a few problems:
>
> * Increase in code size, because a RET can expand into multiple
>   instructions (LDP, ADD, and RET on arm64, for example). The .text
>   section of a simple Go program that calls the function above grows by
>   0x3D0 bytes, and the .text section of the go binary itself grows by
>   0x2570 bytes (almost 10KB). (These numbers are for arm64 binaries.)
>
> * Optimizing on the Prog representation is too late, and the example
>   above translates to:
>
> ```
> TEXT main.jmp_to_ret.abi0(SB) asm_arm64.s
>   asm_arm64.s:4	0x77900	d65f03c0	RET
>   asm_arm64.s:6	0x77904	d65f03c0	RET
> ```
>
> (no dead-code elimination was done =( )
>
> So I am looking for ideas. Maybe this optimization should be done on the
> SSA form, with some heuristics to avoid the increase in code size.
>
> I would also like suggestions on where to benchmark my optimization. The
> bent benchmark suite is too long =(.
>
> Ps: example of JMP to RET from the runtime:
>
> ```
> TEXT runtime.strequal(SB) a/go/src/runtime/alg.go
> …
> alg.go:378	0x12eac	14000004	JMP 4(PC)	// JMP to RET in Prog
> alg.go:378	0x12eb0	f9400000	MOVD (R0), R0
> alg.go:378	0x12eb4	f9400021	MOVD (R1), R1
> alg.go:378	0x12eb8	97fffc72	CALL runtime.memequal(SB)
> alg.go:378	0x12ebc	a97ffbfd	LDP -8(RSP), (R29, R30)
> alg.go:378	0x12ec0	9100c3ff	ADD $48, RSP, RSP
> alg.go:378	0x12ec4	d65f03c0	RET
> ...
> ```
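As for where to benchmark: before a full bent run, a plain `testing` micro-benchmark against an affected function gives a quick signal. Here is a minimal sketch; the Go-side name `jmpToRet` is hypothetical and assumes a matching `TEXT ·jmpToRet(SB)` in the package's asm_arm64.s:

```
// jmp_test.go
package main

import "testing"

// jmpToRet is implemented in asm_arm64.s; the body there ends in a
// JMP to a RET, so the benchmark exercises the pattern in question.
func jmpToRet()

func BenchmarkJmpToRet(b *testing.B) {
	for i := 0; i < b.N; i++ {
		jmpToRet() // measures call + branch-to-epilogue + return
	}
}
```

Running `go test -bench=JmpToRet -count=10` before and after the patch and comparing with benchstat shows whether the call path gets measurably faster; the code-size cost still has to be checked separately on whole binaries.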