Hello community, recently I found that gc generates a lot of JMP instructions whose target is a RET, and there is no optimization for that. Consider this example:
```
// asm_arm64.s
#include "textflag.h"

TEXT ·jmp_to_ret(SB), NOSPLIT, $0-0
	JMP ret
ret:
	RET
```

This compiles to:

```
TEXT main.jmp_to_ret.abi0(SB) asm_arm64.s
  asm_arm64.s:4		0x77530		14000001		JMP 1(PC)
  asm_arm64.s:6		0x77534		d65f03c0		RET
```

Obviously, this can be reduced to a single RET instruction. So I made a patch that replaces a JMP to a RET with a RET instruction (on the Prog representation):

```
diff --git a/src/cmd/internal/obj/pass.go b/src/cmd/internal/obj/pass.go
index 066b779539..87f1121641 100644
--- a/src/cmd/internal/obj/pass.go
+++ b/src/cmd/internal/obj/pass.go
@@ -174,8 +174,16 @@ func linkpatch(ctxt *Link, sym *LSym, newprog ProgAlloc) {
 			continue
 		}
 		p.To.SetTarget(brloop(p.To.Target()))
-		if p.To.Target() != nil && p.To.Type == TYPE_BRANCH {
-			p.To.Offset = p.To.Target().Pc
+		if p.To.Target() != nil {
+			if p.As == AJMP && p.To.Target().As == ARET {
+				p.As = ARET
+				p.To = p.To.Target().To
+				continue
+			}
+
+			if p.To.Type == TYPE_BRANCH {
+				p.To.Offset = p.To.Target().Pc
+			}
 		}
 	}
 }
```

You can find this patch on my GH <https://github.com/ArsenySamoylov/go/tree/obj-linkpatch-jmp-to-ret>.

I encountered a few problems:

* Increase in code size, because a RET instruction can expand into multiple machine instructions (ldp, add, and ret on arm64, for example). The .text section of a simple Go program that calls the function above grows by 0x3D0 bytes, and the .text section of the go binary itself grows by 0x2570 bytes (almost 10KB). Both numbers are for arm64 binaries.
* Optimizing on the Prog representation is too late. The example above now translates to:

```
TEXT main.jmp_to_ret.abi0(SB) asm_arm64.s
  asm_arm64.s:4		0x77900		d65f03c0		RET
  asm_arm64.s:6		0x77904		d65f03c0		RET
```

(no dead code elimination was done =( )

So I am looking for some ideas. Maybe this optimization should be done on the SSA form, and it probably needs some heuristics to avoid the increase in code size. I would also like suggestions on where to benchmark my optimization. The Bent benchmark takes too long to run =(.
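To make the heuristic idea a bit more concrete, here is a toy sketch in Go. It is not the real `cmd/internal/obj` code: `Op`, `Prog`, `retSize`, `foldJmpToRet`, `dropDeadAfterRet` and `count` are all hypothetical names invented for illustration, and the cost model (RET is one instruction only when the frame size is 0, otherwise three) is an assumption based on the two arm64 listings in this post. It folds a JMP to a RET only when that cannot grow the code, then unlinks instructions that directly follow a RET and are not branch targets.

```go
package main

import "fmt"

// Op is a toy opcode. It is NOT cmd/internal/obj.As.
type Op int

const (
	JMP Op = iota
	RET
	MOV
)

// Prog is a hypothetical, simplified stand-in for obj.Prog.
type Prog struct {
	As     Op
	Target *Prog // branch target, only meaningful for JMP
	Link   *Prog // next instruction in program order
}

// retSize is an assumed cost model: how many machine instructions
// a RET expands to for a given frame size.
func retSize(frameSize int) int {
	if frameSize == 0 {
		return 1 // bare ret
	}
	return 3 // assumed epilogue: restore FP/LR, adjust SP, ret
}

// foldJmpToRet rewrites "JMP to a RET" into RET, but only when RET
// is a single instruction, so the rewrite cannot grow the code.
// It returns the number of rewritten instructions.
func foldJmpToRet(first *Prog, frameSize int) int {
	if retSize(frameSize) > 1 {
		return 0
	}
	n := 0
	for p := first; p != nil; p = p.Link {
		if p.As == JMP && p.Target != nil && p.Target.As == RET {
			p.As = RET
			p.Target = nil
			n++
		}
	}
	return n
}

// dropDeadAfterRet unlinks instructions that directly follow a RET
// and are not the target of any JMP. It is conservative: targets of
// JMPs that are themselves dead are still kept.
func dropDeadAfterRet(first *Prog) {
	targeted := map[*Prog]bool{}
	for p := first; p != nil; p = p.Link {
		if p.As == JMP && p.Target != nil {
			targeted[p.Target] = true
		}
	}
	for p := first; p != nil; p = p.Link {
		if p.As != RET {
			continue
		}
		for p.Link != nil && !targeted[p.Link] {
			p.Link = p.Link.Link
		}
	}
}

// count returns the number of instructions in the list.
func count(first *Prog) int {
	n := 0
	for p := first; p != nil; p = p.Link {
		n++
	}
	return n
}

func main() {
	// Models the jmp_to_ret example: JMP ret; ret: RET.
	ret := &Prog{As: RET}
	jmp := &Prog{As: JMP, Target: ret, Link: ret}

	fmt.Println("rewritten:", foldJmpToRet(jmp, 0))
	dropDeadAfterRet(jmp)
	fmt.Println("instructions left:", count(jmp))
}
```

With a non-zero frame size (as in `runtime.strequal`, where the epilogue is LDP, ADD, RET) this sketch leaves the JMP alone, which is the case where my patch made the binaries bigger.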
P.S. An example of a JMP to RET from the runtime:

```
TEXT runtime.strequal(SB) a/go/src/runtime/alg.go
…
  alg.go:378		0x12eac		14000004		JMP 4(PC) // JMP to RET in Prog
  alg.go:378		0x12eb0		f9400000		MOVD (R0), R0
  alg.go:378		0x12eb4		f9400021		MOVD (R1), R1
  alg.go:378		0x12eb8		97fffc72		CALL runtime.memequal(SB)
  alg.go:378		0x12ebc		a97ffbfd		LDP -8(RSP), (R29, R30)
  alg.go:378		0x12ec0		9100c3ff		ADD $48, RSP, RSP
  alg.go:378		0x12ec4		d65f03c0		RET
...
```