Hello community, recently I found that gc generates a lot of JMP to RET 
instructions and there is no optimization for that. Consider this example:

```

// asm_arm64.s

#include "textflag.h"

TEXT ·jmp_to_ret(SB), NOSPLIT, $0-0
    JMP ret

ret:
    RET

```

This compiles to:

```

TEXT main.jmp_to_ret.abi0(SB) asm_arm64.s
  asm_arm64.s:4         0x77530                 14000001                JMP 1(PC)
  asm_arm64.s:6         0x77534                 d65f03c0                RET

```


Obviously, this can be optimized to just a RET instruction.

So I made a patch that replaces a JMP to a RET with a RET instruction (at the Prog 
representation level):

```
diff --git a/src/cmd/internal/obj/pass.go b/src/cmd/internal/obj/pass.go
index 066b779539..87f1121641 100644
--- a/src/cmd/internal/obj/pass.go
+++ b/src/cmd/internal/obj/pass.go
@@ -174,8 +174,16 @@ func linkpatch(ctxt *Link, sym *LSym, newprog ProgAlloc) {
                        continue
                }
                p.To.SetTarget(brloop(p.To.Target()))
-               if p.To.Target() != nil && p.To.Type == TYPE_BRANCH {
-                       p.To.Offset = p.To.Target().Pc
+               if p.To.Target() != nil {
+                       if p.As == AJMP && p.To.Target().As == ARET {
+                               p.As = ARET
+                               p.To = p.To.Target().To
+                               continue
+                       }
+
+                       if p.To.Type == TYPE_BRANCH {
+                               p.To.Offset = p.To.Target().Pc
+                       }
                }
        }
 }

```

You can find this patch on my GitHub: 
<https://github.com/ArsenySamoylov/go/tree/obj-linkpatch-jmp-to-ret>.


I encountered a few problems:

* Increase in code size - the Prog-level RET can expand into multiple machine 
instructions (LDP, ADD, and RET on arm64, for example):

The .text section of a simple Go program that calls the function above grows 
by 0x3D0 bytes; the .text section of the go binary itself grows by 0x2570 
bytes (almost 10KB)

(this is for arm64 binaries)

* The optimization runs too late at the Prog representation, so the example 
above translates to:

```

TEXT main.jmp_to_ret.abi0(SB) asm_arm64.s
  asm_arm64.s:4         0x77900                 d65f03c0                RET
  asm_arm64.s:6         0x77904                 d65f03c0                RET

```

(no dead-code elimination is performed afterwards =( )
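
The size increase (the first problem above) comes from RET being a pseudo-instruction at the Prog level: for a function with a stack frame, the assembler expands it into the full epilogue, so every duplicated RET duplicates that whole sequence. A rough sketch of the expansion on arm64, for a hypothetical function with a 48-byte frame (compare the runtime.strequal listing in the PS):

```

// A single Prog-level RET in a function with a 48-byte frame
// expands to the full epilogue on arm64 (frame size hypothetical):
LDP -8(RSP), (R29, R30)  // restore frame pointer and link register
ADD $48, RSP, RSP        // pop the stack frame
RET                      // the actual return instruction

```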


So I am looking for ideas. Maybe this optimization should be done on the SSA 
form instead, with some heuristics to avoid the increase in code size.

I would also like suggestions on where to benchmark this 
optimization. The bent benchmark suite is tooooo long =(.


PS: an example of a JMP to RET from the runtime:

```

TEXT runtime.strequal(SB) a/go/src/runtime/alg.go

…

  alg.go:378            0x12eac                 14000004                JMP 4(PC) // JMP to RET in Prog
  alg.go:378            0x12eb0                 f9400000                MOVD (R0), R0
  alg.go:378            0x12eb4                 f9400021                MOVD (R1), R1
  alg.go:378            0x12eb8                 97fffc72                CALL runtime.memequal(SB)
  alg.go:378            0x12ebc                 a97ffbfd                LDP -8(RSP), (R29, R30)
  alg.go:378            0x12ec0                 9100c3ff                ADD $48, RSP, RSP
  alg.go:378            0x12ec4                 d65f03c0                RET

...

```

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.