I guess you are right. Thank you very much for the discussion!

On Wednesday 14 August 2024 at 20:21:01 UTC+3 robert engels wrote:
> My understanding is that optimizations like this are almost never worth it
> on modern processors: the increased code size works against the modern
> branch predictor and speculative execution, whereas with a single shared
> piece of code there are fewer possibilities and thus fewer instructions to
> preload.
>
> On Aug 14, 2024, at 11:46 AM, Arseny Samoylov <samoylo...@gmail.com> wrote:
>
> > Won’t the speculative/parallel execution by most processors make the JMP
> > essentially a no-op?
> I guess you are right, but this only holds when the JMP destination is
> already in the instruction buffer. I suspect that in most of these cases
> the JMP leads to a RET inside the same function, so the optimization will
> have almost zero effect there. But if the RET instruction turns out to be
> far enough away, the optimization could still be meaningful.
>
> On Wednesday 14 August 2024 at 19:40:22 UTC+3 robert engels wrote:
>
>> Won’t the speculative/parallel execution by most processors make the JMP
>> essentially a no-op?
>>
>> See
>> https://stackoverflow.com/questions/5127833/meaningful-cost-of-the-jump-instruction
>>
>> On Aug 14, 2024, at 11:31 AM, Arseny Samoylov <samoylo...@gmail.com> wrote:
>>
>> Thank you for your answer!
>>
>> > We generally don't do optimizations like that directly on assembly.
>> I definitely agree. But this is also a pattern in generated code.
>>
>> > and concerns about debuggability (can you set a breakpoint on each
>> > return in the source?) also matter
>> This is an interesting problem that I hadn't thought about, thank you!
>>
>> > That is a JMP to the LDP instruction, not directly to the RET.
>> Yes, but in the Prog representation it is a JMP to RET. I mentioned this
>> when I pointed out the problem of increasing code size (RET translates to
>> multiple instructions).
>>
>> > There's some discussion here https://github.com/golang/go/issues/24936
>> I am grateful for the link to the discussion.
>> In this discussion, you mentioned your abandoned CL
>> <https://github.com/golang/go/issues/24936#issuecomment-383253003> that
>> actually does the opposite of my optimization =).
>>
>> > It would need benchmarks demonstrating it is worth it
>> Can you please suggest some benchmarks? I tried bent, but I would like to
>> test on some other benchmarks as well.
>>
>> Thank you in advance!
>>
>> On Wednesday 14 August 2024 at 03:59:55 UTC+3 Keith Randall wrote:
>>
>>> We generally don't do optimizations like that directly on assembly. In
>>> fact, we used to do some like that, but they have been removed.
>>> We want the generated machine code to faithfully mirror the assembly
>>> input. People writing assembly have all kinds of reasons for laying out
>>> instructions in particular ways (better for various caches, etc.) that we
>>> don't want to disrupt.
>>>
>>> If the Go compiler is generating such a pattern, we can optimize that.
>>> There's some discussion here https://github.com/golang/go/issues/24936,
>>> but nothing substantive came of it. It would need benchmarks demonstrating
>>> it is worth it, and concerns about debuggability (can you set a breakpoint
>>> on each return in the source?) also matter.
>>>
>>> > Ps: example of JMP to RET from runtime:
>>>
>>> That is a JMP to the LDP instruction, not directly to the RET.
>>>
>>> On Tuesday, August 13, 2024 at 10:10:58 AM UTC-7 Arseny Samoylov wrote:
>>>
>>>> Hello community, I recently found that gc generates a lot of
>>>> JMP-to-RET sequences, and there is no optimization for that. Consider
>>>> this example:
>>>>
>>>> ```
>>>> // asm_arm64.s
>>>> #include "textflag.h"
>>>>
>>>> TEXT ·jmp_to_ret(SB), NOSPLIT, $0-0
>>>>     JMP ret
>>>> ret:
>>>>     RET
>>>> ```
>>>>
>>>> This compiles to:
>>>>
>>>> ```
>>>> TEXT main.jmp_to_ret.abi0(SB) asm_arm64.s
>>>>   asm_arm64.s:4  0x77530  14000001  JMP 1(PC)
>>>>   asm_arm64.s:6  0x77534  d65f03c0  RET
>>>> ```
>>>>
>>>> Obviously, it can be optimized to just a RET instruction.
>>>> So I made a patch that replaces a JMP to RET with a RET instruction
>>>> (in the Prog representation):
>>>>
>>>> ```
>>>> diff --git a/src/cmd/internal/obj/pass.go b/src/cmd/internal/obj/pass.go
>>>> index 066b779539..87f1121641 100644
>>>> --- a/src/cmd/internal/obj/pass.go
>>>> +++ b/src/cmd/internal/obj/pass.go
>>>> @@ -174,8 +174,16 @@ func linkpatch(ctxt *Link, sym *LSym, newprog ProgAlloc) {
>>>>  			continue
>>>>  		}
>>>>  		p.To.SetTarget(brloop(p.To.Target()))
>>>> -		if p.To.Target() != nil && p.To.Type == TYPE_BRANCH {
>>>> -			p.To.Offset = p.To.Target().Pc
>>>> +		if p.To.Target() != nil {
>>>> +			if p.As == AJMP && p.To.Target().As == ARET {
>>>> +				p.As = ARET
>>>> +				p.To = p.To.Target().To
>>>> +				continue
>>>> +			}
>>>> +
>>>> +			if p.To.Type == TYPE_BRANCH {
>>>> +				p.To.Offset = p.To.Target().Pc
>>>> +			}
>>>>  		}
>>>>  	}
>>>>  }
>>>> ```
>>>>
>>>> You can find this patch on my GH
>>>> <https://github.com/ArsenySamoylov/go/tree/obj-linkpatch-jmp-to-ret>.
>>>>
>>>> I encountered a few problems:
>>>> * Increase in code size, because a RET instruction can translate to
>>>>   multiple instructions (LDP, ADD, and RET on arm64, for example): the
>>>>   .text section of a simple Go program that calls the function above
>>>>   grows by 0x3D0 bytes; the .text section of the go binary itself grows
>>>>   by 0x2570 bytes (almost 10KB). (This is for arm64 binaries.)
>>>> * Optimization on the Prog representation happens too late, so the
>>>>   example above translates to:
>>>>
>>>> ```
>>>> TEXT main.jmp_to_ret.abi0(SB) asm_arm64.s
>>>>   asm_arm64.s:4  0x77900  d65f03c0  RET
>>>>   asm_arm64.s:6  0x77904  d65f03c0  RET
>>>> ```
>>>>
>>>> (no dead code elimination was done =( )
>>>>
>>>> So I am looking for ideas. Maybe this optimization should be done on
>>>> SSA form and needs some heuristics (to avoid the increase in code
>>>> size). I would also like suggestions on where to benchmark my
>>>> optimization. The bent benchmark suite takes tooooo long =(.
>>>> Ps: example of JMP to RET from the runtime:
>>>>
>>>> ```
>>>> TEXT runtime.strequal(SB) a/go/src/runtime/alg.go
>>>> …
>>>>   alg.go:378  0x12eac  14000004  JMP 4(PC)  // JMP to RET in Prog
>>>>   alg.go:378  0x12eb0  f9400000  MOVD (R0), R0
>>>>   alg.go:378  0x12eb4  f9400021  MOVD (R1), R1
>>>>   alg.go:378  0x12eb8  97fffc72  CALL runtime.memequal(SB)
>>>>   alg.go:378  0x12ebc  a97ffbfd  LDP -8(RSP), (R29, R30)
>>>>   alg.go:378  0x12ec0  9100c3ff  ADD $48, RSP, RSP
>>>>   alg.go:378  0x12ec4  d65f03c0  RET
>>>> ...
>>>> ```

-- 
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/golang-nuts/feee53b5-dc11-4ba4-b0b5-b2f9a30a0a8bn%40googlegroups.com.
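For readers who want to see how the JMP-to-RET shape can arise from ordinary Go source rather than hand-written assembly, here is a minimal sketch. The function below mirrors the shape of the `runtime.strequal` listing quoted in the thread: a fast-path length check whose false arm branches to the common function exit. The name `strEqualish` is made up for illustration, and whether the compiler actually emits a JMP to the RET sequence for it on a given GOARCH has to be verified with `go tool objdump` or `go build -gcflags=-S`; this is a sketch, not a guaranteed reproduction.

```go
package main

import "fmt"

// strEqualish mirrors the shape of runtime.strequal from the thread: a
// fast-path length check whose false arm jumps to the common function exit.
// With an open-coded epilogue (LDP/ADD/RET on arm64), that exit branch is
// the kind of place where a JMP whose ultimate target is RET can appear.
func strEqualish(a, b string) bool {
	if len(a) != len(b) {
		return false // early exit: a candidate branch toward the shared epilogue
	}
	for i := 0; i < len(a); i++ {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(strEqualish("go", "go")) // true
	fmt.Println(strEqualish("go", "gc")) // false
}
```

To inspect the generated code for a build of this program, something like `go tool objdump -s strEqualish ./binary` (the `-s` flag filters symbols by regexp) shows whether the early return compiles to a branch to the epilogue or to a duplicated return sequence.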