I guess you are right. Thank you very much for the discussion!

On Wednesday 14 August 2024 at 20:21:01 UTC+3 robert engels wrote:
> My understanding is that optimizations like this are almost never worth it
> on modern processors: the increased code size works against the modern
> branch predictor and speculative execution, whereas with a single shared
> piece of code there are fewer possibilities and thus fewer instructions to
> preload.
>
> On Aug 14, 2024, at 11:46 AM, Arseny Samoylov <samoylo...@gmail.com> wrote:
>
> > Won’t the speculative/parallel execution by most processors make the JMP
> > essentially a no-op?
> I guess you are right, but this only holds when the JMP destination is
> already in the instruction buffer. I suspect that in most of these cases
> the JMP leads to a RET inside the same function, so the optimization will
> have almost zero effect there. But if the RET instruction turns out to be
> far enough away, the optimization could still be meaningful.
>
> On Wednesday 14 August 2024 at 19:40:22 UTC+3 robert engels wrote:
>
>> Won’t the speculative/parallel execution by most processors make the JMP
>> essentially a no-op?
>>
>> See
>> https://stackoverflow.com/questions/5127833/meaningful-cost-of-the-jump-instruction
>>
>> On Aug 14, 2024, at 11:31 AM, Arseny Samoylov <samoylo...@gmail.com> wrote:
>>
>> Thank you for your answer!
>>
>> > We generally don't do optimizations like that directly on assembly.
>> I definitely agree. But this is also a pattern in generated code.
>>
>> > and concerns about debuggability (can you set a breakpoint on each
>> > return in the source?) also matter
>> This is an interesting problem that I hadn't thought about, thank you!
>>
>> > That is a JMP to the LDP instruction, not directly to the RET.
>> Yes, but in the Prog representation it is a JMP to RET. I mentioned this
>> when I pointed out the problem of increasing code size (RET translates to
>> multiple instructions).
>>
>> > There's some discussion here https://github.com/golang/go/issues/24936
>> I am grateful for the link to the discussion.
>> In this discussion, you mentioned your abandoned CL
>> <https://github.com/golang/go/issues/24936#issuecomment-383253003> that
>> actually does the opposite of my optimization =).
>>
>> > It would need benchmarks demonstrating it is worth it
>> Can you please suggest some benchmarks? I tried bent, but I would like to
>> test on some other benchmarks as well.
>>
>> Thank you in advance!
>>
>> On Wednesday 14 August 2024 at 03:59:55 UTC+3 Keith Randall wrote:
>>
>>> We generally don't do optimizations like that directly on assembly. In
>>> fact, we used to do some like that, but they have been removed.
>>> We want the generated machine code to faithfully mirror the assembly
>>> input. People writing assembly have all kinds of reasons for laying out
>>> instructions in particular ways (better for various caches, etc.) that we
>>> don't want to disrupt.
>>>
>>> If the Go compiler is generating such a pattern, we can optimize that.
>>> There's some discussion here https://github.com/golang/go/issues/24936,
>>> but nothing substantive came of it. It would need benchmarks demonstrating
>>> it is worth it, and concerns about debuggability (can you set a breakpoint
>>> on each return in the source?) also matter.
>>>
>>> > Ps: example of JMP to RET from runtime:
>>>
>>> That is a JMP to the LDP instruction, not directly to the RET.
>>>
>>> On Tuesday, August 13, 2024 at 10:10:58 AM UTC-7 Arseny Samoylov wrote:
>>>
>>>> Hello community, I recently found that gc generates a lot of
>>>> JMP-to-RET sequences, and there is no optimization for that. Consider
>>>> this example:
>>>>
>>>> ```
>>>> // asm_arm64.s
>>>> #include "textflag.h"
>>>>
>>>> TEXT ·jmp_to_ret(SB), NOSPLIT, $0-0
>>>>     JMP ret
>>>> ret:
>>>>     RET
>>>> ```
>>>>
>>>> This compiles to:
>>>>
>>>> ```
>>>> TEXT main.jmp_to_ret.abi0(SB) asm_arm64.s
>>>>   asm_arm64.s:4  0x77530  14000001  JMP 1(PC)
>>>>   asm_arm64.s:6  0x77534  d65f03c0  RET
>>>> ```
>>>>
>>>> Obviously, it can be optimized to just a RET instruction.
>>>> So I made a patch that replaces a JMP to RET with a RET instruction
>>>> (in the Prog representation):
>>>>
>>>> ```
>>>> diff --git a/src/cmd/internal/obj/pass.go b/src/cmd/internal/obj/pass.go
>>>> index 066b779539..87f1121641 100644
>>>> --- a/src/cmd/internal/obj/pass.go
>>>> +++ b/src/cmd/internal/obj/pass.go
>>>> @@ -174,8 +174,16 @@ func linkpatch(ctxt *Link, sym *LSym, newprog ProgAlloc) {
>>>>  			continue
>>>>  		}
>>>>  		p.To.SetTarget(brloop(p.To.Target()))
>>>> -		if p.To.Target() != nil && p.To.Type == TYPE_BRANCH {
>>>> -			p.To.Offset = p.To.Target().Pc
>>>> +		if p.To.Target() != nil {
>>>> +			if p.As == AJMP && p.To.Target().As == ARET {
>>>> +				p.As = ARET
>>>> +				p.To = p.To.Target().To
>>>> +				continue
>>>> +			}
>>>> +
>>>> +			if p.To.Type == TYPE_BRANCH {
>>>> +				p.To.Offset = p.To.Target().Pc
>>>> +			}
>>>>  		}
>>>>  	}
>>>>  }
>>>> ```
>>>>
>>>> You can find this patch on my GH
>>>> <https://github.com/ArsenySamoylov/go/tree/obj-linkpatch-jmp-to-ret>.
>>>>
>>>> I encountered a few problems:
>>>> * Increase in code size, because a RET instruction can translate to
>>>>   multiple instructions (LDP, ADD, and RET on arm64, for example): the
>>>>   .text section of a simple Go program that calls the function above
>>>>   grows by 0x3D0 bytes; the .text section of the go binary itself grows
>>>>   by 0x2570 bytes (almost 10KB). (This is for arm64 binaries.)
>>>> * Optimization on the Prog representation happens too late, so the
>>>>   example above translates to:
>>>>
>>>> ```
>>>> TEXT main.jmp_to_ret.abi0(SB) asm_arm64.s
>>>>   asm_arm64.s:4  0x77900  d65f03c0  RET
>>>>   asm_arm64.s:6  0x77904  d65f03c0  RET
>>>> ```
>>>>
>>>> (no dead code elimination was done =( )
>>>>
>>>> So I am looking for ideas. Maybe this optimization should be done on
>>>> SSA form and needs some heuristics (to avoid the increase in code
>>>> size). I would also like suggestions on where to benchmark my
>>>> optimization. The bent benchmark suite takes tooooo long =(.
>>>> Ps: example of JMP to RET from the runtime:
>>>>
>>>> ```
>>>> TEXT runtime.strequal(SB) a/go/src/runtime/alg.go
>>>> …
>>>>   alg.go:378  0x12eac  14000004  JMP 4(PC)  // JMP to RET in Prog
>>>>   alg.go:378  0x12eb0  f9400000  MOVD (R0), R0
>>>>   alg.go:378  0x12eb4  f9400021  MOVD (R1), R1
>>>>   alg.go:378  0x12eb8  97fffc72  CALL runtime.memequal(SB)
>>>>   alg.go:378  0x12ebc  a97ffbfd  LDP -8(RSP), (R29, R30)
>>>>   alg.go:378  0x12ec0  9100c3ff  ADD $48, RSP, RSP
>>>>   alg.go:378  0x12ec4  d65f03c0  RET
>>>> ...
>>>> ```

-- 
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/golang-nuts/feee53b5-dc11-4ba4-b0b5-b2f9a30a0a8bn%40googlegroups.com.
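For readers who want to see how the JMP-to-RET shape can arise from ordinary Go source rather than hand-written assembly, here is a minimal sketch. The function below mirrors the shape of the `runtime.strequal` listing quoted in the thread: a fast-path length check whose false arm branches to the common function exit. The name `strEqualish` is made up for illustration, and whether the compiler actually emits a JMP to the RET sequence for it on a given GOARCH has to be verified with `go tool objdump` or `go build -gcflags=-S`; this is a sketch, not a guaranteed reproduction.

```go
package main

import "fmt"

// strEqualish mirrors the shape of runtime.strequal from the thread: a
// fast-path length check whose false arm jumps to the common function exit.
// With an open-coded epilogue (LDP/ADD/RET on arm64), that exit branch is
// the kind of place where a JMP whose ultimate target is RET can appear.
func strEqualish(a, b string) bool {
	if len(a) != len(b) {
		return false // early exit: a candidate branch toward the shared epilogue
	}
	for i := 0; i < len(a); i++ {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(strEqualish("go", "go")) // true
	fmt.Println(strEqualish("go", "gc")) // false
}
```

To inspect the generated code for a build of this program, something like `go tool objdump -s strEqualish ./binary` (the `-s` flag filters symbols by regexp) shows whether the early return compiles to a branch to the epilogue or to a duplicated return sequence.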