[go-nuts] gc: optimize JMP to RET instructions

2024-08-13 Thread Arseny Samoylov
Hello community, recently I found that gc generates a lot of JMP to RET instructions and there is no optimization for that. Consider this example: ``` // asm_arm64.s #include "textflag.h" TEXT ·jmp_to_ret(SB), NOSPLIT, $0-0 JMP ret ret: RET ``` This compiles to: ``` TE

[go-nuts] Re: gc: optimize JMP to RET instructions

2024-08-14 Thread Arseny Samoylov
benchmarks demonstrating > it is worth it, and concerns about debuggability (can you set a breakpoint > on each return in the source?) also matter. > > > Ps: example of JMP to RET from runtime: > > That is a JMP to the LDP instruction, not directly to the RET. > On Tuesday

Re: [go-nuts] gc: optimize JMP to RET instructions

2024-08-14 Thread Arseny Samoylov
? > > See > https://stackoverflow.com/questions/5127833/meaningful-cost-of-the-jump-instruction > > On Aug 14, 2024, at 11:31 AM, Arseny Samoylov > wrote: > > Thank you for your answer! > > > We generally don't do optimizations like that directly on assembly.

Re: [go-nuts] gc: optimize JMP to RET instructions

2024-08-15 Thread Arseny Samoylov
> branch predictor and speculative executions - vs the single shared piece of > code - there is less possibilities and thus instructions to preload. > > On Aug 14, 2024, at 11:46 AM, Arseny Samoylov > wrote: > > > Won’t the speculative/parallel execution by most processors make

[go-nuts] Performance: Restrictions on arguments in registers in SSA implementation

2024-12-12 Thread Arseny Samoylov
Hi everybody! Recently, I noticed that there are some restrictions on the arguments passed to functions in registers. For example, if `a` is a struct, it must have fewer than 5 fields, and its size must be less than `5 * ptrsz`. You can find these restrictions in `cmd/compile/internal/ssa/valu
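A minimal sketch of the kind of comparison the post describes, assuming a 64-bit target where `ptrsz` is 8 bytes. The type and function names (`Four`, `Five`, `SumFour`, `SumFive`) are hypothetical and do not come from the original message. `Four` stays within the stated limits (fewer than 5 fields, size under `5 * ptrsz`), while `Five` exceeds both.

```go
package main

import (
	"fmt"
	"unsafe"
)

// Four has four word-sized fields, which is within the limits quoted above.
type Four struct{ A, B, C, D int64 }

// Five has five word-sized fields, which exceeds the limits quoted above.
type Five struct{ A, B, C, D, E int64 }

// SumFour takes the struct by value; noinline keeps the call visible
// in the generated assembly.
//
//go:noinline
func SumFour(s Four) int64 { return s.A + s.B + s.C + s.D }

// SumFive is the same computation over the larger struct.
//
//go:noinline
func SumFive(s Five) int64 { return s.A + s.B + s.C + s.D + s.E }

func main() {
	// Print each struct's size in bytes next to the computed sum.
	fmt.Println(unsafe.Sizeof(Four{}), SumFour(Four{1, 2, 3, 4}))
	fmt.Println(unsafe.Sizeof(Five{}), SumFive(Five{1, 2, 3, 4, 5}))
}
```

Building with `go build -gcflags=-S` prints the generated assembly, which is one way to compare how each function receives its argument.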

[go-nuts] Re: Performance: Restrictions on arguments in registers in SSA implementation

2024-12-12 Thread Arseny Samoylov
0(RSP) f94007e5 MOVD 8(RSP), R5 8b0100a1 ADD R1, R5, R1 8b010041 ADD R1, R2, R1 8b010061 ADD R1, R3, R1 8b010080 ADD R1, R4, R0 d65f03c0 RET ``` On Thursday, 12 December 2024 at 12:53:50 UTC+3 Arseny Sam

[go-nuts] Re: Performance: Restrictions on arguments in registers in SSA implementation

2024-12-25 Thread Arseny Samoylov
to these questions, we want to be > significantly more conservative in how many registers we let a single > variable consume. > > All that said, I'm sure there are cases where we could do better. In your > example, those spills are either dead or kind of silly. > On