After converting arm64 to the Generic Entry framework, the compiler no
longer inlines el0_svc_common() into its caller do_el0_svc(). This
introduces a small but measurable overhead in the critical system call
path.

Manually forcing el0_svc_common() to be inlined restores the performance.
Benchmarking with perf bench syscall basic on a Kunpeng 920 platform
(based on v6.19-rc1) shows a ~1% performance uplift.

Inlining this function reduces function prologue/epilogue overhead and
allows for better compiler optimization in the hot system call dispatch
path.

| Metric     | W/O this patch | With this patch | Change    |
| ---------- | -------------- | --------------- | --------- |
| Total time | 2.195 [sec]    | 2.171 [sec]     | ↓1.1%     |
| usecs/op   | 0.219575       | 0.217192        | ↓1.1%     |
| ops/sec    | 4,554,260      | 4,604,225       | ↑1.1%     |

Cc: Mark Rutland <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Catalin Marinas <[email protected]>
Reviewed-by: Ada Couprie Diaz <[email protected]>
Reviewed-by: Linus Walleij <[email protected]>
Reviewed-by: Yeoreum Yun <[email protected]>
Reviewed-by: Kevin Brodsky <[email protected]>
Signed-off-by: Jinjie Ruan <[email protected]>
---
 arch/arm64/kernel/syscall.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kernel/syscall.c b/arch/arm64/kernel/syscall.c
index 41a3b70a9374..e0a98fac3b85 100644
--- a/arch/arm64/kernel/syscall.c
+++ b/arch/arm64/kernel/syscall.c
@@ -55,8 +55,8 @@ static void invoke_syscall(struct pt_regs *regs, unsigned int scno,
 	syscall_set_return_value(current, regs, 0, ret);
 }
 
-static void el0_svc_common(struct pt_regs *regs, int scno, int sc_nr,
-			   const syscall_fn_t syscall_table[])
+static __always_inline void el0_svc_common(struct pt_regs *regs, int scno, int sc_nr,
+					   const syscall_fn_t syscall_table[])
 {
 	unsigned long flags = read_thread_flags();
 	unsigned long work;
-- 
2.34.1
