Christophe Leroy's on March 19, 2020 7:18 pm: > > > Le 25/02/2020 à 18:35, Nicholas Piggin a écrit : >> System call entry and particularly exit code is beyond the limit of what >> is reasonable to implement in asm. >> >> This conversion moves all conditional branches out of the asm code, >> except for the case that all GPRs should be restored at exit. >> >> Null syscall test is about 5% faster after this patch, because the exit >> work is handled under local_irq_disable, and the hard mask and pending >> interrupt replay is handled after that, which avoids games with MSR. >> >> Signed-off-by: Nicholas Piggin <npig...@gmail.com> >> Signed-off-by: Michal Suchanek <msucha...@suse.de> >> --- >> >> v2,rebase (from Michal): >> - Add endian conversion for dtl_idx (ms) >> - Fix sparse warning about missing declaration (ms) >> - Add unistd.h to fix some defconfigs, add SPDX, minor formatting (mpe) >> >> v3: Fixes thanks to reports from mpe and selftests errors: >> - Several soft-mask debug and unsafe smp_processor_id() warnings due to >> tracing and other false positives due to checks in "unreconciled" code. >> - Fix a bug with syscall tracing functions that set registers (e.g., >> PTRACE_SETREG) not setting GPRs properly. >> - Fix silly tabort_syscall bug that causes kernel crashes when making system >> calls in transactional state. >> >> arch/powerpc/include/asm/asm-prototypes.h | 17 +- >> .../powerpc/include/asm/book3s/64/kup-radix.h | 14 +- >> arch/powerpc/include/asm/cputime.h | 29 ++ >> arch/powerpc/include/asm/hw_irq.h | 4 + >> arch/powerpc/include/asm/ptrace.h | 3 + >> arch/powerpc/include/asm/signal.h | 3 + >> arch/powerpc/include/asm/switch_to.h | 5 + >> arch/powerpc/include/asm/time.h | 3 + >> arch/powerpc/kernel/Makefile | 3 +- >> arch/powerpc/kernel/entry_64.S | 338 +++--------------- >> arch/powerpc/kernel/signal.h | 2 - >> arch/powerpc/kernel/syscall_64.c | 213 +++++++++++ >> arch/powerpc/kernel/systbl.S | 9 +- >> 13 files changed, 328 insertions(+), 315 deletions(-) >> create mode 100644 arch/powerpc/kernel/syscall_64.c >> >> diff --git a/arch/powerpc/include/asm/asm-prototypes.h >> b/arch/powerpc/include/asm/asm-prototypes.h >> index 983c0084fb3f..4b3609554e76 100644 >> --- a/arch/powerpc/include/asm/asm-prototypes.h >> +++ b/arch/powerpc/include/asm/asm-prototypes.h >> @@ -97,6 +97,12 @@ ppc_select(int n, fd_set __user *inp, fd_set __user >> *outp, fd_set __user *exp, >> unsigned long __init early_init(unsigned long dt_ptr); >> void __init machine_init(u64 dt_ptr); >> #endif >> +#ifdef CONFIG_PPC64 > > This ifdef is not necessary as it has no pending #else. > Having function declaration without definition is not an issue. > Keeping in mind that we are aiming at generalising this to PPC32.
Well there's other unnecessary ifdefs in there too I think. But sure. This patch also got the interrupt_exit_ prototypes leaked in from the later patch so I could fix those. >> diff --git a/arch/powerpc/include/asm/cputime.h >> b/arch/powerpc/include/asm/cputime.h >> index 2431b4ada2fa..6639a6847cc0 100644 >> --- a/arch/powerpc/include/asm/cputime.h >> +++ b/arch/powerpc/include/asm/cputime.h >> @@ -44,6 +44,28 @@ static inline unsigned long cputime_to_usecs(const >> cputime_t ct) >> #ifdef CONFIG_PPC64 >> #define get_accounting(tsk) (&get_paca()->accounting) >> static inline void arch_vtime_task_switch(struct task_struct *tsk) { } > > Could we have the below additions sit outside of this PPC64 ifdef, to be > reused on PPC32 ? Okay. >> + >> +/* >> + * account_cpu_user_entry/exit runs "unreconciled", so can't trace, >> + * can't use use get_paca() >> + */ >> +static notrace inline void account_cpu_user_entry(void) >> +{ >> + unsigned long tb = mftb(); >> + struct cpu_accounting_data *acct = &local_paca->accounting; > > In the spirit of reusing that code on PPC32, can we use get_accounting() > ? Or an alternate version of get_accounting(), eg > get_accounting_notrace() to be defined ? Okay. >> diff --git a/arch/powerpc/kernel/syscall_64.c >> b/arch/powerpc/kernel/syscall_64.c > > Could some part of it go in a syscall.c to be reused on PPC32 ? I could put it all in syscall.c and then we can adjust with some ifdefs or helpers. I don't think there is enough to be worth syscall.c, syscall_32.c, and syscall_64.c. I wonder about the interrupt returns as well, that doesn't really make sense in a file called syscall.c, but the code is very similar to system call exit. Should we just call it interrupts.c? >> + /* >> + * This is not required for the syscall exit path, but makes the >> + * stack frame look nicer. If this was initialised in the first stack >> + * frame, or if the unwinder was taught the first stack frame always >> + * returns to user with IRQS_ENABLED, this store could be avoided! >> + */ >> + regs->softe = IRQS_ENABLED; > > softe doesn't exist on PPC32. Can we do that through a helper ? I guess, we can have regs_set_irq_state(regs, IRQS_ENABLED); or something like that. We make that helper and a _get_ counterpart in a later patch which covers other cases in the tree as well. >> + >> + __hard_irq_enable(); > > This doesn't exist on PPC32. Should we define __hard_irq_enable() as > arch_local_irq_enable() on PPC32 ? This goes away with patch 29. Better not to have this ugly thing spill into ppc32 code at all if we can avoid it :) > >> + >> + ti_flags = current_thread_info()->flags; >> + if (unlikely(ti_flags & _TIF_SYSCALL_DOTRACE)) { >> + /* >> + * We use the return value of do_syscall_trace_enter() as the >> + * syscall number. If the syscall was rejected for any reason >> + * do_syscall_trace_enter() returns an invalid syscall number >> + * and the test against NR_syscalls will fail and the return >> + * value to be used is in regs->gpr[3]. >> + */ >> + r0 = do_syscall_trace_enter(regs); >> + if (unlikely(r0 >= NR_syscalls)) >> + return regs->gpr[3]; >> + r3 = regs->gpr[3]; >> + r4 = regs->gpr[4]; >> + r5 = regs->gpr[5]; >> + r6 = regs->gpr[6]; >> + r7 = regs->gpr[7]; >> + r8 = regs->gpr[8]; >> + >> + } else if (unlikely(r0 >= NR_syscalls)) { >> + return -ENOSYS; >> + } >> + >> + /* May be faster to do array_index_nospec? */ >> + barrier_nospec(); >> + >> + if (unlikely(ti_flags & _TIF_32BIT)) { > > Use is_compat_task() instead ? Michal pointed this out, he's got patches that do this on top of this series. Incremental diff for your suggestions below. Now there is likely we're going to have a few ifdefs, particularly in the exit paths where we have complexity handling irq soft masked state where helpers dont make much sense. I don't think that will be such a bad thing, but we can come to it as we go. Thanks, Nick --- arch/powerpc/include/asm/asm-prototypes.h | 4 --- arch/powerpc/include/asm/cputime.h | 38 +++++++++++++---------- 2 files changed, 21 insertions(+), 21 deletions(-) diff --git a/arch/powerpc/include/asm/asm-prototypes.h b/arch/powerpc/include/asm/asm-prototypes.h index 4b3609554e76..ab59a4904254 100644 --- a/arch/powerpc/include/asm/asm-prototypes.h +++ b/arch/powerpc/include/asm/asm-prototypes.h @@ -97,12 +97,8 @@ ppc_select(int n, fd_set __user *inp, fd_set __user *outp, fd_set __user *exp, unsigned long __init early_init(unsigned long dt_ptr); void __init machine_init(u64 dt_ptr); #endif -#ifdef CONFIG_PPC64 long system_call_exception(long r3, long r4, long r5, long r6, long r7, long r8, unsigned long r0, struct pt_regs *regs); notrace unsigned long syscall_exit_prepare(unsigned long r3, struct pt_regs *regs); -notrace unsigned long interrupt_exit_user_prepare(struct pt_regs *regs, unsigned long msr); -notrace unsigned long interrupt_exit_kernel_prepare(struct pt_regs *regs, unsigned long msr); -#endif long ppc_fadvise64_64(int fd, int advice, u32 offset_high, u32 offset_low, u32 len_high, u32 len_low); diff --git a/arch/powerpc/include/asm/cputime.h b/arch/powerpc/include/asm/cputime.h index 6639a6847cc0..0fccd5ea1e9a 100644 --- a/arch/powerpc/include/asm/cputime.h +++ b/arch/powerpc/include/asm/cputime.h @@ -43,8 +43,26 @@ static inline unsigned long cputime_to_usecs(const cputime_t ct) */ #ifdef CONFIG_PPC64 #define get_accounting(tsk) (&get_paca()->accounting) +#define raw_get_accounting(tsk) (&local_paca->accounting) static inline void arch_vtime_task_switch(struct task_struct *tsk) { } +#else +#define get_accounting(tsk) (&task_thread_info(tsk)->accounting) +#define raw_get_accounting(tsk) get_accounting(tsk) +/* + * Called from the context switch with interrupts disabled, to charge all + * accumulated times to the current process, and to prepare accounting on + * the next process. + */ +static inline void arch_vtime_task_switch(struct task_struct *prev) +{ + struct cpu_accounting_data *acct = get_accounting(current); + struct cpu_accounting_data *acct0 = get_accounting(prev); + + acct->starttime = acct0->starttime; +} +#endif + /* * account_cpu_user_entry/exit runs "unreconciled", so can't trace, * can't use use get_paca() @@ -52,35 +70,21 @@ static inline void arch_vtime_task_switch(struct task_struct *tsk) { } static notrace inline void account_cpu_user_entry(void) { unsigned long tb = mftb(); - struct cpu_accounting_data *acct = &local_paca->accounting; + struct cpu_accounting_data *acct = raw_get_accounting(current); acct->utime += (tb - acct->starttime_user); acct->starttime = tb; } + static notrace inline void account_cpu_user_exit(void) { unsigned long tb = mftb(); - struct cpu_accounting_data *acct = &local_paca->accounting; + struct cpu_accounting_data *acct = raw_get_accounting(current); acct->stime += (tb - acct->starttime); acct->starttime_user = tb; } -#else -#define get_accounting(tsk) (&task_thread_info(tsk)->accounting) -/* - * Called from the context switch with interrupts disabled, to charge all - * accumulated times to the current process, and to prepare accounting on - * the next process. - */ -static inline void arch_vtime_task_switch(struct task_struct *prev) -{ - struct cpu_accounting_data *acct = get_accounting(current); - struct cpu_accounting_data *acct0 = get_accounting(prev); - - acct->starttime = acct0->starttime; -} -#endif #endif /* __KERNEL__ */ #else /* CONFIG_VIRT_CPU_ACCOUNTING_NATIVE */ -- 2.23.0