Excerpts from Christophe Leroy's message of March 19, 2020 7:18 pm:
> 
> 
> Le 25/02/2020 à 18:35, Nicholas Piggin a écrit :
>> System call entry and particularly exit code is beyond the limit of what
>> is reasonable to implement in asm.
>> 
>> This conversion moves all conditional branches out of the asm code,
>> except for the case that all GPRs should be restored at exit.
>> 
>> Null syscall test is about 5% faster after this patch, because the exit
>> work is handled under local_irq_disable, and the hard mask and pending
>> interrupt replay is handled after that, which avoids games with MSR.
>> 
>> Signed-off-by: Nicholas Piggin <npig...@gmail.com>
>> Signed-off-by: Michal Suchanek <msucha...@suse.de>
>> ---
>> 
>> v2,rebase (from Michal):
>> - Add endian conversion for dtl_idx (ms)
>> - Fix sparse warning about missing declaration (ms)
>> - Add unistd.h to fix some defconfigs, add SPDX, minor formatting (mpe)
>> 
>> v3: Fixes thanks to reports from mpe and selftests errors:
>> - Several soft-mask debug and unsafe smp_processor_id() warnings due to
>>    tracing and other false positives due to checks in "unreconciled" code.
>> - Fix a bug with syscall tracing functions that set registers (e.g.,
>>    PTRACE_SETREG) not setting GPRs properly.
>> - Fix silly tabort_syscall bug that causes kernel crashes when making system
>>    calls in transactional state.
>> 
>>   arch/powerpc/include/asm/asm-prototypes.h     |  17 +-
>>   .../powerpc/include/asm/book3s/64/kup-radix.h |  14 +-
>>   arch/powerpc/include/asm/cputime.h            |  29 ++
>>   arch/powerpc/include/asm/hw_irq.h             |   4 +
>>   arch/powerpc/include/asm/ptrace.h             |   3 +
>>   arch/powerpc/include/asm/signal.h             |   3 +
>>   arch/powerpc/include/asm/switch_to.h          |   5 +
>>   arch/powerpc/include/asm/time.h               |   3 +
>>   arch/powerpc/kernel/Makefile                  |   3 +-
>>   arch/powerpc/kernel/entry_64.S                | 338 +++---------------
>>   arch/powerpc/kernel/signal.h                  |   2 -
>>   arch/powerpc/kernel/syscall_64.c              | 213 +++++++++++
>>   arch/powerpc/kernel/systbl.S                  |   9 +-
>>   13 files changed, 328 insertions(+), 315 deletions(-)
>>   create mode 100644 arch/powerpc/kernel/syscall_64.c
>> 
>> diff --git a/arch/powerpc/include/asm/asm-prototypes.h b/arch/powerpc/include/asm/asm-prototypes.h
>> index 983c0084fb3f..4b3609554e76 100644
>> --- a/arch/powerpc/include/asm/asm-prototypes.h
>> +++ b/arch/powerpc/include/asm/asm-prototypes.h
>> @@ -97,6 +97,12 @@ ppc_select(int n, fd_set __user *inp, fd_set __user *outp, fd_set __user *exp,
>>   unsigned long __init early_init(unsigned long dt_ptr);
>>   void __init machine_init(u64 dt_ptr);
>>   #endif
>> +#ifdef CONFIG_PPC64
> 
> This ifdef is not necessary, as it has no corresponding #else.
> Having a function declaration without a definition is not an issue.
> Keeping in mind that we are aiming at generalising this to PPC32.

Well, there are other unnecessary ifdefs in there too, I think. But sure.
This patch also got the interrupt_exit_ prototypes leaked in from the
later patch, so I could fix those as well.

>> diff --git a/arch/powerpc/include/asm/cputime.h b/arch/powerpc/include/asm/cputime.h
>> index 2431b4ada2fa..6639a6847cc0 100644
>> --- a/arch/powerpc/include/asm/cputime.h
>> +++ b/arch/powerpc/include/asm/cputime.h
>> @@ -44,6 +44,28 @@ static inline unsigned long cputime_to_usecs(const cputime_t ct)
>>   #ifdef CONFIG_PPC64
>>   #define get_accounting(tsk)        (&get_paca()->accounting)
>>   static inline void arch_vtime_task_switch(struct task_struct *tsk) { }
> 
> Could we have the below additions sit outside of this PPC64 ifdef, to be 
> reused on PPC32?

Okay.

>> +
>> +/*
>> + * account_cpu_user_entry/exit runs "unreconciled", so can't trace,
>> + * can't use get_paca()
>> + */
>> +static notrace inline void account_cpu_user_entry(void)
>> +{
>> +    unsigned long tb = mftb();
>> +    struct cpu_accounting_data *acct = &local_paca->accounting;
> 
> In the spirit of reusing that code on PPC32, can we use get_accounting()?
> Or an alternate version of get_accounting(), e.g. get_accounting_notrace(),
> to be defined?

Okay.

>> diff --git a/arch/powerpc/kernel/syscall_64.c b/arch/powerpc/kernel/syscall_64.c
> 
> Could some part of it go in a syscall.c to be reused on PPC32?

I could put it all in syscall.c and then we could adjust with some ifdefs
or helpers. I don't think there is enough to warrant syscall.c,
syscall_32.c, and syscall_64.c.

I wonder about the interrupt returns as well; they don't really make
sense in a file called syscall.c, but the code is very similar to the
system call exit. Should we just call it interrupts.c?

>> +    /*
>> +     * This is not required for the syscall exit path, but makes the
>> +     * stack frame look nicer. If this was initialised in the first stack
>> +     * frame, or if the unwinder was taught the first stack frame always
>> +     * returns to user with IRQS_ENABLED, this store could be avoided!
>> +     */
>> +    regs->softe = IRQS_ENABLED;
> 
> softe doesn't exist on PPC32. Can we do that through a helper?

I guess we can have regs_set_irq_state(regs, IRQS_ENABLED); or
something like that.

We could make that helper and a _get_ counterpart in a later patch, which
would cover other cases in the tree as well.
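For illustration, a minimal sketch of the helper (the name and the no-op
32-bit behaviour are assumptions at this point, since softe only exists
in the 64-bit pt_regs):

static inline void regs_set_irq_state(struct pt_regs *regs, unsigned long val)
{
#ifdef CONFIG_PPC64
	/* 64-bit: record the soft-mask state in the register frame */
	regs->softe = val;
#else
	/* 32-bit pt_regs has no softe field, so nothing to store here */
#endif
}

The store above would then become regs_set_irq_state(regs, IRQS_ENABLED);.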

>> +
>> +    __hard_irq_enable();
> 
> This doesn't exist on PPC32. Should we define __hard_irq_enable() as 
> arch_local_irq_enable() on PPC32?

This goes away with patch 29. Better not to have this ugly thing
spill into ppc32 code at all if we can avoid it :)

> 
>> +
>> +    ti_flags = current_thread_info()->flags;
>> +    if (unlikely(ti_flags & _TIF_SYSCALL_DOTRACE)) {
>> +            /*
>> +             * We use the return value of do_syscall_trace_enter() as the
>> +             * syscall number. If the syscall was rejected for any reason
>> +             * do_syscall_trace_enter() returns an invalid syscall number
>> +             * and the test against NR_syscalls will fail and the return
>> +             * value to be used is in regs->gpr[3].
>> +             */
>> +            r0 = do_syscall_trace_enter(regs);
>> +            if (unlikely(r0 >= NR_syscalls))
>> +                    return regs->gpr[3];
>> +            r3 = regs->gpr[3];
>> +            r4 = regs->gpr[4];
>> +            r5 = regs->gpr[5];
>> +            r6 = regs->gpr[6];
>> +            r7 = regs->gpr[7];
>> +            r8 = regs->gpr[8];
>> +
>> +    } else if (unlikely(r0 >= NR_syscalls)) {
>> +            return -ENOSYS;
>> +    }
>> +
>> +    /* May be faster to do array_index_nospec? */
>> +    barrier_nospec();
>> +
>> +    if (unlikely(ti_flags & _TIF_32BIT)) {
> 
> Use is_compat_task() instead?

Michal pointed this out, he's got patches that do this on top of this
series.
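
For reference, the check would then read something like the following
(just a sketch, not Michal's actual patch; I'm assuming the compat
argument truncation and syscall table selection stay as in this series):

	if (unlikely(is_compat_task())) {
		/* compat (32-bit) task: zero-extend arguments, use compat table */
		r3 &= 0x00000000ffffffffUL;
		r4 &= 0x00000000ffffffffUL;
		r5 &= 0x00000000ffffffffUL;
		r6 &= 0x00000000ffffffffUL;
		r7 &= 0x00000000ffffffffUL;
		r8 &= 0x00000000ffffffffUL;
		f = (void *)compat_sys_call_table[r0];
	} else {
		f = (void *)sys_call_table[r0];
	}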

Incremental diff for your suggestions is below. It's likely we're going
to have a few ifdefs, particularly in the exit paths, where the
complexity of handling the irq soft-mask state means helpers don't make
much sense. I don't think that will be such a bad thing, but we can come
to it as we go.

Thanks,
Nick

---
 arch/powerpc/include/asm/asm-prototypes.h |  4 ---
 arch/powerpc/include/asm/cputime.h        | 38 +++++++++++++----------
 2 files changed, 21 insertions(+), 21 deletions(-)

diff --git a/arch/powerpc/include/asm/asm-prototypes.h b/arch/powerpc/include/asm/asm-prototypes.h
index 4b3609554e76..ab59a4904254 100644
--- a/arch/powerpc/include/asm/asm-prototypes.h
+++ b/arch/powerpc/include/asm/asm-prototypes.h
@@ -97,12 +97,8 @@ ppc_select(int n, fd_set __user *inp, fd_set __user *outp, fd_set __user *exp,
 unsigned long __init early_init(unsigned long dt_ptr);
 void __init machine_init(u64 dt_ptr);
 #endif
-#ifdef CONFIG_PPC64
 long system_call_exception(long r3, long r4, long r5, long r6, long r7, long r8, unsigned long r0, struct pt_regs *regs);
 notrace unsigned long syscall_exit_prepare(unsigned long r3, struct pt_regs *regs);
-notrace unsigned long interrupt_exit_user_prepare(struct pt_regs *regs, unsigned long msr);
-notrace unsigned long interrupt_exit_kernel_prepare(struct pt_regs *regs, unsigned long msr);
-#endif
 
 long ppc_fadvise64_64(int fd, int advice, u32 offset_high, u32 offset_low,
                      u32 len_high, u32 len_low);
diff --git a/arch/powerpc/include/asm/cputime.h b/arch/powerpc/include/asm/cputime.h
index 6639a6847cc0..0fccd5ea1e9a 100644
--- a/arch/powerpc/include/asm/cputime.h
+++ b/arch/powerpc/include/asm/cputime.h
@@ -43,8 +43,26 @@ static inline unsigned long cputime_to_usecs(const cputime_t ct)
  */
 #ifdef CONFIG_PPC64
 #define get_accounting(tsk)    (&get_paca()->accounting)
+#define raw_get_accounting(tsk)        (&local_paca->accounting)
 static inline void arch_vtime_task_switch(struct task_struct *tsk) { }
 
+#else
+#define get_accounting(tsk)    (&task_thread_info(tsk)->accounting)
+#define raw_get_accounting(tsk)        get_accounting(tsk)
+/*
+ * Called from the context switch with interrupts disabled, to charge all
+ * accumulated times to the current process, and to prepare accounting on
+ * the next process.
+ */
+static inline void arch_vtime_task_switch(struct task_struct *prev)
+{
+       struct cpu_accounting_data *acct = get_accounting(current);
+       struct cpu_accounting_data *acct0 = get_accounting(prev);
+
+       acct->starttime = acct0->starttime;
+}
+#endif
+
 /*
  * account_cpu_user_entry/exit runs "unreconciled", so can't trace,
  * can't use get_paca()
@@ -52,35 +70,21 @@ static inline void arch_vtime_task_switch(struct task_struct *tsk) { }
 static notrace inline void account_cpu_user_entry(void)
 {
        unsigned long tb = mftb();
-       struct cpu_accounting_data *acct = &local_paca->accounting;
+       struct cpu_accounting_data *acct = raw_get_accounting(current);
 
        acct->utime += (tb - acct->starttime_user);
        acct->starttime = tb;
 }
+
 static notrace inline void account_cpu_user_exit(void)
 {
        unsigned long tb = mftb();
-       struct cpu_accounting_data *acct = &local_paca->accounting;
+       struct cpu_accounting_data *acct = raw_get_accounting(current);
 
        acct->stime += (tb - acct->starttime);
        acct->starttime_user = tb;
 }
 
-#else
-#define get_accounting(tsk)    (&task_thread_info(tsk)->accounting)
-/*
- * Called from the context switch with interrupts disabled, to charge all
- * accumulated times to the current process, and to prepare accounting on
- * the next process.
- */
-static inline void arch_vtime_task_switch(struct task_struct *prev)
-{
-       struct cpu_accounting_data *acct = get_accounting(current);
-       struct cpu_accounting_data *acct0 = get_accounting(prev);
-
-       acct->starttime = acct0->starttime;
-}
-#endif
 
 #endif /* __KERNEL__ */
 #else /* CONFIG_VIRT_CPU_ACCOUNTING_NATIVE */
-- 
2.23.0
