(your mailer broke and forgot to keep lines shorter than 78 chars)
On Tue, Sep 01, 2020 at 12:46:41PM +0200, Frederic Weisbecker wrote:

> == TIF_NOHZ ==
>
> Need to get rid of that in order not to trigger the syscall slowpath on
> CPUs that don't want nohz_full. Also we don't want to iterate all
> threads and clear the flag when the last nohz_full CPU exits nohz_full
> mode. Prefer static keys to call context tracking on archs. x86 does
> that well.

Build on the common entry code, I suppose. Then any arch that uses it
gets the new features.

> == Proper entry code ==
>
> We must make sure that a given arch never calls exception_enter() /
> exception_exit(). This saves the previous state of context tracking
> and switches to kernel mode (from the context tracking POV) temporarily.
> Since this state is saved on the stack, it prevents us from turning
> off context tracking entirely on a CPU: the tracking must be done on
> all CPUs and that takes some cycles.
>
> This means that, considering early entry code (before the call to
> context tracking upon kernel entry, and after the call to context
> tracking upon kernel exit), we must take care of a few things:
>
> 1) Make sure early entry code can't trigger exceptions. Or if it does,
>    the given exception can't schedule or use RCU (unless it calls
>    rcu_nmi_enter()). Otherwise the exception must call
>    exception_enter()/exception_exit(), which we don't want.

I think this is true for x86. Early entry has interrupts disabled, so
any exception that can still happen is NMI-like and will thus use
rcu_nmi_enter(). On x86 that now includes #DB (which is also excluded
because we refuse to set execution breakpoints on entry code), #BP,
NMI and MCE.

> 2) No call to schedule_user().

I'm not sure what that is supposed to do, but x86 doesn't appear to
have it, so all good :-)

> 3) Make sure early entry code is not interruptible, or
>    preempt_schedule_irq() would rely on
>    exception_enter()/exception_exit().

This is so for x86.
> 4) Make sure early entry code can't be traced (no call to
>    preempt_schedule_notrace()), or if it is, it can't schedule.

noinstr is your friend.

> I believe x86 does most of that well.

It does now.
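To illustrate why exception_enter()/exception_exit() pin context tracking
on every CPU, here is a minimal userspace C sketch of the save/restore
pattern described above. The names mirror the kernel's, but this is a
simplified stand-in (single global state instead of per-CPU data), not the
kernel implementation:

```c
#include <assert.h>

/*
 * Simplified stand-in for the context tracking state.
 * In the real kernel this is per-CPU data; one variable is
 * enough to show the save/restore pattern.
 */
enum ctx_state { CONTEXT_KERNEL, CONTEXT_USER };

static enum ctx_state cpu_state = CONTEXT_KERNEL;

/*
 * Save the previous state and force kernel mode. Because the
 * caller keeps the previous state on its stack, that state must
 * always be meaningful -- which is why the tracking has to run on
 * every CPU and cannot simply be switched off where nohz_full is
 * unused.
 */
static enum ctx_state exception_enter(void)
{
	enum ctx_state prev = cpu_state;

	cpu_state = CONTEXT_KERNEL;
	return prev;
}

/* Restore the saved state when the exception returns. */
static void exception_exit(enum ctx_state prev)
{
	cpu_state = prev;
}
```

Early entry code that can never take a schedulable exception (the noinstr
approach above) never needs this pair, so the stacked state -- and the
always-on tracking it forces -- can go away.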