On Mon, Jun 29, 2015 at 12:33:41PM -0700, Andy Lutomirski wrote:
> The current entry and exit code is incomprehensible, appears to work
> primary by luck, and is very difficult to incrementally improve. Add
> new code in preparation for simply deleting the old code.
>
> prepare_exit_to_usermode is a new function that will handle all slow
> path exits to user mode. It is called with IRQs disabled and it
> leaves us in a state in which it is safe to immediately return to
> user mode. IRQs must not be re-enabled at any point after
> prepare_exit_to_usermode returns and user mode is actually entered.
> (We can, of course, fail to enter user mode and treat that failure
> as a fresh entry to kernel mode.) All callers of do_notify_resume
> will be migrated to call prepare_exit_to_usermode instead;
> prepare_exit_to_usermode needs to do everything that
> do_notify_resume does, but it also takes care of scheduling and
> context tracking. Unlike do_notify_resume, it does not need to be
> called in a loop.
>
> syscall_return_slowpath is exactly what it sounds like. It will be
> called on any syscall exit slow path. It will replaces
> syscall_trace_leave and it calls prepare_exit_to_usermode on the way
> out.
>
> Signed-off-by: Andy Lutomirski <l...@kernel.org>
> ---
>  arch/x86/entry/common.c | 112 +++++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 111 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
> index 8a7e35af7164..55530d6dd1bd 100644
> --- a/arch/x86/entry/common.c
> +++ b/arch/x86/entry/common.c
> @@ -207,6 +207,7 @@ long syscall_trace_enter(struct pt_regs *regs)
>  	return syscall_trace_enter_phase2(regs, arch, phase1_result);
>  }
>
> +/* Deprecated. */
>  void syscall_trace_leave(struct pt_regs *regs)

Ah yes, this will get replaced later with syscall_return_slowpath below.

>  {
>  	bool step;
> @@ -237,8 +238,117 @@ void syscall_trace_leave(struct pt_regs *regs)
>  	user_enter();
>  }
>
> +static struct thread_info *pt_regs_to_thread_info(struct pt_regs *regs)
> +{
> +	unsigned long top_of_stack =
> +		(unsigned long)(regs + 1) + TOP_OF_KERNEL_STACK_PADDING;
> +	return (struct thread_info *)(top_of_stack - THREAD_SIZE);
> +}
> +
> +/* Called with IRQs disabled. */
> +__visible void prepare_exit_to_usermode(struct pt_regs *regs)
> +{
> +	if (WARN_ON(!irqs_disabled()))
> +		local_irq_disable();
> +
> +	/*
> +	 * In order to return to user mode, we need to have IRQs off with
> +	 * none of _TIF_SIGPENDING, _TIF_NOTIFY_RESUME, _TIF_USER_RETURN_NOTIFY,
> +	 * _TIF_UPROBE, or _TIF_NEED_RESCHED set. Several of these flags
> +	 * can be set at any time on preemptable kernels if we have IRQs on,
> +	 * so we need to loop. Disabling preemption wouldn't help: doing the
> +	 * work to clear some of the flags can sleep.
> +	 */
> +	while (true) {
> +		u32 cached_flags =
> +			READ_ONCE(pt_regs_to_thread_info(regs)->flags);
> +
> +		if (!(cached_flags & (_TIF_SIGPENDING | _TIF_NOTIFY_RESUME |
> +				      _TIF_UPROBE | _TIF_NEED_RESCHED)))
> +			break;
> +
> +		/* We have work to do. */
> +		local_irq_enable();
> +
> +		if (cached_flags & _TIF_NEED_RESCHED)
> +			schedule();
> +
> +		if (cached_flags & _TIF_UPROBE)
> +			uprobe_notify_resume(regs);
> +
> +		/* deal with pending signal delivery */
> +		if (cached_flags & _TIF_SIGPENDING)
> +			do_signal(regs);
> +
> +		if (cached_flags & _TIF_NOTIFY_RESUME) {
> +			clear_thread_flag(TIF_NOTIFY_RESUME);
> +			tracehook_notify_resume(regs);
> +		}
> +
> +		if (cached_flags & _TIF_USER_RETURN_NOTIFY)
> +			fire_user_return_notifiers();
> +
> +		/* Disable IRQs and retry */
> +		local_irq_disable();
> +	}

Stupid question: what assures us that we'll break out of this loop at some
point?
I.e., isn't it possible for something to keep setting bits in ->flags
while we're handling stuff in the IRQs-on section?

OTOH, this is what int_ret_from_sys_call() does now anyway so we should be
fine.

Yeah, it looks that way.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--
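
A note on the loop-termination question above: what follows is a minimal
user-space sketch in C, not kernel code. All names in it (sim_task,
sim_irq_window, sim_prepare_exit, the SIM_* bits) are hypothetical
stand-ins. The only thing borrowed from the patch is the shape of the loop:
snapshot the flags, break if no work bits are set, otherwise handle the
snapshot and retry. It suggests that the loop exits once whatever keeps
setting the flags goes quiet, and that nothing in the loop itself bounds
the number of passes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag bits standing in for the _TIF_* work flags. */
#define SIM_NEED_RESCHED (1u << 0)
#define SIM_SIGPENDING   (1u << 1)
#define SIM_WORK_MASK    (SIM_NEED_RESCHED | SIM_SIGPENDING)

struct sim_task {
	uint32_t flags;          /* stands in for thread_info->flags */
	unsigned pending_events; /* times an "interrupt" will re-raise a flag */
};

/* Models an interrupt arriving while "IRQs are on". */
static void sim_irq_window(struct sim_task *t)
{
	if (t->pending_events > 0) {
		t->pending_events--;
		t->flags |= SIM_NEED_RESCHED;
	}
}

/*
 * Same shape as the loop in the patch. Returns the number of passes that
 * did work, or -1 if work is still pending after max_iters passes (the
 * cap exists only so the simulation cannot spin forever).
 */
static int sim_prepare_exit(struct sim_task *t, int max_iters)
{
	int iters = 0;

	assert(max_iters >= 0);
	for (;;) {
		uint32_t cached = t->flags; /* snapshot with "IRQs off" */

		if (!(cached & SIM_WORK_MASK))
			return iters;
		if (iters == max_iters)
			return -1;

		/* "IRQs on": handle what the snapshot showed. */
		t->flags &= ~(cached & SIM_WORK_MASK);
		sim_irq_window(t); /* new work may arrive meanwhile */
		iters++;
	}
}
```

With two pending events and SIM_SIGPENDING set initially, the simulated
loop does three passes of work and then exits. With a source that never
goes quiet, only the artificial cap stops it, which is the scenario the
question describes.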