entry: Add new, comprehensible entry and exit hooks

Borislav Petkov Thu, 02 Jul 2015 02:49:33 -0700

On Mon, Jun 29, 2015 at 12:33:41PM -0700, Andy Lutomirski wrote:
> The current entry and exit code is incomprehensible, appears to work
> primarily by luck, and is very difficult to incrementally improve.  Add
> new code in preparation for simply deleting the old code.
> 
> prepare_exit_to_usermode is a new function that will handle all slow
> path exits to user mode.  It is called with IRQs disabled and it
> leaves us in a state in which it is safe to immediately return to
> user mode.  IRQs must not be re-enabled at any point after
> prepare_exit_to_usermode returns and user mode is actually entered.
> (We can, of course, fail to enter user mode and treat that failure
> as a fresh entry to kernel mode.)  All callers of do_notify_resume
> will be migrated to call prepare_exit_to_usermode instead;
> prepare_exit_to_usermode needs to do everything that
> do_notify_resume does, but it also takes care of scheduling and
> context tracking.  Unlike do_notify_resume, it does not need to be
> called in a loop.
> 
> syscall_return_slowpath is exactly what it sounds like.  It will be
> called on any syscall exit slow path.  It will replace
> syscall_trace_leave and it calls prepare_exit_to_usermode on the way
> out.
> 
> Signed-off-by: Andy Lutomirski <[email protected]>
> ---
>  arch/x86/entry/common.c | 112 +++++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 111 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
> index 8a7e35af7164..55530d6dd1bd 100644
> --- a/arch/x86/entry/common.c
> +++ b/arch/x86/entry/common.c
> @@ -207,6 +207,7 @@ long syscall_trace_enter(struct pt_regs *regs)
>               return syscall_trace_enter_phase2(regs, arch, phase1_result);
>  }
>  
> +/* Deprecated. */
>  void syscall_trace_leave(struct pt_regs *regs)


Ah yes, this will get replaced later with syscall_return_slowpath below.

>  {
>       bool step;
> @@ -237,8 +238,117 @@ void syscall_trace_leave(struct pt_regs *regs)
>       user_enter();
>  }
>  
> +static struct thread_info *pt_regs_to_thread_info(struct pt_regs *regs)
> +{
> +     unsigned long top_of_stack =
> +             (unsigned long)(regs + 1) + TOP_OF_KERNEL_STACK_PADDING;
> +     return (struct thread_info *)(top_of_stack - THREAD_SIZE);
> +}
> +
> +/* Called with IRQs disabled. */
> +__visible void prepare_exit_to_usermode(struct pt_regs *regs)
> +{
> +     if (WARN_ON(!irqs_disabled()))
> +             local_irq_disable();
> +
> +     /*
> +      * In order to return to user mode, we need to have IRQs off with
> +      * none of _TIF_SIGPENDING, _TIF_NOTIFY_RESUME, _TIF_USER_RETURN_NOTIFY,
> +      * _TIF_UPROBE, or _TIF_NEED_RESCHED set.  Several of these flags
> +      * can be set at any time on preemptable kernels if we have IRQs on,
> +      * so we need to loop.  Disabling preemption wouldn't help: doing the
> +      * work to clear some of the flags can sleep.
> +      */
> +     while (true) {
> +             u32 cached_flags =
> +                     READ_ONCE(pt_regs_to_thread_info(regs)->flags);
> +
> +             if (!(cached_flags & (_TIF_SIGPENDING | _TIF_NOTIFY_RESUME |
> +                                   _TIF_UPROBE | _TIF_NEED_RESCHED)))
> +                     break;
> +
> +             /* We have work to do. */
> +             local_irq_enable();
> +
> +             if (cached_flags & _TIF_NEED_RESCHED)
> +                     schedule();
> +
> +             if (cached_flags & _TIF_UPROBE)
> +                     uprobe_notify_resume(regs);
> +
> +             /* deal with pending signal delivery */
> +             if (cached_flags & _TIF_SIGPENDING)
> +                     do_signal(regs);
> +
> +             if (cached_flags & _TIF_NOTIFY_RESUME) {
> +                     clear_thread_flag(TIF_NOTIFY_RESUME);
> +                     tracehook_notify_resume(regs);
> +             }
> +
> +             if (cached_flags & _TIF_USER_RETURN_NOTIFY)
> +                     fire_user_return_notifiers();
> +
> +             /* Disable IRQs and retry */
> +             local_irq_disable();
> +     }

Stupid question: what assures us that we'll break out of this loop
at some point? I.e., isn't it possible that something keeps setting bits
in ->flags while we're handling stuff in the IRQs-on section?

OTOH, this is what int_ret_from_sys_call() does now anyway, so we should
be fine.

Yeah, it looks that way.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.

Re: [PATCH v4 09/17] x86/entry: Add new, comprehensible entry and exit hooks
