On Tue, May 05, 2020 at 02:45:33PM -0500, Eric W. Biederman wrote:
> 
> The current idiom for the callers is:
> 
> flush_old_exec(bprm);
> set_personality(...);
> setup_new_exec(bprm);
> 
> In 2010 Linus split flush_old_exec into flush_old_exec and
> setup_new_exec.  With the intention that setup_new_exec be what is
> called after the processes new personality is set.
> 
> Move the code that doesn't depend upon the personality from
> setup_new_exec into flush_old_exec.  This is to facilitate future
> changes by having as much code together in one function as possible.

Er, I *think* this is okay, but I have some questions below which
maybe you already investigated (and should perhaps get called out in
the changelog).

> 
> Ref: 221af7f87b97 ("Split 'flush_old_exec' into two functions")
> Signed-off-by: "Eric W. Biederman" <ebied...@xmission.com>
> ---
>  fs/exec.c | 85 ++++++++++++++++++++++++++++---------------------------
>  1 file changed, 44 insertions(+), 41 deletions(-)
> 
> diff --git a/fs/exec.c b/fs/exec.c
> index 8c3abafb9bb1..0eff20558735 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -1359,39 +1359,7 @@ int flush_old_exec(struct linux_binprm * bprm)
>        * undergoing exec(2).
>        */
>       do_close_on_exec(me->files);
> -     return 0;
> -
> -out_unlock:
> -     mutex_unlock(&me->signal->exec_update_mutex);
> -out:
> -     return retval;
> -}
> -EXPORT_SYMBOL(flush_old_exec);
> -
> -void would_dump(struct linux_binprm *bprm, struct file *file)
> -{
> -     struct inode *inode = file_inode(file);
> -     if (inode_permission(inode, MAY_READ) < 0) {
> -             struct user_namespace *old, *user_ns;
> -             bprm->interp_flags |= BINPRM_FLAGS_ENFORCE_NONDUMP;
> -
> -             /* Ensure mm->user_ns contains the executable */
> -             user_ns = old = bprm->mm->user_ns;
> -             while ((user_ns != &init_user_ns) &&
> -                    !privileged_wrt_inode_uidgid(user_ns, inode))
> -                     user_ns = user_ns->parent;
>  
> -             if (old != user_ns) {
> -                     bprm->mm->user_ns = get_user_ns(user_ns);
> -                     put_user_ns(old);
> -             }
> -     }
> -}
> -EXPORT_SYMBOL(would_dump);
> -
> -void setup_new_exec(struct linux_binprm * bprm)
> -{
> -     struct task_struct *me = current;
>       /*
>        * Once here, prepare_binrpm() will not be called any more, so
>        * the final state of setuid/setgid/fscaps can be merged into the
> @@ -1414,8 +1382,6 @@ void setup_new_exec(struct linux_binprm * bprm)
>                       bprm->rlim_stack.rlim_cur = _STK_LIM;
>       }
>  
> -     arch_pick_mmap_layout(me->mm, &bprm->rlim_stack);
> -
>       me->sas_ss_sp = me->sas_ss_size = 0;
>  
>       /*
> @@ -1430,16 +1396,9 @@ void setup_new_exec(struct linux_binprm * bprm)
>       else
>               set_dumpable(current->mm, SUID_DUMP_USER);
>  
> -     arch_setup_new_exec();
>       perf_event_exec();

What is perf expecting to be able to examine at this point? Does it want
a view of things after arch_setup_new_exec()? (i.e. "final" TIF flags,
mmap layout, etc.) From what I can tell, the answer is "no, it's just
resetting counters", so I think this is fine. Maybe double-check with
Steve?

>       __set_task_comm(me, kbasename(bprm->filename), true);
>  
> -     /* Set the new mm task size. We have to do that late because it may
> -      * depend on TIF_32BIT which is only updated in flush_thread() on
> -      * some architectures like powerpc
> -      */
> -     me->mm->task_size = TASK_SIZE;
> -
>       /* An exec changes our domain. We are no longer part of the thread
>          group */
>       WRITE_ONCE(me->self_exec_id, me->self_exec_id + 1);
> @@ -1467,6 +1426,50 @@ void setup_new_exec(struct linux_binprm * bprm)
>        * credentials; any time after this it may be unlocked.
>        */
>       security_bprm_committed_creds(bprm);

Similarly for the LSM hook: is it expecting a post-arch-setup view? I
don't see anything looking at task_size, TIF flags, or anything else;
the hooks seem to be just cleaning up from the old process being
replaced, so again, I think this is okay.

Not visible in this patch, the following things now happen earlier,
which I feel should maybe get called out in the changelog, with,
perhaps, better justification than what I've got here:

bprm->secureexec set/check (looks safe, since it depends on
prepare_binprm()'s security_bprm_set_creds())

rlim_stack.rlim_cur setting (safe, just needs to happen before
arch_pick_mmap_layout())

dumpable() check (looks safe, BINPRM_FLAGS_ENFORCE_NONDUMP depends on
much earlier would_dump(), and uid/gid depend on earlier calls to
prepare_binprm()'s bprm_fill_uid())

__set_task_comm (looks safe, just dealing with the task name...)

self_exec_id bump (looks safe, I think -- it's still after uid
setting)

flush_signal_handlers() (looks safe -- nothing appears to depend on mm
nor personality)
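
To make the ordering constraint concrete, here is a toy user-space model
of the three-call idiom. It is only a sketch, not kernel code: every
toy_* name, the struct, and the two size constants are invented for
illustration. The one thing it tries to show is that the
personality-independent steps can run before set_personality(), while
task_size has to be computed after it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-ins only; none of these are the kernel's types or values. */
struct toy_task {
	bool is_32bit;             /* plays the role of TIF_32BIT */
	uint64_t task_size;        /* plays the role of mm->task_size */
	unsigned int self_exec_id; /* bumped once per exec */
	bool handlers_flushed;     /* plays the role of flush_signal_handlers() */
};

/* Arbitrary illustrative limits, not real architecture constants. */
#define TOY_TASK_SIZE_32 0xC0000000ULL
#define TOY_TASK_SIZE_64 0x00007FFFFFFFF000ULL

/* Like TASK_SIZE on some architectures, this depends on the 32-bit flag. */
static uint64_t toy_task_size(const struct toy_task *t)
{
	return t->is_32bit ? TOY_TASK_SIZE_32 : TOY_TASK_SIZE_64;
}

/* Steps that do not look at the personality can all live here. */
static void toy_flush_old_exec(struct toy_task *t)
{
	t->self_exec_id++;
	t->handlers_flushed = true;
}

/* The binfmt handler decides the personality between the two calls. */
static void toy_set_personality(struct toy_task *t, bool want_32bit)
{
	t->is_32bit = want_32bit;
}

/* Only the personality-dependent step remains here. */
static void toy_setup_new_exec(struct toy_task *t)
{
	t->task_size = toy_task_size(t);
}

/* The caller idiom from the changelog, in order. */
static uint64_t toy_exec(struct toy_task *t, bool want_32bit)
{
	toy_flush_old_exec(t);
	toy_set_personality(t, want_32bit);
	toy_setup_new_exec(t);
	return t->task_size;
}
```

In this model, calling toy_exec() on a task that starts out 64-bit and
asking for a 32-bit personality yields the 32-bit limit. If the
task_size assignment were moved into toy_flush_old_exec(), it would
pick up the stale flag instead, which is the kind of thing I'd want the
changelog to rule out for each of the items above.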

> +     return 0;
> +
> +out_unlock:
> +     mutex_unlock(&me->signal->exec_update_mutex);
> +out:
> +     return retval;
> +}
> +EXPORT_SYMBOL(flush_old_exec);
> +
> +void would_dump(struct linux_binprm *bprm, struct file *file)
> +{
> +     struct inode *inode = file_inode(file);
> +     if (inode_permission(inode, MAY_READ) < 0) {
> +             struct user_namespace *old, *user_ns;
> +             bprm->interp_flags |= BINPRM_FLAGS_ENFORCE_NONDUMP;
> +
> +             /* Ensure mm->user_ns contains the executable */
> +             user_ns = old = bprm->mm->user_ns;
> +             while ((user_ns != &init_user_ns) &&
> +                    !privileged_wrt_inode_uidgid(user_ns, inode))
> +                     user_ns = user_ns->parent;
> +
> +             if (old != user_ns) {
> +                     bprm->mm->user_ns = get_user_ns(user_ns);
> +                     put_user_ns(old);
> +             }
> +     }
> +}
> +EXPORT_SYMBOL(would_dump);

The diff helpfully decided this moved would_dump(). ;) Is it worth
maybe just moving it explicitly above flush_old_exec() to avoid this
churn? I dunno.

> +
> +void setup_new_exec(struct linux_binprm * bprm)
> +{
> +     /* Setup things that can depend upon the personality */

Should this comment be above the function instead?

> +     struct task_struct *me = current;
> +
> +     arch_pick_mmap_layout(me->mm, &bprm->rlim_stack);
> +
> +     arch_setup_new_exec();
> +
> +     /* Set the new mm task size. We have to do that late because it may
> +      * depend on TIF_32BIT which is only updated in flush_thread() on
> +      * some architectures like powerpc
> +      */
> +     me->mm->task_size = TASK_SIZE;
>       mutex_unlock(&me->signal->exec_update_mutex);
>       mutex_unlock(&me->signal->cred_guard_mutex);
>  }
> -- 
> 2.20.1
> 

So, as I say, I *think* this is okay, but I always get suspicious about
reordering things in execve(). ;)

So, with a bit larger changelog discussing what's moving "earlier",
I think this looks good:

Reviewed-by: Kees Cook <keesc...@chromium.org>

-- 
Kees Cook
