Re: [PATCH] exec: fix up /proc/pid/comm in the execveat(AT_EMPTY_PATH) case

2024-11-30 Thread Christian Brauner
On Fri, Nov 29, 2024 at 08:54:38PM -0800, Kees Cook wrote:
> Zbigniew mentioned at Linux Plumber's that systemd is interested in
> switching to execveat() for service execution, but can't, because the
> contents of /proc/pid/comm are the file descriptor which was used,
> instead of the path to the binary. This makes the output of tools like
> top and ps useless, especially in a world where most fds are opened
> CLOEXEC so the number is truly meaningless.
> 
> When the filename passed in is empty (e.g. with AT_EMPTY_PATH), use the
> dentry's filename for "comm" instead of using the useless numeral from
> the synthetic fdpath construction. This way the actual exec machinery
> is unchanged, but cosmetically the comm looks reasonable to admins
> investigating things.
> 
> Instead of adding TASK_COMM_LEN more bytes to bprm, use one of the unused
> flag bits to indicate that we need to set "comm" from the dentry.
> 
> Suggested-by: Zbigniew Jędrzejewski-Szmek 
> Suggested-by: Tycho Andersen 
> Suggested-by: Al Viro 
> Suggested-by: Linus Torvalds 
> CC: Aleksa Sarai 
> Link: https://github.com/uapi-group/kernel-features#set-comm-field-before-exec
> Signed-off-by: Kees Cook 
> ---
> Cc: Al Viro 
> Cc: Linus Torvalds 
> Cc: Eric Biederman 
> Cc: Alexander Viro 
> Cc: Christian Brauner 
> Cc: Jan Kara 
> Cc: linux...@kvack.org
> Cc: linux-fsde...@vger.kernel.org
> 
> Here's what I've put together from the various suggestions. I didn't
> want to needlessly grow bprm, so I just added a flag instead. Otherwise,
> this is very similar to what Linus and Al suggested.
> ---
>  fs/exec.c   | 22 +++---
>  include/linux/binfmts.h |  4 +++-
>  2 files changed, 22 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/exec.c b/fs/exec.c
> index 5f16500ac325..d897d60ca5c2 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -1347,7 +1347,21 @@ int begin_new_exec(struct linux_binprm * bprm)
>   set_dumpable(current->mm, SUID_DUMP_USER);
>  
>   perf_event_exec();
> - __set_task_comm(me, kbasename(bprm->filename), true);
> +
> + /*
> +  * If the original filename was empty, alloc_bprm() made up a path
> +  * that will probably not be useful to admins running ps or similar.
> +  * Let's fix it up to be something reasonable.
> +  */
> + if (bprm->comm_from_dentry) {
> + rcu_read_lock();
> + /* The dentry name won't change while we hold the rcu read 
> lock. */
> + __set_task_comm(me, 
> smp_load_acquire(&bprm->file->f_path.dentry->d_name.name),

What does the smp_load_acquire() pair with?



Re: [PATCH v2] printf: Remove unused 'bprintf'

2024-11-30 Thread Steven Rostedt
On Sun, 1 Dec 2024 02:47:49 +
"Dr. David Alan Gilbert"  wrote:

> > > suggested at 
> > > https://lore.kernel.org/r/20241002104807.42b4b...@gandalf.local.home  
> > 
> > Yeah, since I'm basically the only user of it, it's best it goes through my
> > testing.  
> 
> Hmm, did you pick this one up?

Sorry, it wasn't in my patchwork and was missed. I can add it and Linus
may even take it after -rc1. Or I'll have it in iinux-next for the next
merge window.

-- Steve



Re: [PATCH] exec: fix up /proc/pid/comm in the execveat(AT_EMPTY_PATH) case

2024-11-30 Thread Linus Torvalds
On Sat, 30 Nov 2024 at 12:28, Mateusz Guzik  wrote:
>
> > + /* The dentry name won't change while we hold the rcu read 
> > lock. */
> > + __set_task_comm(me, 
> > smp_load_acquire(&bprm->file->f_path.dentry->d_name.name),
> > + true);
>
> This does not sound legit whatsoever as it would indicate all renames
> wait for rcu grace periods to end, which would be prettye weird.

Yes, the "won't change" should be "won't go away from under us".

  Linus



Re: [PATCH] exec: Make sure task->comm is always NUL-terminated

2024-11-30 Thread Linus Torvalds
On Sat, 30 Nov 2024 at 13:05, Kees Cook  wrote:
>
> Yeah, this just means it has greater potential to be garbled.

Garbled is fine. Id' just rather it be "consistently padded".

> This is fine, but it doesn't solve either an unstable source nor
> concurrent writers to dest.

Yeah, I guess concurrent writers will also cause possibly inconsistent padding.

Maybe we just don't care. As long as it's NUL-terminated, it's a
string. If somebody is messing with the kernel, they get to the
garbled string parts.

   Linus



RE: [PATCH] exec: Make sure task->comm is always NUL-terminated

2024-11-30 Thread David Laight
From: Kees Cook
> Sent: 30 November 2024 04:49
>
> Instead of adding a new use of the ambiguous strncpy(), we'd want to
> use memtostr_pad() which enforces being able to check at compile time
> that sizes are sensible, but this requires being able to see string
> buffer lengths. Instead of trying to inline __set_task_comm() (which
> needs to call trace and perf functions), just open-code it. But to
> make sure we're always safe, add compile-time checking like we already
> do for get_task_comm().
...
> Here's what I'd prefer to use to clean up set_task_comm(). I merged
> Linus and Eric's suggestions and open-coded memtostr_pad().
> ---
>  fs/exec.c | 12 ++--
>  include/linux/sched.h |  9 -
>  io_uring/io-wq.c  |  2 +-
>  io_uring/sqpoll.c |  2 +-
>  kernel/kthread.c  |  3 ++-
>  5 files changed, 14 insertions(+), 14 deletions(-)
> 
> diff --git a/fs/exec.c b/fs/exec.c
> index e0435b31a811..5f16500ac325 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -1200,16 +1200,16 @@ char *__get_task_comm(char *buf, size_t buf_size, 
> struct task_struct *tsk)
>  EXPORT_SYMBOL_GPL(__get_task_comm);
> 
>  /*
> - * These functions flushes out all traces of the currently running executable
> - * so that a new one can be started
> + * This is unlocked -- the string will always be NUL-terminated, but
> + * may show overlapping contents if racing concurrent reads.
>   */
> -
>  void __set_task_comm(struct task_struct *tsk, const char *buf, bool exec)
>  {
> - task_lock(tsk);
> + size_t len = min(strlen(buf), sizeof(tsk->comm) - 1);
> +
>   trace_task_rename(tsk, buf);
> - strscpy_pad(tsk->comm, buf, sizeof(tsk->comm));
> - task_unlock(tsk);
> + memcpy(tsk->comm, buf, len);
> + memset(&tsk->comm[len], 0, sizeof(tsk->comm) - len);
>   perf_event_comm(tsk, exec);

Why not do strscpy_pad() into a local char[16] and then do a 16 byte
memcpy() into the target buffer?

Then non-constant input data will always give a valid '\0' terminated string
regardless of how strscpy_pad() is implemented.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, 
UK
Registration No: 1397386 (Wales)




Re: [PATCH] exec: fix up /proc/pid/comm in the execveat(AT_EMPTY_PATH) case

2024-11-30 Thread Mateusz Guzik
On Fri, Nov 29, 2024 at 08:54:38PM -0800, Kees Cook wrote:
> Zbigniew mentioned at Linux Plumber's that systemd is interested in
> switching to execveat() for service execution, but can't, because the
> contents of /proc/pid/comm are the file descriptor which was used,
> instead of the path to the binary. This makes the output of tools like
> top and ps useless, especially in a world where most fds are opened
> CLOEXEC so the number is truly meaningless.
> 
> When the filename passed in is empty (e.g. with AT_EMPTY_PATH), use the
> dentry's filename for "comm" instead of using the useless numeral from
> the synthetic fdpath construction. This way the actual exec machinery
> is unchanged, but cosmetically the comm looks reasonable to admins
> investigating things.
> 
> Instead of adding TASK_COMM_LEN more bytes to bprm, use one of the unused
> flag bits to indicate that we need to set "comm" from the dentry.
> 
> Suggested-by: Zbigniew Jędrzejewski-Szmek 
> Suggested-by: Tycho Andersen 
> Suggested-by: Al Viro 
> Suggested-by: Linus Torvalds 
> CC: Aleksa Sarai 
> Link: https://github.com/uapi-group/kernel-features#set-comm-field-before-exec
> Signed-off-by: Kees Cook 
> ---
> Cc: Al Viro 
> Cc: Linus Torvalds 
> Cc: Eric Biederman 
> Cc: Alexander Viro 
> Cc: Christian Brauner 
> Cc: Jan Kara 
> Cc: linux...@kvack.org
> Cc: linux-fsde...@vger.kernel.org
> 
> Here's what I've put together from the various suggestions. I didn't
> want to needlessly grow bprm, so I just added a flag instead. Otherwise,
> this is very similar to what Linus and Al suggested.
> ---
>  fs/exec.c   | 22 +++---
>  include/linux/binfmts.h |  4 +++-
>  2 files changed, 22 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/exec.c b/fs/exec.c
> index 5f16500ac325..d897d60ca5c2 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -1347,7 +1347,21 @@ int begin_new_exec(struct linux_binprm * bprm)
>   set_dumpable(current->mm, SUID_DUMP_USER);
>  
>   perf_event_exec();
> - __set_task_comm(me, kbasename(bprm->filename), true);
> +
> + /*
> +  * If the original filename was empty, alloc_bprm() made up a path
> +  * that will probably not be useful to admins running ps or similar.
> +  * Let's fix it up to be something reasonable.
> +  */
> + if (bprm->comm_from_dentry) {
> + rcu_read_lock();
> + /* The dentry name won't change while we hold the rcu read 
> lock. */
> + __set_task_comm(me, 
> smp_load_acquire(&bprm->file->f_path.dentry->d_name.name),
> + true);

This does not sound legit whatsoever as it would indicate all renames
wait for rcu grace periods to end, which would be prettye weird.

Even commentary above dentry_cmp states:
 * Be careful about RCU walk racing with rename:
 * use 'READ_ONCE' to fetch the name pointer.
 *
 * NOTE! Even if a rename will mean that the length
 * was not loaded atomically, we don't care.

It may be this is considered tolerable, but there should be no
difficulty getting a real name there?

Regardless, the comment looks bogus.



Re: [PATCH v2] printf: Remove unused 'bprintf'

2024-11-30 Thread Dr. David Alan Gilbert
* Steven Rostedt (rost...@goodmis.org) wrote:
> On Thu, 3 Oct 2024 10:13:34 +0200
> Petr Mladek  wrote:
> 
> > On Wed 2024-10-02 18:31:47, li...@treblig.org wrote:
> > > From: "Dr. David Alan Gilbert" 
> > > 
> > > bprintf() is unused. Remove it. It was added in the commit 4370aa4aa753
> > > ("vsprintf: add binary printf") but as far as I can see was never used,
> > > unlike the other two functions in that patch.
> > > 
> > > Signed-off-by: Dr. David Alan Gilbert 
> > > Reviewed-by: Andy Shevchenko   
> > 
> > Looks good to me:
> > 
> > Acked-by: Petr Mladek 
> > 
> > I assume that Sven is going to take it via the ftrace tree as he
> 
>Steven ;-)
> 
> > suggested at 
> > https://lore.kernel.org/r/20241002104807.42b4b...@gandalf.local.home
> 
> Yeah, since I'm basically the only user of it, it's best it goes through my
> testing.

Hmm, did you pick this one up?

Dave

> Thanks,
> 
> -- Steve
> 
> 
-- 
 -Open up your eyes, open up your mind, open up your code ---   
/ Dr. David Alan Gilbert|   Running GNU/Linux   | Happy  \ 
\dave @ treblig.org |   | In Hex /
 \ _|_ http://www.treblig.org   |___/



Re: [PATCH] exec: fix up /proc/pid/comm in the execveat(AT_EMPTY_PATH) case

2024-11-30 Thread Linus Torvalds
On Sat, 30 Nov 2024 at 04:30, Christian Brauner  wrote:
>
> What does the smp_load_acquire() pair with?

I'm not sure we have them everywhere, but at least this one at dentry
creation time.

__d_alloc():
/* Make sure we always see the terminating NUL character */
smp_store_release(&dentry->d_name.name, dname); /* ^^^ */

so even at rename time, when we swap the d_name.name pointers
(*without* using a store-release at that time), both of the dentry
names had memory orderings before.

That said, looking at swap_name() at the non-"swap just the pointers"
case, there we do just "memcpy()" the name, and it would probably be
good to update the target d_name.name with a smp_store_release.

In practice, none of this ever matters. Anybody who uses the dentry
name without locking either doesn't care enough (like comm[]) or will
use the sequence number thing to serialize at a much higher level. So
the smp_load_acquire() could probably be a READ_ONCE(), and nobody
would ever see the difference.

Linus



Re: [PATCH] exec: Make sure task->comm is always NUL-terminated

2024-11-30 Thread Kees Cook
On Fri, Nov 29, 2024 at 11:15:44PM -0800, Linus Torvalds wrote:
> Edited down to just the end result:
> 
> On Fri, 29 Nov 2024 at 20:49, Kees Cook  wrote:
> >
> >  void __set_task_comm(struct task_struct *tsk, const char *buf, bool exec)
> >  {
> > size_t len = min(strlen(buf), sizeof(tsk->comm) - 1);
> >
> > trace_task_rename(tsk, buf);
> > memcpy(tsk->comm, buf, len);
> > memset(&tsk->comm[len], 0, sizeof(tsk->comm) - len);
> > perf_event_comm(tsk, exec);
> >  }
> 
> I actually don't think that's super-safe either. Yeah, it works in
> practice, and the last byte is certainly always going to be 0, but it
> might not be reliably padded.

Right, my concern over comm is strictly about unterminated reads (i.e.
exposing memory contents stored after "comm" in the task_struct). I've not
been worried about "uninitialized content" exposure because the starting
contents have always been wiped and will (now) always end with a NUL,
so the worst exposure is seeing prior or racing bytes of whatever is
being written into comm concurrently.

> Why? It walks over the source twice. First at strlen() time, then at
> memcpy. So if the source isn't stable, the end result might have odd
> results with NUL characters in the middle.

Yeah, this just means it has greater potential to be garbled.

> And strscpy() really was *supposed* to be safe even in this case, and
> I thought it was until I looked closer.
> 
> But I think strscpy() can be saved.

Yeah, fixing the final NUL byte write is needed.

> Something (UNTESTED!) like the attached I think does the right thing.
> I added a couple of "READ_ONCE()" things to make it really super-clear
> that strscpy() reads the source exactly once, and to not allow any
> compiler re-materialization of the reads (although I think that when I
> asked people, it turns out neither gcc nor clang rematerialize memory
> accesses, so that READ_ONCE is likely more a documentation ad
> theoretical thing than a real thing).

This is fine, but it doesn't solve either an unstable source nor
concurrent writers to dest. If source changes out from under strscpy,
we can still copy a "torn" write. If destination changes out from under
strscpy, we just get a potentially interleaved output (but with the
NUL-write change, we never have a dest that _lacks_ a NUL terminator).

So yeah, let's change the loop as you have it. I'm fine with the
READ_ONCE() additions, but I'm not clear on what benefit it has.

> Hmm? I don't think your version is wrong, but I also think we'd be
> better off making our 'strscpy()' infrastructure explicitly safe wrt
> unstable source strings.

Agreed. I'll get this tested against our string handling selftests...

-- 
Kees Cook