Re: [PATCH] exec: fix up /proc/pid/comm in the execveat(AT_EMPTY_PATH) case
On Fri, Nov 29, 2024 at 08:54:38PM -0800, Kees Cook wrote: > Zbigniew mentioned at Linux Plumber's that systemd is interested in > switching to execveat() for service execution, but can't, because the > contents of /proc/pid/comm are the file descriptor which was used, > instead of the path to the binary. This makes the output of tools like > top and ps useless, especially in a world where most fds are opened > CLOEXEC so the number is truly meaningless. > > When the filename passed in is empty (e.g. with AT_EMPTY_PATH), use the > dentry's filename for "comm" instead of using the useless numeral from > the synthetic fdpath construction. This way the actual exec machinery > is unchanged, but cosmetically the comm looks reasonable to admins > investigating things. > > Instead of adding TASK_COMM_LEN more bytes to bprm, use one of the unused > flag bits to indicate that we need to set "comm" from the dentry. > > Suggested-by: Zbigniew Jędrzejewski-Szmek > Suggested-by: Tycho Andersen > Suggested-by: Al Viro > Suggested-by: Linus Torvalds > CC: Aleksa Sarai > Link: https://github.com/uapi-group/kernel-features#set-comm-field-before-exec > Signed-off-by: Kees Cook > --- > Cc: Al Viro > Cc: Linus Torvalds > Cc: Eric Biederman > Cc: Alexander Viro > Cc: Christian Brauner > Cc: Jan Kara > Cc: linux...@kvack.org > Cc: linux-fsde...@vger.kernel.org > > Here's what I've put together from the various suggestions. I didn't > want to needlessly grow bprm, so I just added a flag instead. Otherwise, > this is very similar to what Linus and Al suggested. > --- > fs/exec.c | 22 +++--- > include/linux/binfmts.h | 4 +++- > 2 files changed, 22 insertions(+), 4 deletions(-) > > diff --git a/fs/exec.c b/fs/exec.c > index 5f16500ac325..d897d60ca5c2 100644 > --- a/fs/exec.c > +++ b/fs/exec.c > @@ -1347,7 +1347,21 @@ int begin_new_exec(struct linux_binprm * bprm) > set_dumpable(current->mm, SUID_DUMP_USER); > > perf_event_exec(); > - __set_task_comm(me, kbasename(bprm->filename), true); > + > + /* > + * If the original filename was empty, alloc_bprm() made up a path > + * that will probably not be useful to admins running ps or similar. > + * Let's fix it up to be something reasonable. > + */ > + if (bprm->comm_from_dentry) { > + rcu_read_lock(); > + /* The dentry name won't change while we hold the rcu read > lock. */ > + __set_task_comm(me, > smp_load_acquire(&bprm->file->f_path.dentry->d_name.name), What does the smp_load_acquire() pair with?
Re: [PATCH v2] printf: Remove unused 'bprintf'
On Sun, 1 Dec 2024 02:47:49 + "Dr. David Alan Gilbert" wrote: > > > suggested at > > > https://lore.kernel.org/r/20241002104807.42b4b...@gandalf.local.home > > > > Yeah, since I'm basically the only user of it, it's best it goes through my > > testing. > > Hmm, did you pick this one up? Sorry, it wasn't in my patchwork and was missed. I can add it and Linus may even take it after -rc1. Or I'll have it in iinux-next for the next merge window. -- Steve
Re: [PATCH] exec: fix up /proc/pid/comm in the execveat(AT_EMPTY_PATH) case
On Sat, 30 Nov 2024 at 12:28, Mateusz Guzik wrote: > > > + /* The dentry name won't change while we hold the rcu read > > lock. */ > > + __set_task_comm(me, > > smp_load_acquire(&bprm->file->f_path.dentry->d_name.name), > > + true); > > This does not sound legit whatsoever as it would indicate all renames > wait for rcu grace periods to end, which would be prettye weird. Yes, the "won't change" should be "won't go away from under us". Linus
Re: [PATCH] exec: Make sure task->comm is always NUL-terminated
On Sat, 30 Nov 2024 at 13:05, Kees Cook wrote: > > Yeah, this just means it has greater potential to be garbled. Garbled is fine. Id' just rather it be "consistently padded". > This is fine, but it doesn't solve either an unstable source nor > concurrent writers to dest. Yeah, I guess concurrent writers will also cause possibly inconsistent padding. Maybe we just don't care. As long as it's NUL-terminated, it's a string. If somebody is messing with the kernel, they get to the garbled string parts. Linus
RE: [PATCH] exec: Make sure task->comm is always NUL-terminated
From: Kees Cook > Sent: 30 November 2024 04:49 > > Instead of adding a new use of the ambiguous strncpy(), we'd want to > use memtostr_pad() which enforces being able to check at compile time > that sizes are sensible, but this requires being able to see string > buffer lengths. Instead of trying to inline __set_task_comm() (which > needs to call trace and perf functions), just open-code it. But to > make sure we're always safe, add compile-time checking like we already > do for get_task_comm(). ... > Here's what I'd prefer to use to clean up set_task_comm(). I merged > Linus and Eric's suggestions and open-coded memtostr_pad(). > --- > fs/exec.c | 12 ++-- > include/linux/sched.h | 9 - > io_uring/io-wq.c | 2 +- > io_uring/sqpoll.c | 2 +- > kernel/kthread.c | 3 ++- > 5 files changed, 14 insertions(+), 14 deletions(-) > > diff --git a/fs/exec.c b/fs/exec.c > index e0435b31a811..5f16500ac325 100644 > --- a/fs/exec.c > +++ b/fs/exec.c > @@ -1200,16 +1200,16 @@ char *__get_task_comm(char *buf, size_t buf_size, > struct task_struct *tsk) > EXPORT_SYMBOL_GPL(__get_task_comm); > > /* > - * These functions flushes out all traces of the currently running executable > - * so that a new one can be started > + * This is unlocked -- the string will always be NUL-terminated, but > + * may show overlapping contents if racing concurrent reads. > */ > - > void __set_task_comm(struct task_struct *tsk, const char *buf, bool exec) > { > - task_lock(tsk); > + size_t len = min(strlen(buf), sizeof(tsk->comm) - 1); > + > trace_task_rename(tsk, buf); > - strscpy_pad(tsk->comm, buf, sizeof(tsk->comm)); > - task_unlock(tsk); > + memcpy(tsk->comm, buf, len); > + memset(&tsk->comm[len], 0, sizeof(tsk->comm) - len); > perf_event_comm(tsk, exec); Why not do strscpy_pad() into a local char[16] and then do a 16 byte memcpy() into the target buffer? Then non-constant input data will always give a valid '\0' terminated string regardless of how strscpy_pad() is implemented. David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
Re: [PATCH] exec: fix up /proc/pid/comm in the execveat(AT_EMPTY_PATH) case
On Fri, Nov 29, 2024 at 08:54:38PM -0800, Kees Cook wrote: > Zbigniew mentioned at Linux Plumber's that systemd is interested in > switching to execveat() for service execution, but can't, because the > contents of /proc/pid/comm are the file descriptor which was used, > instead of the path to the binary. This makes the output of tools like > top and ps useless, especially in a world where most fds are opened > CLOEXEC so the number is truly meaningless. > > When the filename passed in is empty (e.g. with AT_EMPTY_PATH), use the > dentry's filename for "comm" instead of using the useless numeral from > the synthetic fdpath construction. This way the actual exec machinery > is unchanged, but cosmetically the comm looks reasonable to admins > investigating things. > > Instead of adding TASK_COMM_LEN more bytes to bprm, use one of the unused > flag bits to indicate that we need to set "comm" from the dentry. > > Suggested-by: Zbigniew Jędrzejewski-Szmek > Suggested-by: Tycho Andersen > Suggested-by: Al Viro > Suggested-by: Linus Torvalds > CC: Aleksa Sarai > Link: https://github.com/uapi-group/kernel-features#set-comm-field-before-exec > Signed-off-by: Kees Cook > --- > Cc: Al Viro > Cc: Linus Torvalds > Cc: Eric Biederman > Cc: Alexander Viro > Cc: Christian Brauner > Cc: Jan Kara > Cc: linux...@kvack.org > Cc: linux-fsde...@vger.kernel.org > > Here's what I've put together from the various suggestions. I didn't > want to needlessly grow bprm, so I just added a flag instead. Otherwise, > this is very similar to what Linus and Al suggested. > --- > fs/exec.c | 22 +++--- > include/linux/binfmts.h | 4 +++- > 2 files changed, 22 insertions(+), 4 deletions(-) > > diff --git a/fs/exec.c b/fs/exec.c > index 5f16500ac325..d897d60ca5c2 100644 > --- a/fs/exec.c > +++ b/fs/exec.c > @@ -1347,7 +1347,21 @@ int begin_new_exec(struct linux_binprm * bprm) > set_dumpable(current->mm, SUID_DUMP_USER); > > perf_event_exec(); > - __set_task_comm(me, kbasename(bprm->filename), true); > + > + /* > + * If the original filename was empty, alloc_bprm() made up a path > + * that will probably not be useful to admins running ps or similar. > + * Let's fix it up to be something reasonable. > + */ > + if (bprm->comm_from_dentry) { > + rcu_read_lock(); > + /* The dentry name won't change while we hold the rcu read > lock. */ > + __set_task_comm(me, > smp_load_acquire(&bprm->file->f_path.dentry->d_name.name), > + true); This does not sound legit whatsoever as it would indicate all renames wait for rcu grace periods to end, which would be prettye weird. Even commentary above dentry_cmp states: * Be careful about RCU walk racing with rename: * use 'READ_ONCE' to fetch the name pointer. * * NOTE! Even if a rename will mean that the length * was not loaded atomically, we don't care. It may be this is considered tolerable, but there should be no difficulty getting a real name there? Regardless, the comment looks bogus.
Re: [PATCH v2] printf: Remove unused 'bprintf'
* Steven Rostedt (rost...@goodmis.org) wrote: > On Thu, 3 Oct 2024 10:13:34 +0200 > Petr Mladek wrote: > > > On Wed 2024-10-02 18:31:47, li...@treblig.org wrote: > > > From: "Dr. David Alan Gilbert" > > > > > > bprintf() is unused. Remove it. It was added in the commit 4370aa4aa753 > > > ("vsprintf: add binary printf") but as far as I can see was never used, > > > unlike the other two functions in that patch. > > > > > > Signed-off-by: Dr. David Alan Gilbert > > > Reviewed-by: Andy Shevchenko > > > > Looks good to me: > > > > Acked-by: Petr Mladek > > > > I assume that Sven is going to take it via the ftrace tree as he > >Steven ;-) > > > suggested at > > https://lore.kernel.org/r/20241002104807.42b4b...@gandalf.local.home > > Yeah, since I'm basically the only user of it, it's best it goes through my > testing. Hmm, did you pick this one up? Dave > Thanks, > > -- Steve > > -- -Open up your eyes, open up your mind, open up your code --- / Dr. David Alan Gilbert| Running GNU/Linux | Happy \ \dave @ treblig.org | | In Hex / \ _|_ http://www.treblig.org |___/
Re: [PATCH] exec: fix up /proc/pid/comm in the execveat(AT_EMPTY_PATH) case
On Sat, 30 Nov 2024 at 04:30, Christian Brauner wrote: > > What does the smp_load_acquire() pair with? I'm not sure we have them everywhere, but at least this one at dentry creation time. __d_alloc(): /* Make sure we always see the terminating NUL character */ smp_store_release(&dentry->d_name.name, dname); /* ^^^ */ so even at rename time, when we swap the d_name.name pointers (*without* using a store-release at that time), both of the dentry names had memory orderings before. That said, looking at swap_name() at the non-"swap just the pointers" case, there we do just "memcpy()" the name, and it would probably be good to update the target d_name.name with a smp_store_release. In practice, none of this ever matters. Anybody who uses the dentry name without locking either doesn't care enough (like comm[]) or will use the sequence number thing to serialize at a much higher level. So the smp_load_acquire() could probably be a READ_ONCE(), and nobody would ever see the difference. Linus
Re: [PATCH] exec: Make sure task->comm is always NUL-terminated
On Fri, Nov 29, 2024 at 11:15:44PM -0800, Linus Torvalds wrote: > Edited down to just the end result: > > On Fri, 29 Nov 2024 at 20:49, Kees Cook wrote: > > > > void __set_task_comm(struct task_struct *tsk, const char *buf, bool exec) > > { > > size_t len = min(strlen(buf), sizeof(tsk->comm) - 1); > > > > trace_task_rename(tsk, buf); > > memcpy(tsk->comm, buf, len); > > memset(&tsk->comm[len], 0, sizeof(tsk->comm) - len); > > perf_event_comm(tsk, exec); > > } > > I actually don't think that's super-safe either. Yeah, it works in > practice, and the last byte is certainly always going to be 0, but it > might not be reliably padded. Right, my concern over comm is strictly about unterminated reads (i.e. exposing memory contents stored after "comm" in the task_struct). I've not been worried about "uninitialized content" exposure because the starting contents have always been wiped and will (now) always end with a NUL, so the worst exposure is seeing prior or racing bytes of whatever is being written into comm concurrently. > Why? It walks over the source twice. First at strlen() time, then at > memcpy. So if the source isn't stable, the end result might have odd > results with NUL characters in the middle. Yeah, this just means it has greater potential to be garbled. > And strscpy() really was *supposed* to be safe even in this case, and > I thought it was until I looked closer. > > But I think strscpy() can be saved. Yeah, fixing the final NUL byte write is needed. > Something (UNTESTED!) like the attached I think does the right thing. > I added a couple of "READ_ONCE()" things to make it really super-clear > that strscpy() reads the source exactly once, and to not allow any > compiler re-materialization of the reads (although I think that when I > asked people, it turns out neither gcc nor clang rematerialize memory > accesses, so that READ_ONCE is likely more a documentation ad > theoretical thing than a real thing). This is fine, but it doesn't solve either an unstable source nor concurrent writers to dest. If source changes out from under strscpy, we can still copy a "torn" write. If destination changes out from under strscpy, we just get a potentially interleaved output (but with the NUL-write change, we never have a dest that _lacks_ a NUL terminator). So yeah, let's change the loop as you have it. I'm fine with the READ_ONCE() additions, but I'm not clear on what benefit it has. > Hmm? I don't think your version is wrong, but I also think we'd be > better off making our 'strscpy()' infrastructure explicitly safe wrt > unstable source strings. Agreed. I'll get this tested against our string handling selftests... -- Kees Cook