Re: [PATCH v2] prctl: add PR_[GS]ET_KILLABLE
On Wed, 2018-08-01 at 16:19 +0200, Oleg Nesterov wrote:
> On 07/31, Jürg Billeter wrote:
> >
> > > Could you explain your use-case? Why a shell wants to use
> > > CLONE_NEWPID?
> >
> > To guarantee that there won't be any runaway processes, i.e., ensure
> > that no descendants (background helper daemons or misbehaving
> > processes) survive when the child process is terminated.
>
> We already have PR_SET_CHILD_SUBREAPER.
>
> Perhaps we can finally add PR_KILL_MY_DESCENDANTS_ON_EXIT? This was already
> discussed some time ago, but I can't find the previous discussion... Simple
> to implement.

This would definitely be an option. You mentioned it last October in
the PR_SET_PDEATHSIG_PROC discussion¹. However, as PID namespaces
already exist and appear to be a good fit for the most part, I think it
makes sense to just add the missing pieces to PID namespaces instead of
duplicating part of the PID namespace functionality.

Also, based on Eric's comment in that other discussion about
no_new_privs not being allowed to increase the attack surface,
PR_KILL_MY_DESCENDANTS_ON_EXIT might require CAP_SYS_ADMIN as well (due
to setuid children). In which case the only potential benefit would be
that it still allows the child to kill arbitrary processes, as far as I
can tell.

> > And to prevent children from killing their ancestors.
>
> OK, this is the only reason for CLONE_NEWPID which I can understand so far.
> Not that I understand why this is that useful ;)

The overall goal is increasing isolation between (some) child processes
and the rest of the system. Isolation from runaway processes and
isolation from signals are independent aspects and it could be useful
to control them independently. However, I also expect it to be common
that both are wanted at the same time.

Jürg

¹ https://lkml.org/lkml/2017/10/5/546
Re: [PATCH v2] prctl: add PR_[GS]ET_KILLABLE
On Fri, 2018-08-03 at 08:34 -0500, Eric W. Biederman wrote:
> From the other direction I think we can just go ahead and fix handling
> of the job control stop signals as well. As far as I understand it
> there is a legitimate complaint that SIGTSTP SIGTTIN SIGTTOU do not work
> on a pid namespace leader.
>
> The current implementation actually overshoots. We only need to ignore
> signals from the descendants in the pid namespace. Ideally signals from
> other processes are treated like normal. We have only been able to
> apply that ideal to SIGSTOP and SIGKILL as we can handle them in
> prepare_signal. Other signals can be blocked which means the logic to
> handle them needs to live in get_signal where we may have no sender
> information.

SIGINT and SIGQUIT are also relevant for job control. Would the same
approach be possible for them? And I would like to allow regular POSIX
signal behavior also for signals used outside job control, e.g.,
SIGTERM, for maximum compatibility with existing applications.
Furthermore, it would also be good to allow a PID namespace leader to
send a signal to itself.

Do you think we can and should cover all of the above without a prctl
by loosening the restrictions imposed by SIGNAL_UNKILLABLE (with
reasonable effort)?

In my opinion, my patch still makes sense as it simply allows regular
POSIX signal behavior for PID namespace leaders and it doesn't risk any
compatibility issues as the behavior doesn't change at all for
processes that don't invoke the new prctl. I.e., simple patch, low
risk, and covers all signals.

In the meantime I've tested the missing patch for copy_process() and
will send out v3 of the patch in case the new prctl makes sense after
all.

Jürg
[PATCH v3 2/2] prctl: add PR_[GS]ET_KILLABLE
PR_SET_KILLABLE clears the SIGNAL_UNKILLABLE flag. This allows
CLONE_NEWPID tasks to restore normal signal behavior, opting out of the
special signal protection for init processes. This prctl does not allow
setting the SIGNAL_UNKILLABLE flag, only clearing.

The SIGNAL_UNKILLABLE flag, which is implicitly set for tasks cloned
with CLONE_NEWPID, has the effect of ignoring all signals (from
userspace) if the corresponding handler is set to SIG_DFL. The only
exceptions are SIGKILL and SIGSTOP and they are only accepted if raised
from an ancestor namespace.

SIGINT, SIGQUIT and SIGTSTP are used in job control for ^C, ^\, ^Z.
While a task with the SIGNAL_UNKILLABLE flag could install handlers for
these signals, this is not sufficient to implement a shell that uses
CLONE_NEWPID for child processes:

 * As SIGSTOP is ignored when raised from the SIGNAL_UNKILLABLE process
   itself, it's not possible to implement the stop action in a custom
   SIGTSTP handler.
 * Many applications do not install handlers for these signals and
   thus, job control won't work properly with unmodified applications.

There are other scenarios besides job control in a shell where
applications rely on the default actions as described in signal(7) and
PID isolation may be useful. This new prctl makes the signal protection
for "init" processes optional, without breaking backward compatibility.

Signed-off-by: Jürg Billeter
---
 include/uapi/linux/prctl.h |  4 ++++
 kernel/sys.c               | 13 +++++++++++++
 2 files changed, 17 insertions(+)

diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index c0d7ea0bf5b6..92afb63da727 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -219,4 +219,8 @@ struct prctl_mm_map {
 # define PR_SPEC_DISABLE		(1UL << 2)
 # define PR_SPEC_FORCE_DISABLE		(1UL << 3)
 
+/* Control SIGNAL_UNKILLABLE */
+#define PR_GET_KILLABLE			54
+#define PR_SET_KILLABLE			55
+
 #endif /* _LINUX_PRCTL_H */
diff --git a/kernel/sys.c b/kernel/sys.c
index 38509dc1f77b..92c9322cfb98 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -2484,6 +2484,19 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
 			return -EINVAL;
 		error = arch_prctl_spec_ctrl_set(me, arg2, arg3);
 		break;
+	case PR_GET_KILLABLE:
+		if (arg3 || arg4 || arg5)
+			return -EINVAL;
+		error = put_user(!(me->signal->flags & SIGNAL_UNKILLABLE),
+				 (int __user *)arg2);
+		break;
+	case PR_SET_KILLABLE:
+		if (arg2 != 1 || arg3 || arg4 || arg5)
+			return -EINVAL;
+		spin_lock_irq(&me->sighand->siglock);
+		me->signal->flags &= ~SIGNAL_UNKILLABLE;
+		spin_unlock_irq(&me->sighand->siglock);
+		break;
 	default:
 		error = -EINVAL;
 		break;
-- 
2.18.0
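The get/set semantics of the two new cases can be sketched outside the kernel. The following is only a userspace model, not kernel code: struct model_signal, MODEL_SIGNAL_UNKILLABLE and both function names are made up for illustration, and the locking from the patch is left out because the model is single-threaded.

```c
#include <errno.h>

/* Illustrative bit only; not taken from kernel headers. */
#define MODEL_SIGNAL_UNKILLABLE 0x40u

/* Hypothetical stand-in for the per-process signal flags word. */
struct model_signal {
	unsigned int flags;
};

/* Mirrors PR_GET_KILLABLE: reports 1 when the unkillable bit is clear. */
int model_get_killable(const struct model_signal *sig)
{
	return !(sig->flags & MODEL_SIGNAL_UNKILLABLE);
}

/*
 * Mirrors PR_SET_KILLABLE: the only accepted value is 1, and the call can
 * only clear the bit. There is deliberately no way to set it again.
 */
int model_set_killable(struct model_signal *sig, unsigned long arg2)
{
	if (arg2 != 1)
		return -EINVAL;
	sig->flags &= ~MODEL_SIGNAL_UNKILLABLE;
	return 0;
}
```

The one-way design shows up directly: model_set_killable(&s, 0) is rejected with -EINVAL rather than re-arming the protection, and other bits in the flags word are left untouched.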
[PATCH v3 1/2] fork: do not rely on SIGNAL_UNKILLABLE for init check
copy_process() currently checks the SIGNAL_UNKILLABLE flag to determine
whether to accept CLONE_PARENT. In preparation for allowing init
processes to opt out of SIGNAL_UNKILLABLE, directly check whether the
process is an init process with is_child_reaper().

Signed-off-by: Jürg Billeter
---
 kernel/fork.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/fork.c b/kernel/fork.c
index 1b27babc4c78..c019ce461556 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1646,7 +1646,7 @@ static __latent_entropy struct task_struct *copy_process(
 	 * from creating siblings.
 	 */
 	if ((clone_flags & CLONE_PARENT) &&
-				current->signal->flags & SIGNAL_UNKILLABLE)
+	    is_child_reaper(task_tgid(current)))
 		return ERR_PTR(-EINVAL);
 
 	/*
-- 
2.18.0
[PATCH] prctl: add PR_[GS]ET_KILLABLE
PR_SET_KILLABLE clears the SIGNAL_UNKILLABLE flag. This allows
CLONE_NEWPID tasks to restore normal signal behavior, opting out of the
special signal protection for init processes. This is required for job
control in a shell that uses CLONE_NEWPID for child processes.

This prctl does not allow setting the SIGNAL_UNKILLABLE flag, only
clearing.

Signed-off-by: Jürg Billeter
---
 include/uapi/linux/prctl.h |  4 ++++
 kernel/sys.c               | 11 +++++++++++
 2 files changed, 15 insertions(+)

diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index c0d7ea0bf5b6..92afb63da727 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -219,4 +219,8 @@ struct prctl_mm_map {
 # define PR_SPEC_DISABLE		(1UL << 2)
 # define PR_SPEC_FORCE_DISABLE		(1UL << 3)
 
+/* Control SIGNAL_UNKILLABLE */
+#define PR_GET_KILLABLE			54
+#define PR_SET_KILLABLE			55
+
 #endif /* _LINUX_PRCTL_H */
diff --git a/kernel/sys.c b/kernel/sys.c
index 38509dc1f77b..264de630d548 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -2484,6 +2484,17 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
 			return -EINVAL;
 		error = arch_prctl_spec_ctrl_set(me, arg2, arg3);
 		break;
+	case PR_GET_KILLABLE:
+		if (arg3 || arg4 || arg5)
+			return -EINVAL;
+		error = put_user(!(me->signal->flags & SIGNAL_UNKILLABLE),
+				 (int __user *)arg2);
+		break;
+	case PR_SET_KILLABLE:
+		if (arg2 != 1 || arg3 || arg4 || arg5)
+			return -EINVAL;
+		me->signal->flags &= ~SIGNAL_UNKILLABLE;
+		break;
 	default:
 		error = -EINVAL;
 		break;
-- 
2.18.0
Re: [PATCH] prctl: add PR_[GS]ET_KILLABLE
On Mon, 2018-07-30 at 12:17 +0200, Oleg Nesterov wrote:
> On 07/30, Jürg Billeter wrote:
> >
> > This is required for job control in a shell that uses CLONE_NEWPID for
> > child processes.
>
> Could you explain in more details?

The SIGNAL_UNKILLABLE flag, which is implicitly set for tasks cloned
with CLONE_NEWPID, has the effect of ignoring all signals (from
userspace) if the corresponding handler is set to SIG_DFL. The only
exceptions are SIGKILL and SIGSTOP and they are only accepted if raised
from an ancestor namespace.

SIGINT, SIGQUIT and SIGTSTP are used in job control for ^C, ^\, ^Z.
While a task with the SIGNAL_UNKILLABLE flag could install handlers for
these signals, this is not sufficient to implement a shell that uses
CLONE_NEWPID for child processes:

 * As SIGSTOP is ignored when raised from the SIGNAL_UNKILLABLE process
   itself, I don't think it's possible to implement the stop action in
   a custom SIGTSTP handler.
 * Many applications do not install handlers for these signals and
   thus, job control won't work properly with unmodified applications.

Job control in a shell is just an example. There are other scenarios,
of course, where applications rely on the default actions as described
in signal(7), and PID isolation may be useful.

In my opinion, the kernel support for preventing accidental killing of
the "init" process should really be optional and this new prctl
provides this without breaking backward compatibility.

> > +	case PR_SET_KILLABLE:
> > +		if (arg2 != 1 || arg3 || arg4 || arg5)
> > +			return -EINVAL;
> > +		me->signal->flags &= ~SIGNAL_UNKILLABLE;
>
> this needs spin_lock_irq(me->sighand->siglock).

Thanks for the review, will fix this for v2.

Jürg
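The delivery rule described in this reply can be written down as a small predicate. This is a userspace model of the rule as stated in the message, not the kernel's implementation; the function name and its parameters are invented for the sketch.

```c
#include <signal.h>
#include <stdbool.h>

/*
 * Userspace model (not kernel code) of the rule described above.
 * Returns true when a signal sent from userspace would be dropped
 * because the target carries SIGNAL_UNKILLABLE.
 */
bool model_signal_dropped(int sig, bool unkillable, bool handler_is_default,
			  bool from_ancestor_ns)
{
	if (!unkillable)
		return false;	/* ordinary task: normal POSIX behavior */
	if (!handler_is_default)
		return false;	/* an installed handler still runs */
	if ((sig == SIGKILL || sig == SIGSTOP) && from_ancestor_ns)
		return false;	/* the only exceptions */
	return true;
}
```

For a namespace init with default handlers, model_signal_dropped(SIGTSTP, true, true, false) is true, which is the ^Z case that breaks job control, and a SIGSTOP raised by the process itself is dropped as well.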
[PATCH v2] prctl: add PR_[GS]ET_KILLABLE
PR_SET_KILLABLE clears the SIGNAL_UNKILLABLE flag. This allows
CLONE_NEWPID tasks to restore normal signal behavior, opting out of the
special signal protection for init processes. This prctl does not allow
setting the SIGNAL_UNKILLABLE flag, only clearing.

The SIGNAL_UNKILLABLE flag, which is implicitly set for tasks cloned
with CLONE_NEWPID, has the effect of ignoring all signals (from
userspace) if the corresponding handler is set to SIG_DFL. The only
exceptions are SIGKILL and SIGSTOP and they are only accepted if raised
from an ancestor namespace.

SIGINT, SIGQUIT and SIGTSTP are used in job control for ^C, ^\, ^Z.
While a task with the SIGNAL_UNKILLABLE flag could install handlers for
these signals, this is not sufficient to implement a shell that uses
CLONE_NEWPID for child processes:

 * As SIGSTOP is ignored when raised from the SIGNAL_UNKILLABLE process
   itself, it's not possible to implement the stop action in a custom
   SIGTSTP handler.
 * Many applications do not install handlers for these signals and
   thus, job control won't work properly with unmodified applications.

There are other scenarios besides job control in a shell where
applications rely on the default actions as described in signal(7) and
PID isolation may be useful. This new prctl makes the signal protection
for "init" processes optional, without breaking backward compatibility.

Signed-off-by: Jürg Billeter
---
v2: Hold siglock for PR_SET_KILLABLE, expand commit message.

 include/uapi/linux/prctl.h |  4 ++++
 kernel/sys.c               | 13 +++++++++++++
 2 files changed, 17 insertions(+)

diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index c0d7ea0bf5b6..92afb63da727 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -219,4 +219,8 @@ struct prctl_mm_map {
 # define PR_SPEC_DISABLE		(1UL << 2)
 # define PR_SPEC_FORCE_DISABLE		(1UL << 3)
 
+/* Control SIGNAL_UNKILLABLE */
+#define PR_GET_KILLABLE			54
+#define PR_SET_KILLABLE			55
+
 #endif /* _LINUX_PRCTL_H */
diff --git a/kernel/sys.c b/kernel/sys.c
index 38509dc1f77b..92c9322cfb98 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -2484,6 +2484,19 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
 			return -EINVAL;
 		error = arch_prctl_spec_ctrl_set(me, arg2, arg3);
 		break;
+	case PR_GET_KILLABLE:
+		if (arg3 || arg4 || arg5)
+			return -EINVAL;
+		error = put_user(!(me->signal->flags & SIGNAL_UNKILLABLE),
+				 (int __user *)arg2);
+		break;
+	case PR_SET_KILLABLE:
+		if (arg2 != 1 || arg3 || arg4 || arg5)
+			return -EINVAL;
+		spin_lock_irq(&me->sighand->siglock);
+		me->signal->flags &= ~SIGNAL_UNKILLABLE;
+		spin_unlock_irq(&me->sighand->siglock);
+		break;
 	default:
 		error = -EINVAL;
 		break;
-- 
2.18.0
Re: [PATCH v2] prctl: add PR_[GS]ET_KILLABLE
On Tue, 2018-07-31 at 16:39 +0200, Oleg Nesterov wrote:
> On 07/31, Jürg Billeter wrote:
> > SIGINT, SIGQUIT and SIGTSTP are used in job control for ^C, ^\, ^Z.
> > While a task with the SIGNAL_UNKILLABLE flag could install handlers for
> > these signals, this is not sufficient to implement a shell that uses
> > CLONE_NEWPID for child processes:
>
> Ah. My question wasn't clear, sorry.
>
> Could you explain your use-case? Why a shell wants to use
> CLONE_NEWPID?

To guarantee that there won't be any runaway processes, i.e., ensure
that no descendants (background helper daemons or misbehaving
processes) survive when the child process is terminated. And to prevent
children from killing their ancestors.

This is not something that can be always-on in all shells, but it could
be an option for users that want this control/isolation.

> And what do we actually want in, say, ^Z case? Just stop the child reaper
> or may be it would be better to stop the whole pid namespace?

Stopping the whole PID namespace would be interesting, however, I think
this should be discussed separately if and when there is a proposal to
support this. For now the process group is stopped, same as without PID
namespaces.

> > * As SIGSTOP is ignored when raised from the SIGNAL_UNKILLABLE process
> >   itself, it's not possible to implement the stop action in a custom
> >   SIGTSTP handler.
>
> Yes. So may be we actually want to change __isig() paths to use
> SEND_SIG_FORCED (this is not that simple), or perhaps we can change
> __send_signal() to not drop SIGSTOP sent to itself, or may be we can even
> introduce SIG_DFL_EVEN_IF_INIT, I dunno.

In my opinion, my patch is much simpler and also more general as it
covers all scenarios where regular signal handling is required or
desired for "init" processes, with minimal code changes (after
PR_SET_KILLABLE, binaries that expect SIG_DFL to work can be executed
without changes).

> > * Many applications do not install handlers for these signals and
> >   thus, job control won't work properly with unmodified applications.
>
> I can't understand this. An application should be changed anyway to do
> PR_SET_KILLABLE?

PR_SET_KILLABLE can be called (e.g., by the shell) between clone() and
execve(). (Some applications may have issues running as subreaper but
that's a separate matter, signal handling will work as expected).

> > +	case PR_SET_KILLABLE:
> > +		if (arg2 != 1 || arg3 || arg4 || arg5)
> > +			return -EINVAL;
> > +		spin_lock_irq(&me->sighand->siglock);
> > +		me->signal->flags &= ~SIGNAL_UNKILLABLE;
> > +		spin_unlock_irq(&me->sighand->siglock);
>
> OK, but then you need to change the CLONE_PARENT/SIGNAL_UNKILLABLE check
> in copy_process().

Good point, need a different check for the PID namespace root process
in copy_process().

Thanks,
Jürg
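The "between clone() and execve()" pattern mentioned above can be sketched with an existing prctl. This is only an illustration under stated assumptions: PR_SET_KILLABLE is not in released headers, so PR_SET_PDEATHSIG is used as a stand-in for a per-process attribute set in the child, fork() replaces clone(CLONE_NEWPID) so that no privileges are needed, and run_with_child_setup() is a name made up for the example.

```c
#include <signal.h>
#include <sys/prctl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run "/bin/sh -c cmd" in a child, adjusting a per-process attribute
 * between fork() and execve(). Returns the exit status, or -1 on error.
 */
int run_with_child_setup(const char *cmd)
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;

	if (pid == 0) {
		/*
		 * Child, before exec: a shell would call the proposed
		 * PR_SET_KILLABLE here. PR_SET_PDEATHSIG is a stand-in
		 * that exists in released kernels.
		 */
		if (prctl(PR_SET_PDEATHSIG, SIGKILL, 0, 0, 0) != 0)
			_exit(126);
		execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
		_exit(127);	/* only reached if execl() failed */
	}

	if (waitpid(pid, &status, 0) != pid)
		return -1;
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The point of the pattern is that the executed program needs no changes: the attribute is set by the launcher, and the binary started by execl() simply inherits it.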
Re: [PATCH] prctl: add PR_[GS]ET_KILLABLE
On Tue, 2018-07-31 at 18:26 +0200, Jann Horn wrote:
> On Mon, Jul 30, 2018 at 10:01 AM Jürg Billeter wrote:
> > [...]
> > diff --git a/kernel/sys.c b/kernel/sys.c
> > index 38509dc1f77b..264de630d548 100644
> > --- a/kernel/sys.c
> > +++ b/kernel/sys.c
> > [...]
> > +	case PR_SET_KILLABLE:
> > +		if (arg2 != 1 || arg3 || arg4 || arg5)
> > +			return -EINVAL;
> > +		me->signal->flags &= ~SIGNAL_UNKILLABLE;
> > +		break;
>
> I don't have an opinion on this patchset otherwise, but should this
> prctl maybe block PR_SET_KILLABLE if you're actually the real init
> process? This seems like it could potentially lead to weird things.

While I don't expect global init to use this, I can't think of a good
reason to disallow it in the kernel. Do you have specific concerns or
is the code in kernel/fork.c the only reason? I prefer avoiding special
cases unless really required.

> This code in kernel/fork.c seems to rely on the fact that global init
> is SIGNAL_UNKILLABLE, and probably also leads to weirdness if
> container init is non-SIGNAL_UNKILLABLE:

Yes, Oleg has mentioned this as well. I have to change copy_process()
to directly check for the PID namespace root process instead of
checking for SIGNAL_UNKILLABLE.

Jürg
[PATCH v2] readv.2, io_submit.2: Document RWF_APPEND added in Linux 4.16
Signed-off-by: Jürg Billeter
---
Changes since version 1:
- Explain offset handling

 man2/io_submit.2 | 13 +++++++++++++
 man2/readv.2     | 17 +++++++++++++++++
 2 files changed, 30 insertions(+)

diff --git a/man2/io_submit.2 b/man2/io_submit.2
index 397fd0b75..25961138a 100644
--- a/man2/io_submit.2
+++ b/man2/io_submit.2
@@ -111,6 +111,19 @@ field of the
 .I io_event
 structure (see
 .BR io_getevents (2)).
+.TP
+.BR RWF_APPEND " (since Linux 4.16)"
+.\" commit e1fc742e14e01d84d9693c4aca4ab23da65811fb
+Append data to the end of the file.
+See the description of the flag of the same name in
+.BR pwritev2 (2)
+as well as the description of
+.B O_APPEND
+in
+.BR open (2).
+The
+.I aio_offset
+field is ignored. The file offset is not changed.
 .RE
 .TP
 .I aio_lio_opcode
diff --git a/man2/readv.2 b/man2/readv.2
index b3b7b9658..9ef250e11 100644
--- a/man2/readv.2
+++ b/man2/readv.2
@@ -248,6 +248,23 @@ to
 .BR EAGAIN .
 Currently, this flag is meaningful only for
 .BR preadv2 ().
+.TP
+.BR RWF_APPEND " (since Linux 4.16)"
+.\" commit e1fc742e14e01d84d9693c4aca4ab23da65811fb
+Provide a per-write equivalent of the
+.B O_APPEND
+.BR open (2)
+flag.
+This flag is meaningful only for
+.BR pwritev2 (),
+and its effect applies only to the data range written by the system call.
+The
+.I offset
+argument does not affect the write operation, the data is always appended
+to the end of the file. However, if the
+.I offset
+argument is \-1, the current file offset is updated.
+This matches the behavior when the file is opened in append mode.
 .SH RETURN VALUE
 On success,
 .BR readv (),
-- 
2.16.3
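The documented behavior can be tried from userspace with a short sketch. It assumes Linux 4.16 or later and a glibc that provides the pwritev2() wrapper; the fallback definition of RWF_APPEND is only for libc headers that predate the flag, and append_bytes() and append_demo() are names invented for the example.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#ifndef RWF_APPEND
#define RWF_APPEND 0x00000010	/* for libc headers that predate the flag */
#endif

/* Append len bytes to fd; the offset argument is passed through unchanged. */
ssize_t append_bytes(int fd, const void *buf, size_t len, off_t offset)
{
	struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };

	return pwritev2(fd, &iov, 1, offset, RWF_APPEND);
}

/*
 * Usage: write "abc", then append "de" while asking for offset 0.
 * Returns the resulting file size (5 if the flag is honored), or -1 if
 * the kernel, filesystem, or libc rejects the call.
 */
long append_demo(void)
{
	struct stat st;
	long size = -1;
	FILE *tmp = tmpfile();

	if (!tmp)
		return -1;
	int fd = fileno(tmp);
	if (write(fd, "abc", 3) == 3 &&
	    append_bytes(fd, "de", 2, 0) == 2 &&
	    fstat(fd, &st) == 0)
		size = (long)st.st_size;
	fclose(tmp);
	return size;
}
```

With the flag honored, the two bytes land after "abc" even though offset 0 was requested, so the file ends up five bytes long instead of being overwritten at the start.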
Re: [PATCH v4 2/4] namei: O_BENEATH-style path resolution flags
Hi Aleksa,

On Tue, 2018-11-13 at 01:26 +1100, Aleksa Sarai wrote:
> * O_BENEATH: Disallow "escapes" from the starting point of the
>   filesystem tree during resolution (you must stay "beneath" the
>   starting point at all times). Currently this is done by disallowing
>   ".." and absolute paths (either in the given path or found during
>   symlink resolution) entirely, as well as all "magic link" jumping.

With open_tree(2) and OPEN_TREE_CLONE, will O_BENEATH still be
necessary? As I understand it, O_BENEATH could be replaced by a much
simpler flag that only disallows absolute paths (incl. absolute
symlinks). And it would have the benefit that you can actually pass the
tree/directory fd to another process and escaping would not be possible
even if that other process doesn't use O_BENEATH (after calling
mount_setattr(2) to make sure it's locked down).

This approach would also make it easy to restrict writes via a cloned
tree/directory fd by marking it read-only via mount_setattr(2) (and
locking down the read-only flag). This would again be especially useful
when passing tree/directory fds across processes, or for voluntary
self-lockdown within a process for robustness against security bugs.

This wouldn't affect any of the other flags in this patch. And for full
equivalence to O_BENEATH you'd have to use O_NOMAGICLINKS in addition
to O_NOABSOLUTE, or whatever that new flag would be called.

Or is OPEN_TREE_CLONE too expensive for this use case? Or is there
anything else I'm missing?

Jürg
[PATCH] prctl: add PR_{GET,SET}_KILL_DESCENDANTS_ON_EXIT
This introduces a new thread group flag that can be set by calling

    prctl(PR_SET_KILL_DESCENDANTS_ON_EXIT, 1, 0, 0, 0)

When a thread group exits with this flag set, it will send SIGKILL to
all descendant processes. This can be used to prevent stray child
processes.

This flag is cleared on privilege gaining execve(2) to ensure an
unprivileged process cannot get a privileged process to send SIGKILL.

Descendants that are orphaned and reparented to an ancestor of the
current process before the current process exits, will not be killed.
PR_SET_CHILD_SUBREAPER can be used to contain orphaned processes.

If a descendant gained privileges, the current process may not be
allowed to kill it, and the descendant process will survive.
PR_SET_NO_NEW_PRIVS can be used to prevent descendant processes from
gaining privileges.

Suggested-by: Oleg Nesterov
Signed-off-by: Jürg Billeter
---
 fs/exec.c                    |  6 ++++++
 include/linux/sched/signal.h |  3 +++
 include/uapi/linux/prctl.h   |  4 ++++
 kernel/exit.c                | 12 ++++++++++++
 kernel/sys.c                 | 11 +++++++++++
 security/apparmor/lsm.c      |  1 +
 security/selinux/hooks.c     |  3 +++
 7 files changed, 40 insertions(+)

diff --git a/fs/exec.c b/fs/exec.c
index 1ebf6e5a521d..f48ff493 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1339,6 +1339,12 @@ void setup_new_exec(struct linux_binprm * bprm)
 		/* Make sure parent cannot signal privileged process. */
 		current->pdeath_signal = 0;
 
+		/*
+		 * Do not send SIGKILL from privileged process as it may
+		 * have been requested by an unprivileged process.
+		 */
+		current->signal->kill_descendants_on_exit = 0;
+
 		/*
 		 * For secureexec, reset the stack limit to sane default to
 		 * avoid bad behavior from the prior rlimits. This has to
diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
index 1be35729c2c5..3bfb71701488 100644
--- a/include/linux/sched/signal.h
+++ b/include/linux/sched/signal.h
@@ -124,6 +124,9 @@ struct signal_struct {
 	unsigned int		is_child_subreaper:1;
 	unsigned int		has_child_subreaper:1;
 
+	/* Send SIGKILL to descendant processes on exit */
+	unsigned int		kill_descendants_on_exit:1;
+
 #ifdef CONFIG_POSIX_TIMERS
 
 	/* POSIX.1b Interval Timers */
diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index c0d7ea0bf5b6..2ac4da1f282b 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -198,6 +198,10 @@ struct prctl_mm_map {
 # define PR_CAP_AMBIENT_LOWER		3
 # define PR_CAP_AMBIENT_CLEAR_ALL	4
 
+/* Send SIGKILL to descendant processes on exit */
+#define PR_SET_KILL_DESCENDANTS_ON_EXIT	48
+#define PR_GET_KILL_DESCENDANTS_ON_EXIT	49
+
 /* arm64 Scalable Vector Extension controls */
 /* Flag values must be kept in sync with ptrace NT_ARM_SVE interface */
 #define PR_SVE_SET_VL			50	/* set task vector length */
diff --git a/kernel/exit.c b/kernel/exit.c
index 0e21e6d21f35..7fe0c694685a 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -694,6 +694,15 @@ static void forget_original_parent(struct task_struct *father,
 	list_splice_tail_init(&father->children, &reaper->children);
 }
 
+static int kill_descendant_visitor(struct task_struct *p, void *data)
+{
+	/* This may fail, e.g., when a descendant process gained privileges. */
+	group_send_sig_info(SIGKILL, SEND_SIG_NOINFO, p, PIDTYPE_TGID);
+
+	/* Always continue walking the process tree. */
+	return 1;
+}
+
 /*
  * Send signals to all our closest relatives so that they know
  * to properly mourn us..
@@ -704,6 +713,9 @@ static void exit_notify(struct task_struct *tsk, int group_dead)
 	struct task_struct *p, *n;
 	LIST_HEAD(dead);
 
+	if (group_dead && tsk->signal->kill_descendants_on_exit)
+		walk_process_tree(tsk, kill_descendant_visitor, NULL);
+
 	write_lock_irq(&tasklist_lock);
 	forget_original_parent(tsk, &dead);
 
diff --git a/kernel/sys.c b/kernel/sys.c
index 123bd73046ec..8d9af81da093 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -2476,6 +2476,17 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
 			return -EINVAL;
 		error = arch_prctl_spec_ctrl_set(me, arg2, arg3);
 		break;
+	case PR_SET_KILL_DESCENDANTS_ON_EXIT:
+		if (arg3 || arg4 || arg5)
+			return -EINVAL;
+		me->signal->kill_descendants_on_exit = !!arg2;
+		break;
+	case PR_GET_KILL_DESCENDANTS_ON_EXIT:
+		if (arg3 || arg4 || arg5)
+			return -EINVAL;
+		error = put_user(me->signal->kill_descendants_on_
[PATCH 0/1] Add prctl to kill descendants on exit
This patch adds a new prctl to kill all descendant processes on exit.
See commit message for details of the prctl.

This is a replacement of PR_SET_PDEATHSIG_PROC I proposed last year [1].
In the following discussion, Oleg suggested this approach.

The motivation for this is to provide a lightweight mechanism to prevent
stray processes. There is also a related Bugzilla entry [2].

PID namespaces can also be used to prevent stray processes, of course.
However, they are not quite as lightweight as they typically also
require a new mount namespace to be able to mount a new /proc. And they
require CAP_SYS_ADMIN. User namespaces can help to gain CAP_SYS_ADMIN,
however, that further increases the overhead and the other effects of
the user namespace may not be desired. PID 1 in PID namespaces also
exhibits non-standard signal behavior (SIGNAL_UNKILLABLE) [3].

[1] https://lkml.kernel.org/lkml/20170929123058.48924-...@bitron.ch/
[2] https://bugzilla.kernel.org/show_bug.cgi?id=43300
[3] https://lkml.kernel.org/lkml/20180803144021.56920-...@bitron.ch/

Jürg Billeter (1):
  prctl: add PR_{GET,SET}_KILL_DESCENDANTS_ON_EXIT

 fs/exec.c                    |  6 ++++++
 include/linux/sched/signal.h |  3 +++
 include/uapi/linux/prctl.h   |  4 ++++
 kernel/exit.c                | 12 ++++++++++++
 kernel/sys.c                 | 11 +++++++++++
 security/apparmor/lsm.c      |  1 +
 security/selinux/hooks.c     |  3 +++
 7 files changed, 40 insertions(+)

-- 
2.19.2
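The exit-time semantics from the commit message can be modeled on a toy process table. This is a userspace sketch only: struct toy_proc and both functions are hypothetical, and "privileged" stands for any descendant the exiting process is not permitted to signal.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy process table entry; index 0 is the root. */
struct toy_proc {
	int parent;		/* index of current parent, -1 for the root */
	bool privileged;	/* gained privileges, so the kill is refused */
	bool alive;
};

/* True if 'ancestor' is reachable from 'idx' by following parent links. */
static bool toy_is_descendant(const struct toy_proc *t, int idx, int ancestor)
{
	for (int p = t[idx].parent; p >= 0; p = t[p].parent)
		if (p == ancestor)
			return true;
	return false;
}

/*
 * Model of the exit-time behavior: every current descendant of 'exiting'
 * receives SIGKILL, but delivery fails for privileged ones. Returns the
 * number of processes killed.
 */
int toy_kill_descendants(struct toy_proc *t, size_t n, int exiting)
{
	int killed = 0;

	for (size_t i = 0; i < n; i++) {
		if (!t[i].alive || !toy_is_descendant(t, (int)i, exiting))
			continue;
		if (t[i].privileged)
			continue;	/* signal not permitted; it survives */
		t[i].alive = false;
		killed++;
	}
	t[exiting].alive = false;
	return killed;
}
```

A process whose parent link already points at an ancestor of the exiting process is not reached by the walk, which corresponds to the "orphaned and reparented" case, and a privileged descendant is visited but survives.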
Re: [PATCH] prctl: add PR_{GET,SET}_KILL_DESCENDANTS_ON_EXIT
Hi Oleg,

Thanks for the review.

On Thu, 2018-11-29 at 13:34 +0100, Oleg Nesterov wrote:
> To me it would be more clean to call
> walk_process_tree(kill_descendant_visitor) unconditionally in
> find_new_reaper() right before "if (has_child_subreaper)", but
> then we will need to shift read_lock(tasklist) from walk_process_tree().

Yes, that's the reason why I added the call before the tasklist lock.
Let me know if you want me to move the read lock from
walk_process_tree() to PR_SET_CHILD_SUBREAPER (the only caller)
instead.

> So I think the patch is mostly fine, the only problem I can see is that
> PR_SET_KILL_DESCENDANTS_ON_EXIT can race with PR_SET_CHILD_SUBREAPER,
> they both need to update the bits in the same word.

Good point. I'll make it a regular bool instead of a bitfield for v2,
unless you have another approach in mind to fix this.

Jürg
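The race Oleg points out comes from two unlocked read-modify-write sequences on one word. The sketch below replays one bad interleaving deterministically in a single thread; the flag names are made up and only stand for two adjacent one-bit bitfields that share storage.

```c
/* Two one-bit flags that share a single word, as adjacent bitfields do. */
#define FLAG_SUBREAPER		0x1u
#define FLAG_KILL_DESCENDANTS	0x2u

/*
 * Deterministic replay of the problematic interleaving. Each "CPU"
 * performs an unlocked read-modify-write of the shared word.
 * Returns the final value of the word.
 */
unsigned int replay_lost_update(void)
{
	unsigned int word = 0;

	unsigned int seen_by_a = word;		/* A loads the word */
	unsigned int seen_by_b = word;		/* B loads the same, stale word */

	word = seen_by_a | FLAG_SUBREAPER;	/* A stores its bit */
	word = seen_by_b | FLAG_KILL_DESCENDANTS; /* B stores, dropping A's bit */

	return word;
}
```

After the replay only FLAG_KILL_DESCENDANTS is set, so A's update is lost. A separate bool member occupies its own memory location, so a store to it does not rewrite the neighbouring bits.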
[PATCH v2 1/1] prctl: add PR_{GET,SET}_KILL_DESCENDANTS_ON_EXIT
This introduces a new thread group flag that can be set by calling

    prctl(PR_SET_KILL_DESCENDANTS_ON_EXIT, 1, 0, 0, 0)

When a thread group exits with this flag set, it will send SIGKILL to
all descendant processes. This can be used to prevent stray child
processes.

This flag is cleared on privilege gaining execve(2) to ensure an
unprivileged process cannot get a privileged process to send SIGKILL.

Descendants that are orphaned and reparented to an ancestor of the
current process before the current process exits, will not be killed.
PR_SET_CHILD_SUBREAPER can be used to contain orphaned processes.

If a descendant gained privileges, the current process may not be
allowed to kill it, and the descendant process will survive.
PR_SET_NO_NEW_PRIVS can be used to prevent descendant processes from
gaining privileges.

Suggested-by: Oleg Nesterov
Signed-off-by: Jürg Billeter
---
 fs/exec.c                    |  6 ++++++
 include/linux/sched/signal.h |  3 +++
 include/uapi/linux/prctl.h   |  4 ++++
 kernel/exit.c                | 12 ++++++++++++
 kernel/sys.c                 | 11 +++++++++++
 security/apparmor/lsm.c      |  1 +
 security/selinux/hooks.c     |  3 +++
 7 files changed, 40 insertions(+)

diff --git a/fs/exec.c b/fs/exec.c
index 044e296f2381..1c9520d83d6b 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1343,6 +1343,12 @@ void setup_new_exec(struct linux_binprm * bprm)
 		/* Make sure parent cannot signal privileged process. */
 		current->pdeath_signal = 0;
 
+		/*
+		 * Do not send SIGKILL from privileged process as it may
+		 * have been requested by an unprivileged process.
+		 */
+		current->signal->kill_descendants_on_exit = false;
+
 		/*
 		 * For secureexec, reset the stack limit to sane default to
 		 * avoid bad behavior from the prior rlimits. This has to
diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
index 0c3e396dca04..91ed7f480b60 100644
--- a/include/linux/sched/signal.h
+++ b/include/linux/sched/signal.h
@@ -124,6 +124,9 @@ struct signal_struct {
 	unsigned int		is_child_subreaper:1;
 	unsigned int		has_child_subreaper:1;
 
+	/* Send SIGKILL to descendant processes on exit */
+	bool			kill_descendants_on_exit;
+
 #ifdef CONFIG_POSIX_TIMERS
 
 	/* POSIX.1b Interval Timers */
diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index b17201edfa09..a31141236064 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -198,6 +198,10 @@ struct prctl_mm_map {
 # define PR_CAP_AMBIENT_LOWER		3
 # define PR_CAP_AMBIENT_CLEAR_ALL	4
 
+/* Send SIGKILL to descendant processes on exit */
+#define PR_SET_KILL_DESCENDANTS_ON_EXIT	48
+#define PR_GET_KILL_DESCENDANTS_ON_EXIT	49
+
 /* arm64 Scalable Vector Extension controls */
 /* Flag values must be kept in sync with ptrace NT_ARM_SVE interface */
 #define PR_SVE_SET_VL			50	/* set task vector length */
diff --git a/kernel/exit.c b/kernel/exit.c
index 0e21e6d21f35..7fe0c694685a 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -694,6 +694,15 @@ static void forget_original_parent(struct task_struct *father,
 	list_splice_tail_init(&father->children, &reaper->children);
 }
 
+static int kill_descendant_visitor(struct task_struct *p, void *data)
+{
+	/* This may fail, e.g., when a descendant process gained privileges. */
+	group_send_sig_info(SIGKILL, SEND_SIG_NOINFO, p, PIDTYPE_TGID);
+
+	/* Always continue walking the process tree. */
+	return 1;
+}
+
 /*
  * Send signals to all our closest relatives so that they know
  * to properly mourn us..
@@ -704,6 +713,9 @@ static void exit_notify(struct task_struct *tsk, int group_dead)
 	struct task_struct *p, *n;
 	LIST_HEAD(dead);
 
+	if (group_dead && tsk->signal->kill_descendants_on_exit)
+		walk_process_tree(tsk, kill_descendant_visitor, NULL);
+
 	write_lock_irq(&tasklist_lock);
 	forget_original_parent(tsk, &dead);
 
diff --git a/kernel/sys.c b/kernel/sys.c
index 123bd73046ec..8d9af81da093 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -2476,6 +2476,17 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
 			return -EINVAL;
 		error = arch_prctl_spec_ctrl_set(me, arg2, arg3);
 		break;
+	case PR_SET_KILL_DESCENDANTS_ON_EXIT:
+		if (arg3 || arg4 || arg5)
+			return -EINVAL;
+		me->signal->kill_descendants_on_exit = !!arg2;
+		break;
+	case PR_GET_KILL_DESCENDANTS_ON_EXIT:
+		if (arg3 || arg4 || arg5)
+			return -EINVAL;
+		error = put_user(me->signal->kill_descendants_on_
[PATCH v2 0/1] Add prctl to kill descendants on exit
This patch adds a new prctl to kill all descendant processes on exit.
See commit message for details of the prctl.

This is a replacement of PR_SET_PDEATHSIG_PROC I proposed last year [1].
In the following discussion, Oleg suggested this approach.

The motivation for this is to provide a lightweight mechanism to prevent
stray processes. There is also a related Bugzilla entry [2].

PID namespaces can also be used to prevent stray processes, of course.
However, they are not quite as lightweight as they typically also
require a new mount namespace to be able to mount a new /proc. And they
require CAP_SYS_ADMIN. User namespaces can help to gain CAP_SYS_ADMIN,
however, that further increases the overhead and the other effects of
the user namespace may not be desired. PID 1 in PID namespaces also
exhibits non-standard signal behavior (SIGNAL_UNKILLABLE) [3].

Changes in v2:
- Use bool instead of bitfield to avoid race with PR_SET_CHILD_SUBREAPER

[1] https://lkml.kernel.org/lkml/20170929123058.48924-...@bitron.ch/
[2] https://bugzilla.kernel.org/show_bug.cgi?id=43300
[3] https://lkml.kernel.org/lkml/20180803144021.56920-...@bitron.ch/

Jürg Billeter (1):
  prctl: add PR_{GET,SET}_KILL_DESCENDANTS_ON_EXIT

 fs/exec.c                    |  6 ++++++
 include/linux/sched/signal.h |  3 +++
 include/uapi/linux/prctl.h   |  4 ++++
 kernel/exit.c                | 12 ++++++++++++
 kernel/sys.c                 | 11 +++++++++++
 security/apparmor/lsm.c      |  1 +
 security/selinux/hooks.c     |  3 +++
 7 files changed, 40 insertions(+)

-- 
2.19.2
Re: [PATCH v4] signal: add taskfd_send_signal() syscall
On Thu, 2018-12-06 at 13:30 +0100, Florian Weimer wrote: > * Christian Brauner: > > > /* zombies */ > > Zombies can be signaled just as any other process. No special error will be > > reported since a zombie state is an unreliable state (cf. [3]). > > I still disagree with this analysis. If I know that the target process > is still alive, and it is not, this is a persistent error condition > which can be reliably reported. Given that someone might send SIGKILL > to the process behind my back, detecting this error condition could be > useful. As I understand it, kill() behaves the same way. I think it's good that this new syscall keeps the behavior as close as possible to kill(). E.g., this would allow emulating kill() (or a higher level API equivalent) on top of taskfds without subtle differences in behavior. As the new syscall supports flags, we could consider introducing a flag that changes the behavior in the zombie case. However, I think that should be a separate discussion (after merge of the syscall) and the default behavior makes sense as is. Jürg
Re: [PATCH v4] signal: add taskfd_send_signal() syscall
On Thu, 2018-12-06 at 14:12 +0100, Florian Weimer wrote:
> * Jürg Billeter:
>
> > On Thu, 2018-12-06 at 13:30 +0100, Florian Weimer wrote:
> > > * Christian Brauner:
> > >
> > > > /* zombies */
> > > > Zombies can be signaled just as any other process. No special error
> > > > will be reported since a zombie state is an unreliable state (cf. [3]).
> > >
> > > I still disagree with this analysis. If I know that the target process
> > > is still alive, and it is not, this is a persistent error condition
> > > which can be reliably reported. Given that someone might send SIGKILL
> > > to the process behind my back, detecting this error condition could be
> > > useful.
> >
> > As I understand it, kill() behaves the same way. I think it's good that
> > this new syscall keeps the behavior as close as possible to kill().
>
> No, kill does not behave in this way because the PID can be reused.
> The error condition is not stable there.

The PID can't be reused as long as it's a zombie. It can only be reused
when it has been wait()ed for. Or am I misunderstanding something?

Jürg
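The kill() behavior discussed in this exchange can be observed from userspace. The following is a minimal sketch, not part of the thread: the helper name probe_zombie_signalling() is hypothetical, and it assumes SIGCHLD has its default disposition so that the exited child stays around as a zombie. It uses signal 0 (an existence probe) so that nothing is actually delivered.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: fork a child that exits immediately, then probe it
 * with kill(pid, 0) twice, once while it is an unreaped zombie and once
 * after it has been reaped.
 *
 * Returns 0 on success and stores the outcome of each probe in the
 * out-parameters (0 if kill() succeeded, otherwise the errno value).
 */
static int probe_zombie_signalling(int *while_zombie, int *after_reap)
{
	siginfo_t info;
	pid_t pid = fork();

	if (pid == -1)
		return -1;
	if (pid == 0)
		_exit(0);

	/* WNOWAIT waits for the exit but leaves the child as a zombie. */
	if (waitid(P_PID, (id_t)pid, &info, WEXITED | WNOWAIT) == -1)
		return -1;

	*while_zombie = (kill(pid, 0) == 0) ? 0 : errno;

	/* Reap the zombie; only from here on can the PID be recycled. */
	if (waitpid(pid, NULL, 0) != pid)
		return -1;

	*after_reap = (kill(pid, 0) == 0) ? 0 : errno;
	return 0;
}
```

On a typical Linux system the first probe succeeds and the second fails with ESRCH, provided the PID has not been handed out again in between, which is very unlikely within such a short window.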
Re: [PATCH v2 0/1] Add prctl to kill descendants on exit
On Fri, 2018-11-30 at 08:00 +, Jürg Billeter wrote: > This patch adds a new prctl to kill all descendant processes on exit. > See commit message for details of the prctl. > > This is a replacement of PR_SET_PDEATHSIG_PROC I proposed last year [1]. > In the following discussion, Oleg suggested this approach. > > The motivation for this is to provide a lightweight mechanism to prevent > stray processes. There is also a related Bugzilla entry [2]. Andrew, Eric, does this look good to you as well? Jürg
Re: [PATCH v2 1/1] prctl: add PR_{GET,SET}_KILL_DESCENDANTS_ON_EXIT
On Fri, 2018-11-30 at 14:40 +0100, Florian Weimer wrote: > * Jürg Billeter: > > > This introduces a new thread group flag that can be set by calling > > > > prctl(PR_SET_KILL_DESCENDANTS_ON_EXIT, 1, 0, 0, 0) > > > > When a thread group exits with this flag set, it will send SIGKILL to > > all descendant processes. This can be used to prevent stray child > > processes. > > > > This flag is cleared on privilege gaining execve(2) to ensure an > > unprivileged process cannot get a privileged process to send SIGKILL. > > So this is inherited across regular execve? I'm not sure that's a good > idea. Yes, this matches PR_SET_CHILD_SUBREAPER (and other process attributes). Besides consistency and allowing a parent to configure the flag for a spawned process, this is also needed to prevent a process from clearing the flag (in combination with a seccomp filter). > > > Descendants that are orphaned and reparented to an ancestor of the > > current process before the current process exits, will not be killed. > > PR_SET_CHILD_SUBREAPER can be used to contain orphaned processes. > > For double- or triple-forking daemons, the reparenting will be racy, if > I understand things correctly. Can you please elaborate, if you're concerned about a particular race? As the commit message mentions, for containment this flag can be combined with PR_SET_CHILD_SUBREAPER (and PR_SET_NO_NEW_PRIVS). Jürg
Re: [PATCH v2 1/1] prctl: add PR_{GET,SET}_KILL_DESCENDANTS_ON_EXIT
On Sat, 2018-12-01 at 13:28 +0100, Florian Weimer wrote:
> * Jürg Billeter:
>
> > On Fri, 2018-11-30 at 14:40 +0100, Florian Weimer wrote:
> > > * Jürg Billeter:
> > >
> > > > This introduces a new thread group flag that can be set by calling
> > > >
> > > > prctl(PR_SET_KILL_DESCENDANTS_ON_EXIT, 1, 0, 0, 0)
> > > >
> > > > When a thread group exits with this flag set, it will send SIGKILL to
> > > > all descendant processes. This can be used to prevent stray child
> > > > processes.
> > > >
> > > > This flag is cleared on privilege gaining execve(2) to ensure an
> > > > unprivileged process cannot get a privileged process to send SIGKILL.
> > >
> > > So this is inherited across regular execve? I'm not sure that's a good
> > > idea.
> >
> > Yes, this matches PR_SET_CHILD_SUBREAPER (and other process
> > attributes). Besides consistency and allowing a parent to configure the
> > flag for a spawned process, this is also needed to prevent a process
> > from clearing the flag (in combination with a seccomp filter).
>
> I think the semantics of PR_SET_CHILD_SUBREAPER are different, and the
> behavior makes more sense there.

In my opinion, introducing inconsistency by deviating from the common
behavior of retaining process attributes across execve would be more
confusing/surprising to users. I don't see why it makes sense for
PR_SET_CHILD_SUBREAPER but not for PR_SET_KILL_DESCENDANTS_ON_EXIT.

Also, the main motivation is to provide a subset of PID namespace
features to unprivileged processes with a lightweight mechanism.
Retaining kill_descendants_on_exit across execve allows very similar
usage to PID namespaces: E.g., the parent can set
PR_SET_KILL_DESCENDANTS_ON_EXIT and PR_SET_CHILD_SUBREAPER in the child
before execve and the spawned init-like executable doesn't need to know
about this flag itself, i.e., the same init-like program can function as
a leader of a PID namespace or as a subreaper with this extra flag set
without code changes.

If the flag was cleared by execve, the program would need to know about
this flag and it would be impossible for the parent to lock this down
using seccomp.

>
> > > > Descendants that are orphaned and reparented to an ancestor of the
> > > > current process before the current process exits, will not be killed.
> > > > PR_SET_CHILD_SUBREAPER can be used to contain orphaned processes.
> > >
> > > For double- or triple-forking daemons, the reparenting will be racy, if
> > > I understand things correctly.
> >
> > Can you please elaborate, if you're concerned about a particular race?
> > As the commit message mentions, for containment this flag can be
> > combined with PR_SET_CHILD_SUBREAPER (and PR_SET_NO_NEW_PRIVS).
>
> Without PR_SET_CHILD_SUBREAPER, if a newly execve'ed daemon performs
> double/triple forking to disentangle itself from the parent process
> session, and the parent process which set
> PR_SET_KILL_DESCENDANTS_ON_EXIT terminates, behavior depends on when
> exactly the parent process terminates. The daemon process will leak if
> it has completed its reparenting.
>
> I think this could be sufficiently common that solution is needed here.

I expect the common case to be that PR_SET_KILL_DESCENDANTS_ON_EXIT will
be used together with PR_SET_CHILD_SUBREAPER (and possibly
PR_SET_NO_NEW_PRIVS) to prevent stray children. And I don't see a race
condition in that case.

PR_SET_KILL_DESCENDANTS_ON_EXIT can be used for non-subreapers but I
expect this to be used in more specialized scenarios where the program
is designed/known to avoid such race conditions. We could theoretically
restrict PR_SET_KILL_DESCENDANTS_ON_EXIT to subreapers but I currently
don't see a strong enough reason for this.

Jürg
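The "configure in the child before execve" usage described above can be sketched with the two attributes that already exist in mainline. This is only an illustration under assumptions: the helper names configure_containment() and containment_is_active() are made up here, and the proposed PR_SET_KILL_DESCENDANTS_ON_EXIT call is deliberately left out because it is only defined by the patch under discussion.

```c
#include <sys/prctl.h>

/*
 * Hypothetical helper, meant to run in the child between fork() and
 * execve(). Returns 0 on success, -1 (errno set by prctl) on failure.
 */
static int configure_containment(void)
{
	/* Orphaned descendants are reparented to this process. */
	if (prctl(PR_SET_CHILD_SUBREAPER, 1UL, 0UL, 0UL, 0UL) == -1)
		return -1;

	/* Descendants cannot gain privileges through execve(). */
	if (prctl(PR_SET_NO_NEW_PRIVS, 1UL, 0UL, 0UL, 0UL) == -1)
		return -1;

	/*
	 * A kernel carrying the proposed patch would presumably also take
	 * prctl(PR_SET_KILL_DESCENDANTS_ON_EXIT, 1, 0, 0, 0) here; it is
	 * omitted because the option is not assumed to exist.
	 */
	return 0;
}

/*
 * Hypothetical helper: read both attributes back, e.g. after execve().
 * Returns 1 if both are set, 0 if not, -1 on error.
 */
static int containment_is_active(void)
{
	int subreaper = 0;
	int no_new_privs;

	if (prctl(PR_GET_CHILD_SUBREAPER, (unsigned long)&subreaper,
		  0UL, 0UL, 0UL) == -1)
		return -1;

	no_new_privs = prctl(PR_GET_NO_NEW_PRIVS, 0UL, 0UL, 0UL, 0UL);
	if (no_new_privs == -1)
		return -1;

	return subreaper == 1 && no_new_privs == 1;
}
```

Because both attributes survive execve(), the spawned init-like program needs no knowledge of them, which is the property the mail argues the new flag should share.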
Re: [PATCH] dmaengine: rcar-dmac: Handle hardware descriptor allocation failure
Hi Vinod, On Mon, 2014-12-08 at 17:06 +0530, Vinod Koul wrote: > Can you please rebase this on my next, this fails to apply for me This is a patch for Laurent's new rcar-dmac driver, which doesn't appear to be in your next yet. Laurent has already merged my patch into his dma/next branch and included it in his pull request from November 27. [GIT PULL FOR v3.19] R-Car DMA engine driver http://www.spinics.net/lists/linux-sh/msg37764.html Please let me know if there is still a rebase required from my side. Regards, Jürg -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] null_blk: fix blk_mq_init_queue() error handling
blk_mq_init_queue() returns ERR_PTR() on failure, not NULL. Signed-off-by: Jürg Billeter --- drivers/block/null_blk.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/block/null_blk.c b/drivers/block/null_blk.c index ae9f615..727309c 100644 --- a/drivers/block/null_blk.c +++ b/drivers/block/null_blk.c @@ -530,8 +530,8 @@ static int null_add_dev(void) goto out_cleanup_queues; nullb->q = blk_mq_init_queue(&nullb->tag_set); - if (!nullb->q) { - rv = -ENOMEM; + if (IS_ERR(nullb->q)) { + rv = PTR_ERR(nullb->q); goto out_cleanup_tags; } } else if (queue_mode == NULL_Q_BIO) { -- 2.2.1
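Why the old NULL check never fires can be shown with a small userspace sketch of the kernel's error-pointer convention. The three helpers below mirror the idea behind ERR_PTR(), PTR_ERR() and IS_ERR() but are re-implemented here for illustration; fake_init_queue() and the two setup_*() functions are hypothetical stand-ins, not driver code.

```c
#include <errno.h>
#include <stddef.h>

#define MAX_ERRNO 4095

/* Userspace re-implementation of the kernel's error-pointer idea. */
static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
	/* Error codes live in the topmost MAX_ERRNO addresses. */
	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

/* Hypothetical allocator that reports failure via ERR_PTR(), never NULL. */
static void *fake_init_queue(int fail)
{
	static int queue;

	return fail ? ERR_PTR(-ENOMEM) : (void *)&queue;
}

/* Old pattern: an error pointer is non-NULL, so the failure is missed. */
static int setup_with_null_check(int fail)
{
	void *q = fake_init_queue(fail);

	if (!q)
		return -ENOMEM;
	return 0;
}

/* Fixed pattern: detect the error pointer and propagate its code. */
static int setup_with_is_err(int fail)
{
	void *q = fake_init_queue(fail);

	if (IS_ERR(q))
		return (int)PTR_ERR(q);
	return 0;
}
```

With a failing allocator, setup_with_null_check() reports success and would go on to use the error pointer as if it were a queue, while setup_with_is_err() returns -ENOMEM.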
[PATCH] virtio_blk: fix blk_mq_init_queue() error handling
blk_mq_init_queue() returns ERR_PTR() on failure, not NULL. Signed-off-by: Jürg Billeter --- drivers/block/virtio_blk.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c index 7ef7c09..b1b5c75 100644 --- a/drivers/block/virtio_blk.c +++ b/drivers/block/virtio_blk.c @@ -638,8 +638,8 @@ static int virtblk_probe(struct virtio_device *vdev) goto out_put_disk; q = vblk->disk->queue = blk_mq_init_queue(&vblk->tag_set); - if (!q) { - err = -ENOMEM; + if (IS_ERR(q)) { + err = PTR_ERR(q); goto out_free_tags; } -- 2.2.1
[PATCH 2/3] ti-st: st_kim: fix nshutdown_gpio in get_platform_data
Use of_get_named_gpio instead of of_property_read_u32. Signed-off-by: Jürg Billeter --- drivers/misc/ti-st/st_kim.c | 13 ++--- 1 file changed, 10 insertions(+), 3 deletions(-) diff --git a/drivers/misc/ti-st/st_kim.c b/drivers/misc/ti-st/st_kim.c index af71584..8df8faa 100644 --- a/drivers/misc/ti-st/st_kim.c +++ b/drivers/misc/ti-st/st_kim.c @@ -38,6 +38,7 @@ #include #include #include +#include #define MAX_ST_DEVICES 3 /* Imagine 1 on each UART for now */ static struct platform_device *st_kim_devices[MAX_ST_DEVICES]; @@ -749,7 +750,7 @@ static struct ti_st_plat_data *get_platform_data(struct device *dev) { struct device_node *np = dev->of_node; const u32 *dt_property; - int len; + int len, gpio; dt_pdata = kzalloc(sizeof(*dt_pdata), GFP_KERNEL); @@ -761,8 +762,14 @@ static struct ti_st_plat_data *get_platform_data(struct device *dev) dt_property = of_get_property(np, "dev_name", &len); if (dt_property) memcpy(&dt_pdata->dev_name, dt_property, len); - of_property_read_u32(np, "nshutdown_gpio", -&dt_pdata->nshutdown_gpio); + + gpio = of_get_named_gpio(np, "nshutdown_gpio", 0); + if (gpio < 0) { + kfree(dt_pdata); + return ERR_PTR(gpio); + } + dt_pdata->nshutdown_gpio = gpio; + of_property_read_u32(np, "flow_cntrl", &dt_pdata->flow_cntrl); of_property_read_u32(np, "baud_rate", &dt_pdata->baud_rate); -- 2.4.3
[PATCH 3/3] ti-st: st_kim: use gpio_set_value_cansleep to fix warning
GPIO accessor functions may sleep. Signed-off-by: Jürg Billeter --- drivers/misc/ti-st/st_kim.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/drivers/misc/ti-st/st_kim.c b/drivers/misc/ti-st/st_kim.c index 8df8faa..0ab81d7 100644 --- a/drivers/misc/ti-st/st_kim.c +++ b/drivers/misc/ti-st/st_kim.c @@ -483,9 +483,9 @@ long st_kim_start(void *kim_data) pdata->chip_enable(kim_gdata); /* Configure BT nShutdown to HIGH state */ - gpio_set_value(kim_gdata->nshutdown, GPIO_LOW); + gpio_set_value_cansleep(kim_gdata->nshutdown, GPIO_LOW); mdelay(5); /* FIXME: a proper toggle */ - gpio_set_value(kim_gdata->nshutdown, GPIO_HIGH); + gpio_set_value_cansleep(kim_gdata->nshutdown, GPIO_HIGH); mdelay(100); /* re-initialize the completion */ reinit_completion(&kim_gdata->ldisc_installed); @@ -567,11 +567,11 @@ long st_kim_stop(void *kim_data) } /* By default configure BT nShutdown to LOW state */ - gpio_set_value(kim_gdata->nshutdown, GPIO_LOW); + gpio_set_value_cansleep(kim_gdata->nshutdown, GPIO_LOW); mdelay(1); - gpio_set_value(kim_gdata->nshutdown, GPIO_HIGH); + gpio_set_value_cansleep(kim_gdata->nshutdown, GPIO_HIGH); mdelay(1); - gpio_set_value(kim_gdata->nshutdown, GPIO_LOW); + gpio_set_value_cansleep(kim_gdata->nshutdown, GPIO_LOW); /* platform specific disable */ if (pdata->chip_disable) -- 2.4.3
[PATCH 1/3] ti-st: st_kim: use ERR_PTR(-ENOMEM) instead of NULL
This allows return of other error codes. Signed-off-by: Jürg Billeter --- drivers/misc/ti-st/st_kim.c | 14 ++ 1 file changed, 10 insertions(+), 4 deletions(-) diff --git a/drivers/misc/ti-st/st_kim.c b/drivers/misc/ti-st/st_kim.c index 5027b8f..af71584 100644 --- a/drivers/misc/ti-st/st_kim.c +++ b/drivers/misc/ti-st/st_kim.c @@ -752,8 +752,11 @@ static struct ti_st_plat_data *get_platform_data(struct device *dev) int len; dt_pdata = kzalloc(sizeof(*dt_pdata), GFP_KERNEL); - if (!dt_pdata) - return NULL; + + if (!dt_pdata) { + pr_err("Can't allocate device_tree platform data\n"); + return ERR_PTR(-ENOMEM); + } dt_property = of_get_property(np, "dev_name", &len); if (dt_property) @@ -773,10 +776,13 @@ static int kim_probe(struct platform_device *pdev) struct ti_st_plat_data *pdata; int err; - if (pdev->dev.of_node) + if (pdev->dev.of_node) { pdata = get_platform_data(&pdev->dev); - else + if (IS_ERR(pdata)) + return PTR_ERR(pdata); + } else { pdata = pdev->dev.platform_data; + } if (pdata == NULL) { dev_err(&pdev->dev, "Platform Data is missing\n"); -- 2.4.3
Re: [PATCH 2/3] ti-st: st_kim: fix nshutdown_gpio in get_platform_data
On Wed, 2015-06-24 at 11:31 -0400, Peter Hurley wrote: > On 06/24/2015 06:24 AM, Jürg Billeter wrote: > > @@ -761,8 +762,14 @@ static struct ti_st_plat_data > > *get_platform_data(struct device *dev) > > dt_property = of_get_property(np, "dev_name", &len); > > if (dt_property) > > memcpy(&dt_pdata->dev_name, dt_property, len); > > - of_property_read_u32(np, "nshutdown_gpio", > > -&dt_pdata->nshutdown_gpio); > > + > > + gpio = of_get_named_gpio(np, "nshutdown_gpio", 0); > > NAK. This breaks existing dtbs, since the format is not the same. Isn't the existing code completely broken as there is no predictable GPIO numbering in general? There is also no documentation or use of that device tree property in the kernel tree, as far as I can tell. Do you have a suggestion how to fix this without breaking existing dtbs? Do we need to introduce a second property and support both in the driver? Regards, Jürg
[PATCH] readv.2, io_submit.2: Document RWF_APPEND added in Linux 4.16
Signed-off-by: Jürg Billeter --- man2/io_submit.2 | 10 ++ man2/readv.2 | 10 ++ 2 files changed, 20 insertions(+) diff --git a/man2/io_submit.2 b/man2/io_submit.2 index 397fd0b75..79fcdfee4 100644 --- a/man2/io_submit.2 +++ b/man2/io_submit.2 @@ -111,6 +111,16 @@ field of the .I io_event structure (see .BR io_getevents (2)). +.TP +.BR RWF_APPEND " (since Linux 4.16)" +.\" commit e1fc742e14e01d84d9693c4aca4ab23da65811fb +Append data to the end of the file. +See the description of the flag of the same name in +.BR pwritev2 (2) +as well as the description of +.B O_APPEND +in +.BR open (2). .RE .TP .I aio_lio_opcode diff --git a/man2/readv.2 b/man2/readv.2 index b3b7b9658..668fba576 100644 --- a/man2/readv.2 +++ b/man2/readv.2 @@ -248,6 +248,16 @@ to .BR EAGAIN . Currently, this flag is meaningful only for .BR preadv2 (). +.TP +.BR RWF_APPEND " (since Linux 4.16)" +.\" commit e1fc742e14e01d84d9693c4aca4ab23da65811fb +Provide a per-write equivalent of the +.B O_APPEND +.BR open (2) +flag. +This flag is meaningful only for +.BR pwritev2 (), +and its effect applies only to the data range written by the system call. .SH RETURN VALUE On success, .BR readv (), -- 2.16.2
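The per-write append behavior documented in this patch can be tried out with pwritev2(). The sketch below is an illustration under assumptions: append_with_rwf() is a hypothetical wrapper, the fallback value 0x00000010 for RWF_APPEND is taken from the Linux UAPI headers for C libraries that do not yet expose it, and the call needs Linux 4.16 or later plus a libc that provides pwritev2().

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <sys/types.h>
#include <sys/uio.h>

#ifndef RWF_APPEND
#define RWF_APPEND 0x00000010	/* fallback for older libc headers */
#endif

/*
 * Hypothetical wrapper: append one buffer to the end of the file behind
 * fd, regardless of the current file offset and without requiring the
 * descriptor to be opened with O_APPEND.
 *
 * Returns the number of bytes written, or -1 with errno set.
 */
static ssize_t append_with_rwf(int fd, const void *buf, size_t len)
{
	struct iovec iov = {
		.iov_base = (void *)buf,
		.iov_len = len,
	};

	/* With RWF_APPEND the offset argument does not select the position. */
	return pwritev2(fd, &iov, 1, 0, RWF_APPEND);
}
```

Only this one write is affected; later plain write() calls on the same descriptor keep using the regular file offset.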
[PATCH] ARM: kernel: respect device tree status of cpu nodes
Skip 'disabled' cpu nodes when building the cpu logical map. This avoids booting cpus that have been disabled in the device tree. Signed-off-by: Jürg Billeter Reviewed-by: Ben Dooks --- arch/arm/kernel/devtree.c | 4 1 file changed, 4 insertions(+) diff --git a/arch/arm/kernel/devtree.c b/arch/arm/kernel/devtree.c index 739c3df..9aed299 100644 --- a/arch/arm/kernel/devtree.c +++ b/arch/arm/kernel/devtree.c @@ -95,6 +95,10 @@ void __init arm_dt_init_cpu_maps(void) if (of_node_cmp(cpu->type, "cpu")) continue; + /* Check if CPU is enabled */ + if (!of_device_is_available(cpu)) + continue; + pr_debug(" * %s...\n", cpu->full_name); /* * A device tree containing CPU nodes with missing "reg" -- 1.9.0
[PATCH] ASoC: rsnd: fixup index of SSI mod when SRC is used
The default index 1 was used as the loop was terminated before the following code could be reached: if (mod[i] == this) index = i; Signed-off-by: Jürg Billeter --- sound/soc/sh/rcar/core.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/sound/soc/sh/rcar/core.c b/sound/soc/sh/rcar/core.c index 4e86265..d737fea 100644 --- a/sound/soc/sh/rcar/core.c +++ b/sound/soc/sh/rcar/core.c @@ -295,9 +295,11 @@ static void rsnd_dma_of_name(struct rsnd_dma *dma, mod[0] = NULL; /* for "mem" */ index = 1; for (i = 1; i < MOD_MAX; i++) { - if (!src) { - mod[i] = ssi; + if (!ssi) { break; + } else if (!src) { + mod[i] = ssi; + ssi = NULL; } else if (!dvc) { mod[i] = src; src = NULL; -- 2.0.1
Re: [alsa-devel] [PATCH] ASoC: rsnd: fixup index of SSI mod when SRC is used
Hi, On Thu, 2014-07-03 at 22:57 +0900, Kuninori Morimoto wrote: > If you add this code, > > if (!ssi) { >break; > } else ... > > then, I guess we can remove this code ? > > if (mod[i] == ssi) > break; I checked the issue on mainline master where the above two lines are not present yet. Looking at Mark's for-next tree reveals that you already fixed the issue in a slightly different way (commit c08c3b08). You may disregard my patch. Jürg
[PATCH] spi: sh-sci: fix use-after-free in sh_sci_spi_remove()
setbits() uses sp->membase. Signed-off-by: Jürg Billeter --- drivers/spi/spi-sh-sci.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/spi/spi-sh-sci.c b/drivers/spi/spi-sh-sci.c index 1f56ef6..b83dd73 100644 --- a/drivers/spi/spi-sh-sci.c +++ b/drivers/spi/spi-sh-sci.c @@ -175,9 +175,9 @@ static int sh_sci_spi_remove(struct platform_device *dev) { struct sh_sci_spi *sp = platform_get_drvdata(dev); - iounmap(sp->membase); - setbits(sp, PIN_INIT, 0); spi_bitbang_stop(&sp->bitbang); + setbits(sp, PIN_INIT, 0); + iounmap(sp->membase); spi_master_put(sp->bitbang.master); return 0; } -- 2.0.0
[PATCH] gpio: rcar: clamp returned value to [0,1]
While it will be clamped to bool by gpiolib, let's make this sane in the driver as well. Signed-off-by: Jürg Billeter --- drivers/gpio/gpio-rcar.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/gpio/gpio-rcar.c b/drivers/gpio/gpio-rcar.c index 0c9f803..7c62121 100644 --- a/drivers/gpio/gpio-rcar.c +++ b/drivers/gpio/gpio-rcar.c @@ -240,9 +240,9 @@ static int gpio_rcar_get(struct gpio_chip *chip, unsigned offset) /* testing on r8a7790 shows that INDT does not show correct pin state * when configured as output, so use OUTDT in case of output pins */ if (gpio_rcar_read(gpio_to_priv(chip), INOUTSEL) & bit) - return (int)(gpio_rcar_read(gpio_to_priv(chip), OUTDT) & bit); + return !!(gpio_rcar_read(gpio_to_priv(chip), OUTDT) & bit); else - return (int)(gpio_rcar_read(gpio_to_priv(chip), INDT) & bit); + return !!(gpio_rcar_read(gpio_to_priv(chip), INDT) & bit); } static void gpio_rcar_set(struct gpio_chip *chip, unsigned offset, int value) -- 2.0.0
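The difference between the two return expressions is easy to see in isolation. This is a standalone sketch with made-up helper names (get_raw() and get_normalized()) operating on a plain 32-bit register value rather than on the real driver state.

```c
#include <stdint.h>

/* Old style: returns the masked bit itself, e.g. 16 for offset 4. */
static int get_raw(uint32_t reg, unsigned int offset)
{
	uint32_t bit = UINT32_C(1) << offset;

	return (int)(reg & bit);
}

/* New style: "!!" collapses any non-zero value to exactly 1. */
static int get_normalized(uint32_t reg, unsigned int offset)
{
	uint32_t bit = UINT32_C(1) << offset;

	return !!(reg & bit);
}
```

For offset 31 the raw variant is the most striking: with common compilers the conversion to int yields a negative number, which a caller could mistake for an error code, whereas the normalized variant still returns 1.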
[PATCH] dmaengine: rcar-dmac: Handle hardware descriptor allocation failure
If the atomic DMA coherent pool is too small, disable use of hardware descriptor lists instead of crashing the system: ERROR: 256 KiB atomic DMA coherent pool is too small! Please increase it with coherent_pool= kernel parameter! Unable to handle kernel NULL pointer dereference at virtual address 0004 Internal error: Oops: a07 [#1] PREEMPT SMP ARM PC is at rcar_dmac_chan_reinit+0x3c/0x160 LR is at _raw_spin_lock_irqsave+0x18/0x5c [<802132c0>] (rcar_dmac_chan_reinit) from [<80214818>] (rcar_dmac_isr_error+0x84/0xa0) [<80214818>] (rcar_dmac_isr_error) from [<80060484>] (handle_irq_event_percpu+0x50/0x150) [<80060484>] (handle_irq_event_percpu) from [<800605c0>] (handle_irq_event+0x3c/0x5c) [<800605c0>] (handle_irq_event) from [<8006350c>] (handle_fasteoi_irq+0xb8/0x198) [<8006350c>] (handle_fasteoi_irq) from [<8005fdb0>] (generic_handle_irq+0x20/0x30) [<8005fdb0>] (generic_handle_irq) from [<8000fcd0>] (handle_IRQ+0x50/0xc4) [<8000fcd0>] (handle_IRQ) from [<800092cc>] (gic_handle_irq+0x28/0x5c) [<800092cc>] (gic_handle_irq) from [<80012700>] (__irq_svc+0x40/0x70) Kernel panic - not syncing: Fatal exception in interrupt Signed-off-by: Jürg Billeter --- drivers/dma/sh/rcar-dmac.c | 14 +- 1 file changed, 9 insertions(+), 5 deletions(-) diff --git a/drivers/dma/sh/rcar-dmac.c b/drivers/dma/sh/rcar-dmac.c index f8f70eb..f18b1b6 100644 --- a/drivers/dma/sh/rcar-dmac.c +++ b/drivers/dma/sh/rcar-dmac.c @@ -679,8 +679,8 @@ static void rcar_dmac_realloc_hwdesc(struct rcar_dmac_chan *chan, desc->hwdescs.size = size; } -static void rcar_dmac_fill_hwdesc(struct rcar_dmac_chan *chan, - struct rcar_dmac_desc *desc) +static int rcar_dmac_fill_hwdesc(struct rcar_dmac_chan *chan, +struct rcar_dmac_desc *desc) { struct rcar_dmac_xfer_chunk *chunk; struct rcar_dmac_hw_desc *hwdesc; @@ -689,7 +689,7 @@ static void rcar_dmac_fill_hwdesc(struct rcar_dmac_chan *chan, hwdesc = desc->hwdescs.mem; if (!hwdesc) - return; + return -ENOMEM; list_for_each_entry(chunk, &desc->chunks, node) { 
hwdesc->sar = chunk->src_addr; @@ -697,6 +697,8 @@ static void rcar_dmac_fill_hwdesc(struct rcar_dmac_chan *chan, hwdesc->tcr = chunk->size >> desc->xfer_shift; hwdesc++; } + + return 0; } /* - @@ -933,8 +935,10 @@ rcar_dmac_chan_prep_sg(struct rcar_dmac_chan *chan, struct scatterlist *sgl, * additional complexity remains to be investigated. */ desc->hwdescs.use = !highmem && nchunks > 1; - if (desc->hwdescs.use) - rcar_dmac_fill_hwdesc(chan, desc); + if (desc->hwdescs.use) { + if (rcar_dmac_fill_hwdesc(chan, desc) < 0) + desc->hwdescs.use = false; + } return &desc->async_tx; } -- 2.1.3
[RESEND PATCH v2 1/1] prctl: add PR_{GET,SET}_KILL_DESCENDANTS_ON_EXIT
This introduces a new thread group flag that can be set by calling prctl(PR_SET_KILL_DESCENDANTS_ON_EXIT, 1, 0, 0, 0) When a thread group exits with this flag set, it will send SIGKILL to all descendant processes. This can be used to prevent stray child processes. This flag is cleared on privilege gaining execve(2) to ensure an unprivileged process cannot get a privileged process to send SIGKILL. Descendants that are orphaned and reparented to an ancestor of the current process before the current process exits, will not be killed. PR_SET_CHILD_SUBREAPER can be used to contain orphaned processes. If a descendant gained privileges, the current process may not be allowed to kill it, and the descendant process will survive. PR_SET_NO_NEW_PRIVS can be used to prevent descendant processes from gaining privileges. Suggested-by: Oleg Nesterov Signed-off-by: Jürg Billeter Reviewed-by: Oleg Nesterov --- fs/exec.c| 6 ++ include/linux/sched/signal.h | 3 +++ include/uapi/linux/prctl.h | 4 kernel/exit.c| 12 kernel/sys.c | 11 +++ security/apparmor/lsm.c | 1 + security/selinux/hooks.c | 3 +++ 7 files changed, 40 insertions(+) diff --git a/fs/exec.c b/fs/exec.c index fb72d36f7823..bbb5a0718223 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -1342,6 +1342,12 @@ void setup_new_exec(struct linux_binprm * bprm) /* Make sure parent cannot signal privileged process. */ current->pdeath_signal = 0; + /* +* Do not send SIGKILL from privileged process as it may +* have been requested by an unprivileged process. +*/ + current->signal->kill_descendants_on_exit = false; + /* * For secureexec, reset the stack limit to sane default to * avoid bad behavior from the prior rlimits. 
This has to diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h index 13789d10a50e..2acf481951f6 100644 --- a/include/linux/sched/signal.h +++ b/include/linux/sched/signal.h @@ -124,6 +124,9 @@ struct signal_struct { unsigned intis_child_subreaper:1; unsigned inthas_child_subreaper:1; + /* Send SIGKILL to descendant processes on exit */ + boolkill_descendants_on_exit; + #ifdef CONFIG_POSIX_TIMERS /* POSIX.1b Interval Timers */ diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h index b4875a93363a..d5483ca63c2d 100644 --- a/include/uapi/linux/prctl.h +++ b/include/uapi/linux/prctl.h @@ -198,6 +198,10 @@ struct prctl_mm_map { # define PR_CAP_AMBIENT_LOWER 3 # define PR_CAP_AMBIENT_CLEAR_ALL 4 +/* Send SIGKILL to descendant processes on exit */ +#define PR_SET_KILL_DESCENDANTS_ON_EXIT48 +#define PR_GET_KILL_DESCENDANTS_ON_EXIT49 + /* arm64 Scalable Vector Extension controls */ /* Flag values must be kept in sync with ptrace NT_ARM_SVE interface */ #define PR_SVE_SET_VL 50 /* set task vector length */ diff --git a/kernel/exit.c b/kernel/exit.c index 2d14979577ee..93a812c1b670 100644 --- a/kernel/exit.c +++ b/kernel/exit.c @@ -694,6 +694,15 @@ static void forget_original_parent(struct task_struct *father, list_splice_tail_init(&father->children, &reaper->children); } +static int kill_descendant_visitor(struct task_struct *p, void *data) +{ + /* This may fail, e.g., when a descendant process gained privileges. */ + group_send_sig_info(SIGKILL, SEND_SIG_NOINFO, p, PIDTYPE_TGID); + + /* Always continue walking the process tree. */ + return 1; +} + /* * Send signals to all our closest relatives so that they know * to properly mourn us.. 
@@ -704,6 +713,9 @@ static void exit_notify(struct task_struct *tsk, int group_dead) struct task_struct *p, *n; LIST_HEAD(dead); + if (group_dead && tsk->signal->kill_descendants_on_exit) + walk_process_tree(tsk, kill_descendant_visitor, NULL); + write_lock_irq(&tasklist_lock); forget_original_parent(tsk, &dead); diff --git a/kernel/sys.c b/kernel/sys.c index f7eb62eceb24..f6dba0ba9b77 100644 --- a/kernel/sys.c +++ b/kernel/sys.c @@ -2485,6 +2485,17 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3, return -EINVAL; error = PAC_RESET_KEYS(me, arg2); break; + case PR_SET_KILL_DESCENDANTS_ON_EXIT: + if (arg3 || arg4 || arg5) + return -EINVAL; + me->signal->kill_descendants_on_exit = !!arg2; + break; + case PR_GET_KILL_DESCENDANTS_ON_EXIT: + if (arg3 || arg4 || arg5) + return -EINVAL; + error = put_user(me->signal->kill_descenda
[RESEND PATCH v2 0/1] Add prctl to kill descendants on exit
This patch adds a new prctl to kill all descendant processes on exit. See commit message for details of the prctl. This is a replacement of PR_SET_PDEATHSIG_PROC I proposed last year [1]. In the following discussion, Oleg suggested this approach. The motivation for this is to provide a lightweight mechanism to prevent stray processes. There is also a related Bugzilla entry [2]. PID namespaces can also be used to prevent stray processes, of course. However, they are not quite as lightweight as they typically also require a new mount namespace to be able to mount a new /proc. And they require CAP_SYS_ADMIN. User namespaces can help to gain CAP_SYS_ADMIN, however, that further increases the overhead and the other effects of the user namespace may not be desired. PID 1 in PID namespaces also exhibits non-standard signal behavior (SIGNAL_UNKILLABLE) [3]. Changes in v2: - Use bool instead of bitfield to avoid race with PR_SET_CHILD_SUBREAPER [1] https://lkml.kernel.org/lkml/20170929123058.48924-...@bitron.ch/ [2] https://bugzilla.kernel.org/show_bug.cgi?id=43300 [3] https://lkml.kernel.org/lkml/20180803144021.56920-...@bitron.ch/ Jürg Billeter (1): prctl: add PR_{GET,SET}_KILL_DESCENDANTS_ON_EXIT fs/exec.c| 6 ++ include/linux/sched/signal.h | 3 +++ include/uapi/linux/prctl.h | 4 kernel/exit.c| 12 kernel/sys.c | 11 +++ security/apparmor/lsm.c | 1 + security/selinux/hooks.c | 3 +++ 7 files changed, 40 insertions(+) -- 2.20.1
Re: [PATCH v2 2/5] pid: add pidfd_open()
On Fri, 2019-03-29 at 16:54 +0100, Christian Brauner wrote: > diff --git a/include/uapi/linux/wait.h b/include/uapi/linux/wait.h > index ac49a220cf2a..d6c7c0701997 100644 > --- a/include/uapi/linux/wait.h > +++ b/include/uapi/linux/wait.h > @@ -18,5 +18,7 @@ > #define P_PID1 > #define P_PGID 2 > > +/* Get a file descriptor for /proc/ of the corresponding pidfd > */ > +#define PIDFD_GET_PROCFD _IOR('p', 1, int) > > #endif /* _UAPI_LINUX_WAIT_H */ This is missing an entry in Documentation/ioctl/ioctl-number.txt and is actually conflicting with existing entries. However, I'd actually prefer a syscall to allow strict whitelisting via seccomp and avoid the other ioctl disadvantages that Daniel has already mentioned. Cheers, Jürg
[PATCH 2/2] iwlwifi: fix double hyphen in MODULE_FIRMWARE for 8000
There is only a single hyphen between iwlwifi-8000C and the API version
in the firmware filename.

Fixes: e1ba684f762b ("iwlwifi: 8000: fix MODULE_FIRMWARE input")
Signed-off-by: Jürg Billeter
---
 drivers/net/wireless/intel/iwlwifi/iwl-8000.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/wireless/intel/iwlwifi/iwl-8000.c b/drivers/net/wireless/intel/iwlwifi/iwl-8000.c
index d02ca14..8d3e53f 100644
--- a/drivers/net/wireless/intel/iwlwifi/iwl-8000.c
+++ b/drivers/net/wireless/intel/iwlwifi/iwl-8000.c
@@ -91,7 +91,7 @@
 #define IWL8000_FW_PRE "iwlwifi-8000C-"
 #define IWL8000_MODULE_FIRMWARE(api) \
-	IWL8000_FW_PRE "-" __stringify(api) ".ucode"
+	IWL8000_FW_PRE __stringify(api) ".ucode"

 #define IWL8265_FW_PRE "iwlwifi-8265-"
 #define IWL8265_MODULE_FIRMWARE(api) \
-- 
2.10.1
[PATCH 1/2] iwlwifi: fix MODULE_FIRMWARE for 6030
IWL6000G2B_UCODE_API_MAX is not defined. ucode_api_max of
IWL_DEVICE_6030 uses IWL6000G2_UCODE_API_MAX. Use this also for
MODULE_FIRMWARE.

Fixes: 9d9b21d1b616 ("iwlwifi: remove IWL_*_UCODE_API_OK")
Signed-off-by: Jürg Billeter
---
 drivers/net/wireless/intel/iwlwifi/iwl-6000.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/wireless/intel/iwlwifi/iwl-6000.c b/drivers/net/wireless/intel/iwlwifi/iwl-6000.c
index 0b9f6a7..39335b7 100644
--- a/drivers/net/wireless/intel/iwlwifi/iwl-6000.c
+++ b/drivers/net/wireless/intel/iwlwifi/iwl-6000.c
@@ -371,4 +371,4 @@ const struct iwl_cfg iwl6000_3agn_cfg = {
 MODULE_FIRMWARE(IWL6000_MODULE_FIRMWARE(IWL6000_UCODE_API_MAX));
 MODULE_FIRMWARE(IWL6050_MODULE_FIRMWARE(IWL6050_UCODE_API_MAX));
 MODULE_FIRMWARE(IWL6005_MODULE_FIRMWARE(IWL6000G2_UCODE_API_MAX));
-MODULE_FIRMWARE(IWL6030_MODULE_FIRMWARE(IWL6000G2B_UCODE_API_MAX));
+MODULE_FIRMWARE(IWL6030_MODULE_FIRMWARE(IWL6000G2_UCODE_API_MAX));
-- 
2.10.1
Re: [PATCH] prctl: add PR_[GS]ET_PDEATHSIG_PROC
Hi Oleg,

Thanks for the review.

On Tue, 2017-09-12 at 19:05 +0200, Oleg Nesterov wrote:
> On 09/09, Jürg Billeter wrote:
> > Unlike
> > PR_SET_PDEATHSIG, this is inherited across fork to allow killing a whole
> > subtree without race conditions.
>
> but I am still not sure this is right... at least I can't understand the
> "without race conditions" above.
>
> IOW, the child can do prctl(PR_SET_PDEATHSIG_PROC, SIGKILL) right after
> fork(),
> why this is not enough to kill a whole subtree without race conditions?

What if the parent dies between fork() and prctl()? Besides avoiding
this race condition, it also makes it relatively easy to enforce
PDEATHSIG_PROC for all descendants of a process. You simply set
PDEATHSIG_PROC and then block further changes using seccomp (and set
no_new_privs) to avoid runaway children.

> OTOH. If you want to kill a whole sub-tree then perhaps the exiting process
> should simply send the ->pdeath_signal_proc to the whole sub-tree? Not that
> I really think this makes more sense, but if we add the new API we should
> discuss everything we can.

While this would likely work for my use case of avoiding runaway
processes, I don't think it would make sense for non-SIGKILL use cases
of cooperating processes. Inheritance across fork still allows resetting
PDEATHSIG_PROC in the child after fork and I don't expect the parent
death race to be a significant issue in the case of cooperating
processes.

> Say, CLONE_PARENT. Should it succeed if ->pdeath_signal_proc != 0 ?

Yes, I don't see an issue with that. The new process will be a sibling
and inheriting pdeath_signal_proc seems sensible to me for this.

Jürg
Re: [PATCH] prctl: add PR_[GS]ET_PDEATHSIG_PROC
On Wed, 2017-09-13 at 19:11 +0200, Oleg Nesterov wrote:
> On 09/12, Jürg Billeter wrote:
> >
> > On Tue, 2017-09-12 at 19:05 +0200, Oleg Nesterov wrote:
> > > On 09/09, Jürg Billeter wrote:
> > > > Unlike
> > > > PR_SET_PDEATHSIG, this is inherited across fork to allow killing a whole
> > > > subtree without race conditions.
> > >
> > > but I am still not sure this is right... at least I can't understand the
> > > "without race conditions" above.
> > >
> > > IOW, the child can do prctl(PR_SET_PDEATHSIG_PROC, SIGKILL) right after
> > > fork(),
> > > why this is not enough to kill a whole subtree without race conditions?
> >
> > What if the parent dies between fork() and prctl()?
>
> The child will be killed? Sorry, can't understand...

If PR_SET_PDEATHSIG_PROC were not inherited across fork and the parent
died between fork() and prctl(PR_SET_PDEATHSIG_PROC, SIGKILL) in the
child, the child would not be killed. It would be reparented to init(1)
or a subreaper, i.e., you end up with a runaway process.

It would be possible to safeguard against this race condition in other
ways, but inheriting the setting avoids it nicely and makes it easy to
apply/enforce PDEATHSIG_PROC for all descendants.

> > > Say, CLONE_PARENT. Should it succeed if ->pdeath_signal_proc != 0 ?
> >
> > Yes, I don't see an issue with that. The new process will be a sibling
> > and inheriting pdeath_signal_proc seems sensible to me for this.
>
> I meant, the process created by clone(CLONE_PARENT) won't be killed by
> pdeath_signal if the creator process exits, exactly because it won't be
> its child. Not that I think this is wrong.

Right, creator and parent won't be the same.

Jürg
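One of the "other ways" to guard against the fork()/prctl() window can be sketched with the existing PR_SET_PDEATHSIG, since PR_SET_PDEATHSIG_PROC is only a proposal in this thread. This is a common userspace workaround, not code from the patch. The helper name spawn_with_pdeathsig() is made up for the example, and the getppid() comparison assumes parent and child share a PID namespace.

```c
#include <assert.h>
#include <signal.h>
#include <sys/prctl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork and exec `path` so that the new process receives SIGKILL when
 * its parent goes away. Returns the child's pid, or -1 if fork() fails.
 */
static pid_t spawn_with_pdeathsig(const char *path, char *const argv[])
{
	pid_t parent = getpid();	/* recorded before fork() */
	pid_t child;

	assert(path != NULL && argv != NULL);

	child = fork();
	if (child != 0)
		return child;		/* parent, or -1 on error */

	/* Child: request the parent death signal ... */
	if (prctl(PR_SET_PDEATHSIG, SIGKILL) == -1)
		_exit(127);

	/*
	 * ... then close the window between fork() and prctl(): if the
	 * parent already exited, we were reparented, getppid() no longer
	 * matches, and no signal would ever arrive.
	 */
	if (getppid() != parent)
		_exit(126);

	execv(path, argv);
	_exit(127);			/* exec failed */
}
```

The check only protects the direct child. Grandchildren start with the setting cleared and have to repeat the dance themselves, which is the gap that inheriting the setting across fork is meant to close.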
[PATCH] fs: add RWF_APPEND
This is the per-I/O equivalent of O_APPEND to support atomic append
operations on any open file.

On Linux, if a file is opened with O_APPEND, pwrite() ignores the offset
and always appends data to the end of the file. RWF_APPEND enables
atomic append and pwrite() with offset on a single file descriptor.

Signed-off-by: Jürg Billeter
---
 include/linux/fs.h      | 2 ++
 include/uapi/linux/fs.h | 6 +++++-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index e723b62c4b81..5f27ef9cce17 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3187,6 +3187,8 @@ static inline int kiocb_set_rw_flags(struct kiocb *ki, rwf_t flags)
 		ki->ki_flags |= IOCB_DSYNC;
 	if (flags & RWF_SYNC)
 		ki->ki_flags |= (IOCB_DSYNC | IOCB_SYNC);
+	if (flags & RWF_APPEND)
+		ki->ki_flags |= IOCB_APPEND;
 	return 0;
 }

diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
index ea84938da405..5477480d0502 100644
--- a/include/uapi/linux/fs.h
+++ b/include/uapi/linux/fs.h
@@ -330,7 +330,11 @@ typedef int __bitwise __kernel_rwf_t;
 /* per-IO, return -EAGAIN if operation would block */
 #define RWF_NOWAIT	((__force __kernel_rwf_t)0x0008)

+/* per-IO O_APPEND */
+#define RWF_APPEND	((__force __kernel_rwf_t)0x0010)
+
 /* mask of flags supported by the kernel */
-#define RWF_SUPPORTED	(RWF_HIPRI | RWF_DSYNC | RWF_SYNC | RWF_NOWAIT)
+#define RWF_SUPPORTED \
+	(RWF_HIPRI | RWF_DSYNC | RWF_SYNC | RWF_NOWAIT | RWF_APPEND)

 #endif /* _UAPI_LINUX_FS_H */
-- 
2.14.1
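From userspace, the flag would be passed through pwritev2(). The following is a minimal sketch of mixing an append and a positional overwrite on one descriptor, assuming a kernel and C library that accept RWF_APPEND (pwritev2(2) documents it for Linux 4.16 and later). The helper names write_flags() and demo_rwf_append() are invented for the example, and the fallback definition uses the same bit as the patch above.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* pwritev2() is a GNU extension in glibc */
#endif
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#ifndef RWF_APPEND
#define RWF_APPEND 0x00000010	/* same bit as in the patch above */
#endif

/* Positional write, or an append when `flags` contains RWF_APPEND. */
static ssize_t write_flags(int fd, const char *s, off_t offset, int flags)
{
	struct iovec iov = { .iov_base = (void *)s, .iov_len = strlen(s) };

	return pwritev2(fd, &iov, 1, offset, flags);
}

/*
 * Mix an append and an overwrite on one descriptor opened without
 * O_APPEND. Returns 0 if the file ends up as "Helloworld", -1 if the
 * kernel or C library rejects RWF_APPEND, 1 on any other failure.
 */
static int demo_rwf_append(void)
{
	char path[] = "/tmp/rwf_append_XXXXXX";
	char buf[16] = { 0 };
	int fd = mkstemp(path);
	int ret = 1;
	ssize_t n;

	if (fd < 0)
		return 1;
	unlink(path);			/* file lives until close() */

	if (write_flags(fd, "hello", 0, 0) != 5)
		goto out;

	/* Offset 0 is deliberately "wrong": RWF_APPEND should ignore it. */
	n = write_flags(fd, "world", 0, RWF_APPEND);
	if (n != 5) {
		if (n == -1 && (errno == EOPNOTSUPP || errno == ENOSYS ||
				errno == EINVAL))
			ret = -1;
		goto out;
	}

	/* The same descriptor still honours offsets without the flag. */
	if (write_flags(fd, "H", 0, 0) != 1)
		goto out;

	if (pread(fd, buf, sizeof(buf) - 1, 0) != 10)
		goto out;
	assert(buf[10] == '\0');
	ret = strcmp(buf, "Helloworld") == 0 ? 0 : 1;
out:
	close(fd);
	return ret;
}
```

With O_APPEND on the descriptor, the final overwrite of the first byte would not be possible, which is the limitation the commit message describes.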
[PATCH] prctl: add PR_[GS]ET_PDEATHSIG_PROC
PR_SET_PDEATHSIG sets a parent death signal that the calling process
will get when its parent thread dies, even when the result of getppid()
doesn't change because the calling process is reparented to a different
thread in the same parent process. When managing multiple processes, a
process-based parent death signal is much more useful. E.g., to avoid
stray child processes.

PR_SET_PDEATHSIG_PROC sets a process-based death signal. Unlike
PR_SET_PDEATHSIG, this is inherited across fork to allow killing a whole
subtree without race conditions.

This can be used for sandboxing when combined with a seccomp filter.

There have been previous attempts to support this by changing the
behavior of PR_SET_PDEATHSIG. However, that would break existing
applications. See https://marc.info/?l=linux-kernel&m=117621804801689
and https://bugzilla.kernel.org/show_bug.cgi?id=43300

Signed-off-by: Jürg Billeter
---
 fs/exec.c                    |  1 +
 include/linux/sched/signal.h |  3 +++
 include/uapi/linux/prctl.h   |  4 ++++
 kernel/cred.c                |  1 +
 kernel/exit.c                |  4 ++++
 kernel/fork.c                |  2 ++
 kernel/sys.c                 | 11 +++++++++++
 security/apparmor/lsm.c      |  1 +
 security/selinux/hooks.c     |  1 +
 9 files changed, 28 insertions(+)

diff --git a/fs/exec.c b/fs/exec.c
index 01a9fb9d8ac3..bb389c3c596d 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1353,6 +1353,7 @@ void setup_new_exec(struct linux_binprm * bprm)
 	if (bprm->secureexec) {
 		/* Make sure parent cannot signal privileged process. */
 		current->pdeath_signal = 0;
+		current->signal->pdeath_signal_proc = 0;

 		/*
 		 * For secureexec, reset the stack limit to sane default to
diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
index 2a0dd40b15db..c5c137e5ef39 100644
--- a/include/linux/sched/signal.h
+++ b/include/linux/sched/signal.h
@@ -103,6 +103,9 @@ struct signal_struct {
 	int			group_stop_count;
 	unsigned int		flags; /* see SIGNAL_* flags below */

+	/* The signal sent when the parent dies: */
+	int			pdeath_signal_proc;
+
 	/*
 	 * PR_SET_CHILD_SUBREAPER marks a process, like a service
 	 * manager, to re-parent orphan (double-forking) child processes
diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index a8d0759a9e40..04508e81d4f2 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -197,4 +197,8 @@ struct prctl_mm_map {
 # define PR_CAP_AMBIENT_LOWER		3
 # define PR_CAP_AMBIENT_CLEAR_ALL	4

+/* Process-based variant of PDEATHSIG */
+#define PR_SET_PDEATHSIG_PROC		48
+#define PR_GET_PDEATHSIG_PROC		49
+
 #endif /* _LINUX_PRCTL_H */
diff --git a/kernel/cred.c b/kernel/cred.c
index ecf03657e71c..0192a94670e1 100644
--- a/kernel/cred.c
+++ b/kernel/cred.c
@@ -448,6 +448,7 @@ int commit_creds(struct cred *new)
 		if (task->mm)
 			set_dumpable(task->mm, suid_dumpable);
 		task->pdeath_signal = 0;
+		task->signal->pdeath_signal_proc = 0;
 		smp_wmb();
 	}

diff --git a/kernel/exit.c b/kernel/exit.c
index a35d8a17e01f..1be0616239e0 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -635,6 +635,10 @@ static void reparent_leader(struct task_struct *father, struct task_struct *p,
 	if (unlikely(p->exit_state == EXIT_DEAD))
 		return;

+	if (p->signal->pdeath_signal_proc)
+		group_send_sig_info(p->signal->pdeath_signal_proc,
+				    SEND_SIG_NOINFO, p);
+
 	/* We don't want people slaying init. */
 	p->exit_signal = SIGCHLD;

diff --git a/kernel/fork.c b/kernel/fork.c
index 24a4c0be80d5..f6482392ece9 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1412,6 +1412,8 @@ static int copy_signal(unsigned long clone_flags, struct task_struct *tsk)

 	mutex_init(&sig->cred_guard_mutex);

+	sig->pdeath_signal_proc = current->signal->pdeath_signal_proc;
+
 	return 0;
 }

diff --git a/kernel/sys.c b/kernel/sys.c
index 2855ee73acd0..c47e92fa5370 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -2210,6 +2210,17 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
 	case PR_GET_PDEATHSIG:
 		error = put_user(me->pdeath_signal, (int __user *)arg2);
 		break;
+	case PR_SET_PDEATHSIG_PROC:
+		if (!valid_signal(arg2)) {
+			error = -EINVAL;
+			break;
+		}
+		me->signal->pdeath_signal_proc = arg2;
+		break;
+	case PR_GET_PDEATHSIG_PROC:
+		error = put_user(me->signal->pdeath_signal_proc,
+				 (int
Re: new ...at() flag: AT_NO_JUMPS
Hi Al,

Might it make sense to specify these lookup restrictions when opening
the directory (O_ROOT?) instead of specifying it for each lookup with
AT_* (or supporting both)? This might make it more useful when passing
directory fds between processes that do not use seccomp (where
AT_BENEATH could be enforced).

For my sandboxing use case, I'd be happy with either solution, though.
Is there anything I can do to help move this forward?

Best regards,
Jürg
[PATCH] Bluetooth: btintel: Add MODULE_FIRMWARE entries for iBT 3.5 controllers
The iBT 3.5 controllers (Intel 8265, Windstorm Peak) need the
intel/ibt-12-16.sfi and intel/ibt-12-16.ddc firmware files from the
linux-firmware repository.

Signed-off-by: Jürg Billeter
---
 drivers/bluetooth/btintel.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/bluetooth/btintel.c b/drivers/bluetooth/btintel.c
index fce154855718..d32e109bd5cb 100644
--- a/drivers/bluetooth/btintel.c
+++ b/drivers/bluetooth/btintel.c
@@ -575,3 +575,5 @@ MODULE_VERSION(VERSION);
 MODULE_LICENSE("GPL");
 MODULE_FIRMWARE("intel/ibt-11-5.sfi");
 MODULE_FIRMWARE("intel/ibt-11-5.ddc");
+MODULE_FIRMWARE("intel/ibt-12-16.sfi");
+MODULE_FIRMWARE("intel/ibt-12-16.ddc");
-- 
2.13.0
Re: [RESEND PATCH] prctl: add PR_[GS]ET_PDEATHSIG_PROC
On Mon, 2017-10-02 at 22:25 -0500, Eric W. Biederman wrote:
> The code where it calls group_send_sig_info is buggy for pdeath_signal.
> And it is no less buggy for this new case. There is no point to check
> permissions when sending a signal to yourself. Especially this signal
> gets cleared during exec with a change of permissions.
>
>
> I would recommend using:
> do_send_sig_info(p->signal->pdeath_signal_proc, SEND_SIG_NOINFO, p, true);
>
> Perhaps with a comment saying that no permission check is needed when
> sending a signal to yourself.

Depending on how you look at it, one could also argue that the dying
parent sends the signal. However, I'm fine with dropping the permission
check in v2. I'll also send a patch to change this for the existing
pdeath_signal.

> I don't know what I think about inherit over fork, and the whole tree
> killing thing. Except when the signal is SIGKILL I don't know if that
> code does what is intended. So I am a little leery of it.

I agree that inheritance across fork is mainly useful for SIGKILL. While
non-SIGKILL users could clear the setting after fork(), another option
would be to allow the caller to specify whether the setting should be
inherited using prctl arg3. This would allow both the exact
process-based equivalent to pdeath_signal (no inheritance) and the
interesting SIGKILL case for killing a process tree.

Does this sound sensible? I'd be happy to add this to v2.

Jürg
Re: [RESEND PATCH] prctl: add PR_[GS]ET_PDEATHSIG_PROC
On Tue, 2017-10-03 at 09:46 -0500, Eric W. Biederman wrote:
> There is a general need to find out about the death of other processes,
> if you are not the parent of the process. I would be inclined to call
> it waitfd. Something that you give a pid. It performs a permission
> check and the pid becomes readable when the process dies. With poll
> working on the fd, and the fd returning wstatus of the dead child.
>
> Support SIGIO on the fd and you have a signal delivery mechanism,
> if you want it.

File descriptors for processes (waitfd/clonefd) are definitely
interesting. Especially if reaping the process (and reparenting its
children) is delayed until the last process file descriptor is closed.
However, this would be a much larger addition and also less intuitive to
use if all you want is killing the process tree.

> For the kill all children when the parent dies the mechanism you are
> proposing is escapable. We already have an inescapable version of it
> with init in a pid namespace. We already have an escapable version of
> it with orphaned process groups and SIGHUP.
>
> So I would really appreciate a very clear use case for what we are
> building here. As it appears the killing of children can already be
> done another way, and that the waiting for the parent can be done better
> another way.

My use case is to provide a way for a process to spawn a child and
ensure that no descendants survive when that child dies. Avoiding
runaway processes is desirable in many situations. My motivation is very
lightweight (nested) sandboxing (every process is potentially
sandboxed).

I.e., pid namespaces would be a pretty good fit (assuming they are
sufficiently lightweight) but CLONE_NEWPID requires CAP_SYS_ADMIN. User
namespaces can help here, but creating tons of user namespaces just for
this doesn't sound sensible. MAX_PID_NS_LEVEL could be an issue as well
at some point but 32 levels are likely fine in practice.

For my particular scenario I may actually be able to create a single
user namespace, run all processes with (namespaced) CAP_SYS_ADMIN and
use CLONE_NEWPID for every process. However, I would prefer not
requiring CAP_SYS_ADMIN, and a regular application that wants to avoid
runaway processes for a spawned helper process cannot rely on
CAP_SYS_ADMIN.

My plan was to use PR_SET_PDEATHSIG_PROC with PR_SET_NO_NEW_PRIVS and a
suitable seccomp filter to prevent changes to pdeath_signal_proc. For my
SIGKILL use case it would be even better to simply require no_new_privs
and make pdeath_signal_proc sticky, avoiding the need for seccomp. I
wanted to keep the differences to the existing PR_SET_PDEATHSIG minimal
but if we argue that the non-SIGKILL use case is better solved with
waitfd (or maybe the process events connector), we could tailor the
prctl for the SIGKILL use case (or support both via prctl arg3).

I have another small patch locally that adds a prctl that restricts
kill(2) to direct children of the current thread group for lightweight
sandboxing. That would also be redundant if it was possible to use
CLONE_NEWPID for every process.

What's actually the reason that CLONE_NEWPID requires CAP_SYS_ADMIN?
Does CLONE_NEWPID pose any risks that don't exist for
CLONE_NEWUSER|CLONE_NEWPID? Assuming we can't simply drop the
CAP_SYS_ADMIN requirement, do you see a better solution for this use
case?

Jürg
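The "set it, then lock it with no_new_privs and seccomp" plan can be sketched with today's interfaces. Since PR_SET_PDEATHSIG_PROC exists only in this patch, the example uses the existing PR_SET_PDEATHSIG as a stand-in. The helper names lock_pdeathsig() and demo_locked_pdeathsig() are invented here, and a production filter would also validate seccomp_data.arch, which this sketch omits. It assumes a little-endian machine, where the low 32 bits of the first syscall argument sit at offsetof(struct seccomp_data, args[0]).

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

/* Make later prctl(PR_SET_PDEATHSIG, ...) calls fail with EPERM. */
static int lock_pdeathsig(void)
{
	struct sock_filter filter[] = {
		/* Not prctl()? Skip ahead to ALLOW. */
		BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
			 offsetof(struct seccomp_data, nr)),
		BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_prctl, 0, 3),
		/* prctl(): inspect the option (low 32 bits of arg 0). */
		BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
			 offsetof(struct seccomp_data, args[0])),
		BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, PR_SET_PDEATHSIG, 0, 1),
		BPF_STMT(BPF_RET | BPF_K,
			 SECCOMP_RET_ERRNO | (EPERM & SECCOMP_RET_DATA)),
		BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
	};
	struct sock_fprog prog = {
		.len = sizeof(filter) / sizeof(filter[0]),
		.filter = filter,
	};

	/* Unprivileged filter installation requires no_new_privs. */
	if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) == -1)
		return -1;
	return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}

/*
 * Try the sequence in a throwaway child. Returns 0 if the later change
 * was blocked, 1 if it went through, 2 if the setup was unavailable,
 * -1 on fork/wait failure.
 */
static int demo_locked_pdeathsig(void)
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0) {
		if (prctl(PR_SET_PDEATHSIG, SIGKILL) == -1 ||
		    lock_pdeathsig() == -1)
			_exit(2);
		/* A sandboxed process now tries to opt out again. */
		if (prctl(PR_SET_PDEATHSIG, 0) == -1 && errno == EPERM)
			_exit(0);
		_exit(1);
	}
	if (waitpid(pid, &status, 0) != pid)
		return -1;
	assert(WIFEXITED(status) || WIFSIGNALED(status));
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The filter is inherited across fork() and kept across execve(), so descendants cannot undo it. A sticky setting gated on no_new_privs, as suggested in the message above, would make the BPF program unnecessary.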
Re: [RESEND PATCH] prctl: add PR_[GS]ET_PDEATHSIG_PROC
On Tue, 2017-10-03 at 12:40 -0500, Eric W. Biederman wrote:
> Jürg Billeter writes:
> > What's actually the reason that CLONE_NEWPID requires CAP_SYS_ADMIN?
> > Does CLONE_NEWPID pose any risks that don't exist for
> > CLONE_NEWUSER|CLONE_NEWPID? Assuming we can't simply drop the
> > CAP_SYS_ADMIN requirement, do you see a better solution for this use
> > case?
>
> CLONE_NEWPID without a permission check would allow running a setuid root
> application in a pid namespace. Off the top of my head I can't think of
> a really good exploit. But when you mess up pid files, and hide
> information from a privileged application I can completely imagine
> forcing that application to misbehave in ways the attacker can control.
> Leading to bad things.

Could we allow unprivileged CLONE_NEWPID if the no_new_privs bit is set?

Jürg
[RESEND PATCH] fs: add RWF_APPEND
This is the per-I/O equivalent of O_APPEND to support atomic append
operations on any open file.

If a file is opened with O_APPEND, pwrite() ignores the offset and
always appends data to the end of the file. RWF_APPEND enables atomic
append and pwrite() with offset on a single file descriptor.

Signed-off-by: Jürg Billeter
---
 include/linux/fs.h      | 2 ++
 include/uapi/linux/fs.h | 6 +++++-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 339e73742e73..fee24eae7523 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3204,6 +3204,8 @@ static inline int kiocb_set_rw_flags(struct kiocb *ki, rwf_t flags)
 		ki->ki_flags |= IOCB_DSYNC;
 	if (flags & RWF_SYNC)
 		ki->ki_flags |= (IOCB_DSYNC | IOCB_SYNC);
+	if (flags & RWF_APPEND)
+		ki->ki_flags |= IOCB_APPEND;
 	return 0;
 }

diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
index 56235dddea7d..ac145430bcd8 100644
--- a/include/uapi/linux/fs.h
+++ b/include/uapi/linux/fs.h
@@ -376,7 +376,11 @@ typedef int __bitwise __kernel_rwf_t;
 /* per-IO, return -EAGAIN if operation would block */
 #define RWF_NOWAIT	((__force __kernel_rwf_t)0x0008)

+/* per-IO O_APPEND */
+#define RWF_APPEND	((__force __kernel_rwf_t)0x0010)
+
 /* mask of flags supported by the kernel */
-#define RWF_SUPPORTED	(RWF_HIPRI | RWF_DSYNC | RWF_SYNC | RWF_NOWAIT)
+#define RWF_SUPPORTED	(RWF_HIPRI | RWF_DSYNC | RWF_SYNC | RWF_NOWAIT |\
+			 RWF_APPEND)

 #endif /* _UAPI_LINUX_FS_H */
-- 
2.14.1
[RESEND PATCH] prctl: add PR_[GS]ET_PDEATHSIG_PROC
PR_SET_PDEATHSIG sets a parent death signal that the calling process
will get when its parent thread dies, even when the result of getppid()
doesn't change because the calling process is reparented to a different
thread in the same parent process. When managing multiple processes, a
process-based parent death signal is much more useful. E.g., to avoid
stray child processes.

PR_SET_PDEATHSIG_PROC sets a process-based death signal. Unlike
PR_SET_PDEATHSIG, this is inherited across fork to allow killing a whole
subtree without race conditions.

This can be used for sandboxing when combined with a seccomp filter.

There have been previous attempts to support this by changing the
behavior of PR_SET_PDEATHSIG. However, that would break existing
applications. See https://marc.info/?l=linux-kernel&m=117621804801689
and https://bugzilla.kernel.org/show_bug.cgi?id=43300

Signed-off-by: Jürg Billeter
---
Previous discussion: https://patchwork.kernel.org/patch/9945315/

 fs/exec.c                    |  1 +
 include/linux/sched/signal.h |  3 +++
 include/uapi/linux/prctl.h   |  4 ++++
 kernel/cred.c                |  1 +
 kernel/exit.c                |  4 ++++
 kernel/fork.c                |  2 ++
 kernel/sys.c                 | 11 +++++++++++
 security/apparmor/lsm.c      |  1 +
 security/selinux/hooks.c     |  1 +
 9 files changed, 28 insertions(+)

diff --git a/fs/exec.c b/fs/exec.c
index ac34d9724684..7045f0223140 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1334,6 +1334,7 @@ void setup_new_exec(struct linux_binprm * bprm)
 	if (bprm->secureexec) {
 		/* Make sure parent cannot signal privileged process. */
 		current->pdeath_signal = 0;
+		current->signal->pdeath_signal_proc = 0;

 		/*
 		 * For secureexec, reset the stack limit to sane default to
diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
index 2a0dd40b15db..c5c137e5ef39 100644
--- a/include/linux/sched/signal.h
+++ b/include/linux/sched/signal.h
@@ -103,6 +103,9 @@ struct signal_struct {
 	int			group_stop_count;
 	unsigned int		flags; /* see SIGNAL_* flags below */

+	/* The signal sent when the parent dies: */
+	int			pdeath_signal_proc;
+
 	/*
 	 * PR_SET_CHILD_SUBREAPER marks a process, like a service
 	 * manager, to re-parent orphan (double-forking) child processes
diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index a8d0759a9e40..04508e81d4f2 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -197,4 +197,8 @@ struct prctl_mm_map {
 # define PR_CAP_AMBIENT_LOWER		3
 # define PR_CAP_AMBIENT_CLEAR_ALL	4

+/* Process-based variant of PDEATHSIG */
+#define PR_SET_PDEATHSIG_PROC		48
+#define PR_GET_PDEATHSIG_PROC		49
+
 #endif /* _LINUX_PRCTL_H */
diff --git a/kernel/cred.c b/kernel/cred.c
index ecf03657e71c..0192a94670e1 100644
--- a/kernel/cred.c
+++ b/kernel/cred.c
@@ -448,6 +448,7 @@ int commit_creds(struct cred *new)
 		if (task->mm)
 			set_dumpable(task->mm, suid_dumpable);
 		task->pdeath_signal = 0;
+		task->signal->pdeath_signal_proc = 0;
 		smp_wmb();
 	}

diff --git a/kernel/exit.c b/kernel/exit.c
index 3481ababd06a..9b6fbb0128d7 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -635,6 +635,10 @@ static void reparent_leader(struct task_struct *father, struct task_struct *p,
 	if (unlikely(p->exit_state == EXIT_DEAD))
 		return;

+	if (p->signal->pdeath_signal_proc)
+		group_send_sig_info(p->signal->pdeath_signal_proc,
+				    SEND_SIG_NOINFO, p);
+
 	/* We don't want people slaying init. */
 	p->exit_signal = SIGCHLD;

diff --git a/kernel/fork.c b/kernel/fork.c
index 10646182440f..264936c367e3 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1415,6 +1415,8 @@ static int copy_signal(unsigned long clone_flags, struct task_struct *tsk)

 	mutex_init(&sig->cred_guard_mutex);

+	sig->pdeath_signal_proc = current->signal->pdeath_signal_proc;
+
 	return 0;
 }

diff --git a/kernel/sys.c b/kernel/sys.c
index 9aebc2935013..dcb9a535404e 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -2206,6 +2206,17 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
 	case PR_GET_PDEATHSIG:
 		error = put_user(me->pdeath_signal, (int __user *)arg2);
 		break;
+	case PR_SET_PDEATHSIG_PROC:
+		if (!valid_signal(arg2)) {
+			error = -EINVAL;
+			break;
+		}
+		me->signal->pdeath_signal_proc = arg2;
+		break;
+	case PR_GET_PDEATHSIG_PROC:
+		error = put_user
Re: [RESEND PATCH] prctl: add PR_[GS]ET_PDEATHSIG_PROC
On Thu, 2017-10-05 at 18:27 +0200, Oleg Nesterov wrote:
> On 10/03, Jürg Billeter wrote:
> >
> > My use case is to provide a way for a process to spawn a child and
> > ensure that no descendants survive when that child dies. Avoiding
> > runaway processes is desirable in many situations. My motivation is
> > very lightweight (nested) sandboxing (every process is potentially
> > sandboxed).
> >
> > I.e., pid namespaces would be a pretty good fit (assuming they are
> > sufficiently lightweight) but CLONE_NEWPID
>
> sorry if this was already discussed, I didn't read this thread yet...
>
> if CLONE_NEWPID is not suitable for any reason. We already have
> PR_SET_CHILD_SUBREAPER. Perhaps we can simply add another
> PR_SET_KILL_ALL_DESCEDANTS_ON_EXIT? we can use walk_process_tree()
> to send SIGKILL.

Yes, this is an option. However, after the discussion in this thread I
believe it would be better to drop the CAP_SYS_ADMIN requirement for
CLONE_NEWPID (when no_new_privs is set) as this would avoid adding
another API and code path for a similar effect.

I'm interested in possible security concerns about such a change. Adding
Andy Lutomirski to cc.

Jürg
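For reference, the existing PR_SET_CHILD_SUBREAPER that Oleg mentions can be exercised as below. This is only a sketch of today's behavior, in which orphans are reparented to the subreaper and it is then up to the subreaper to notice and kill them. It is not an implementation of the proposed kill-on-exit prctl. The function name orphan_lands_on_subreaper() is made up for the example.

```c
#include <assert.h>
#include <sys/prctl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Returns 1 if an orphaned grandchild was reparented to the caller,
 * 0 if it went elsewhere, -1 on setup failure.
 */
static int orphan_lands_on_subreaper(void)
{
	pid_t self = getpid(), child, ids[2] = { -1, -1 };
	ssize_t got;
	int fds[2];

	if (prctl(PR_SET_CHILD_SUBREAPER, 1, 0, 0, 0) == -1)
		return -1;
	if (pipe(fds) == -1)
		return -1;

	child = fork();
	if (child < 0) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}
	if (child == 0) {
		pid_t mid = getpid();
		pid_t grandchild = fork();

		if (grandchild == 0) {
			/* Wait until the intermediate process is gone. */
			while (getppid() == mid)
				usleep(1000);
			ids[0] = getpid();
			ids[1] = getppid();	/* the new parent */
			if (write(fds[1], ids, sizeof(ids)) !=
			    (ssize_t)sizeof(ids))
				_exit(1);
			_exit(0);
		}
		_exit(grandchild < 0);	/* orphan the grandchild */
	}
	assert(child > 0);		/* subreaper from here on */

	close(fds[1]);
	waitpid(child, NULL, 0);
	got = read(fds[0], ids, sizeof(ids));
	close(fds[0]);
	if (ids[0] > 0)
		waitpid(ids[0], NULL, 0);	/* reap the adopted orphan */
	prctl(PR_SET_CHILD_SUBREAPER, 0, 0, 0, 0);

	if (got != (ssize_t)sizeof(ids))
		return -1;
	return ids[1] == self;
}
```

A subreaper still has to find and signal the adopted processes itself, and it cannot do so once it has exited, which is why the thread looks for a kernel-side mechanism instead.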