from:"Jürg Billeter"

Re: [PATCH v2] prctl: add PR_[GS]ET_KILLABLE

2018-08-03 Thread Jürg Billeter

On Wed, 2018-08-01 at 16:19 +0200, Oleg Nesterov wrote:
> On 07/31, Jürg Billeter wrote:
> > 
> > > Could you explain your use-case? Why a shell wants to use
> > > CLONE_NEWPID?
> > 
> > To guarantee that there won't be any runaway processes, i.e., ensure
> > that no descendants (background helper daemons or misbehaving
> > processes) survive when the child process is terminated.
> 
> We already have PR_SET_CHILD_SUBREAPER.
> 
> Perhaps we can finally add PR_KILL_MY_DESCENDANTS_ON_EXIT? This was already
> discussed some time ago, but I can't find the previous discussion... Simple
> to implement.

This would definitely be an option. You mentioned it last October in
the PR_SET_PDEATHSIG_PROC discussion¹. However, as PID namespaces
already exist and appear to be a good fit for the most part, I think it
makes sense to just add the missing pieces to PID namespaces instead of
duplicating part of the PID namespace functionality.

Also, based on Eric's comment in that other discussion about
no_new_privs not being allowed to increase the attack surface,
PR_KILL_MY_DESCENDANTS_ON_EXIT might require CAP_SYS_ADMIN as well (due
to setuid children). In which case the only potential benefit would be
that it still allows the child to kill arbitrary processes, as far as I
can tell.

> > And to prevent children from killing their ancestors.
> 
> OK, this is the only reason for CLONE_NEWPID which I can understand so far.
> Not that I understand why this is that useful ;)

The overall goal is increasing isolation between (some) child processes
and the rest of the system. Isolation from runaway processes and
isolation from signals are independent aspects and it could be useful
to control them independently. However, I also expect it to be common
that both are wanted at the same time.

Jürg

¹ https://lkml.org/lkml/2017/10/5/546

Re: [PATCH v2] prctl: add PR_[GS]ET_KILLABLE

2018-08-03 Thread Jürg Billeter

On Fri, 2018-08-03 at 08:34 -0500, Eric W. Biederman wrote:
> From the other direction I think we can just go ahead and fix handling
> of the job control stop signals as well.  As far as I understand it
> there is a legitimate complaint that SIGTSTP SIGTTIN SIGTTOU do not work
> on a pid namespace leader.
> 
> The current implementation actual overshoots.  We only need to ignore
> signals from the descendants in the pid namespace.  Ideally signals from
> other processes are treated like normal.  We have only been able to
> apply that ideal to SIGSTOP and SIGKILL as we can handle them in
> prepare_signal.  Other signals can be blocked which means the logic to
> handle them needs to live in get_signal where we may have no sender
> information.

SIGINT and SIGQUIT are also relevant for job control. Would the same
approach be possible for them?

And I would like to allow regular POSIX signal behavior also for
signals used outside job control, e.g., SIGTERM, for maximum
compatibility with existing applications. Furthermore, it would also be
good to allow a PID namespace leader to send a signal to itself.

Do you think we can and should cover all of the above without a prctl
by loosening the restrictions imposed by SIGNAL_UNKILLABLE (with
reasonable effort)?

In my opinion, my patch still makes sense as it simply allows regular
POSIX signal behavior for PID namespace leaders and it doesn't risk any
compatibility issues as the behavior doesn't change at all for
processes that don't invoke the new prctl. I.e., simple patch, low
risk, and covers all signals.

In the meantime I've tested the missing patch for copy_process() and
will send out v3 of the patch in case the new prctl makes sense after
all.

Jürg

[PATCH v3 2/2] prctl: add PR_[GS]ET_KILLABLE

2018-08-03 Thread Jürg Billeter

PR_SET_KILLABLE clears the SIGNAL_UNKILLABLE flag. This allows
CLONE_NEWPID tasks to restore normal signal behavior, opting out of the
special signal protection for init processes. This prctl does not allow
setting the SIGNAL_UNKILLABLE flag, only clearing.

The SIGNAL_UNKILLABLE flag, which is implicitly set for tasks cloned
with CLONE_NEWPID, has the effect of ignoring all signals (from
userspace) if the corresponding handler is set to SIG_DFL. The only
exceptions are SIGKILL and SIGSTOP and they are only accepted if raised
from an ancestor namespace.

SIGINT, SIGQUIT and SIGTSTP are used in job control for ^C, ^\, ^Z.
While a task with the SIGNAL_UNKILLABLE flag could install handlers for
these signals, this is not sufficient to implement a shell that uses
CLONE_NEWPID for child processes:

 * As SIGSTOP is ignored when raised from the SIGNAL_UNKILLABLE process
   itself, it's not possible to implement the stop action in a custom
   SIGTSTP handler.
 * Many applications do not install handlers for these signals and
   thus, job control won't work properly with unmodified applications.

There are other scenarios besides job control in a shell where
applications rely on the default actions as described in signal(7) and
PID isolation may be useful. This new prctl makes the signal protection
for "init" processes optional, without breaking backward compatibility.

Signed-off-by: Jürg Billeter 
---
 include/uapi/linux/prctl.h |  4 
 kernel/sys.c   | 13 +
 2 files changed, 17 insertions(+)

diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index c0d7ea0bf5b6..92afb63da727 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -219,4 +219,8 @@ struct prctl_mm_map {
 # define PR_SPEC_DISABLE   (1UL << 2)
 # define PR_SPEC_FORCE_DISABLE (1UL << 3)
 
+/* Control SIGNAL_UNKILLABLE */
+#define PR_GET_KILLABLE54
+#define PR_SET_KILLABLE55
+
 #endif /* _LINUX_PRCTL_H */
diff --git a/kernel/sys.c b/kernel/sys.c
index 38509dc1f77b..92c9322cfb98 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -2484,6 +2484,19 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, 
unsigned long, arg3,
return -EINVAL;
error = arch_prctl_spec_ctrl_set(me, arg2, arg3);
break;
+   case PR_GET_KILLABLE:
+   if (arg3 || arg4 || arg5)
+   return -EINVAL;
+   error = put_user(!(me->signal->flags & SIGNAL_UNKILLABLE),
+(int __user *)arg2);
+   break;
+   case PR_SET_KILLABLE:
+   if (arg2 != 1 || arg3 || arg4 || arg5)
+   return -EINVAL;
+   spin_lock_irq(&me->sighand->siglock);
+   me->signal->flags &= ~SIGNAL_UNKILLABLE;
+   spin_unlock_irq(&me->sighand->siglock);
+   break;
default:
error = -EINVAL;
break;
-- 
2.18.0

[PATCH v3 1/2] fork: do not rely on SIGNAL_UNKILLABLE for init check

2018-08-03 Thread Jürg Billeter

copy_process() currently checks the SIGNAL_UNKILLABLE flag to determine
whether to accept CLONE_PARENT. In preparation for allowing init
processes to opt out of SIGNAL_UNKILLABLE, directly check whether the
process is an init process with is_child_reaper().

Signed-off-by: Jürg Billeter 
---
 kernel/fork.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/fork.c b/kernel/fork.c
index 1b27babc4c78..c019ce461556 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1646,7 +1646,7 @@ static __latent_entropy struct task_struct *copy_process(
 * from creating siblings.
 */
if ((clone_flags & CLONE_PARENT) &&
-   current->signal->flags & SIGNAL_UNKILLABLE)
+   is_child_reaper(task_tgid(current)))
return ERR_PTR(-EINVAL);
 
/*
-- 
2.18.0

[PATCH] prctl: add PR_[GS]ET_KILLABLE

2018-07-30 Thread Jürg Billeter

PR_SET_KILLABLE clears the SIGNAL_UNKILLABLE flag. This allows
CLONE_NEWPID tasks to restore normal signal behavior, opting out of the
special signal protection for init processes.

This is required for job control in a shell that uses CLONE_NEWPID for
child processes.

This prctl does not allow setting the SIGNAL_UNKILLABLE flag, only
clearing.

Signed-off-by: Jürg Billeter 
---
 include/uapi/linux/prctl.h |  4 
 kernel/sys.c   | 11 +++
 2 files changed, 15 insertions(+)

diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index c0d7ea0bf5b6..92afb63da727 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -219,4 +219,8 @@ struct prctl_mm_map {
 # define PR_SPEC_DISABLE   (1UL << 2)
 # define PR_SPEC_FORCE_DISABLE (1UL << 3)
 
+/* Control SIGNAL_UNKILLABLE */
+#define PR_GET_KILLABLE54
+#define PR_SET_KILLABLE55
+
 #endif /* _LINUX_PRCTL_H */
diff --git a/kernel/sys.c b/kernel/sys.c
index 38509dc1f77b..264de630d548 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -2484,6 +2484,17 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, 
unsigned long, arg3,
return -EINVAL;
error = arch_prctl_spec_ctrl_set(me, arg2, arg3);
break;
+   case PR_GET_KILLABLE:
+   if (arg3 || arg4 || arg5)
+   return -EINVAL;
+   error = put_user(!(me->signal->flags & SIGNAL_UNKILLABLE),
+(int __user *)arg2);
+   break;
+   case PR_SET_KILLABLE:
+   if (arg2 != 1 || arg3 || arg4 || arg5)
+   return -EINVAL;
+   me->signal->flags &= ~SIGNAL_UNKILLABLE;
+   break;
default:
error = -EINVAL;
break;
-- 
2.18.0

Re: [PATCH] prctl: add PR_[GS]ET_KILLABLE

2018-07-30 Thread Jürg Billeter

On Mon, 2018-07-30 at 12:17 +0200, Oleg Nesterov wrote:
> On 07/30, Jürg Billeter wrote:
> > 
> > This is required for job control in a shell that uses CLONE_NEWPID for
> > child processes.
> 
> Could you explain in more details?

The SIGNAL_UNKILLABLE flag, which is implicitly set for tasks cloned
with CLONE_NEWPID, has the effect of ignoring all signals (from
userspace) if the corresponding handler is set to SIG_DFL. The only
exceptions are SIGKILL and SIGSTOP and they are only accepted if raised
from an ancestor namespace.

SIGINT, SIGQUIT and SIGTSTP are used in job control for ^C, ^\, ^Z.
While a task with the SIGNAL_UNKILLABLE flag could install handlers for
these signals, this is not sufficient to implement a shell that uses
CLONE_NEWPID for child processes:

 * As SIGSTOP is ignored when raised from the SIGNAL_UNKILLABLE process
   itself, I don't think it's possible to implement the stop action in
   a custom SIGTSTP handler.
 * Many applications do not install handlers for these signals and
   thus, job control won't work properly with unmodified applications.

Job control in a shell is just an example. There are other scenarios,
of course, where applications rely on the default actions as described
in signal(7), and PID isolation may be useful. In my opinion, the
kernel support for preventing accidental killing of the "init" process
should really be optional and this new prctl provides this without
breaking backward compatibility.

> > +   case PR_SET_KILLABLE:
> > +   if (arg2 != 1 || arg3 || arg4 || arg5)
> > +   return -EINVAL;
> > +   me->signal->flags &= ~SIGNAL_UNKILLABLE;
> 
> this needs spin_lock_irq(me->sighand->siglock).

Thanks for the review, will fix this for v2.

Jürg

[PATCH v2] prctl: add PR_[GS]ET_KILLABLE

2018-07-31 Thread Jürg Billeter

PR_SET_KILLABLE clears the SIGNAL_UNKILLABLE flag. This allows
CLONE_NEWPID tasks to restore normal signal behavior, opting out of the
special signal protection for init processes. This prctl does not allow
setting the SIGNAL_UNKILLABLE flag, only clearing.

The SIGNAL_UNKILLABLE flag, which is implicitly set for tasks cloned
with CLONE_NEWPID, has the effect of ignoring all signals (from
userspace) if the corresponding handler is set to SIG_DFL. The only
exceptions are SIGKILL and SIGSTOP and they are only accepted if raised
from an ancestor namespace.

SIGINT, SIGQUIT and SIGTSTP are used in job control for ^C, ^\, ^Z.
While a task with the SIGNAL_UNKILLABLE flag could install handlers for
these signals, this is not sufficient to implement a shell that uses
CLONE_NEWPID for child processes:

 * As SIGSTOP is ignored when raised from the SIGNAL_UNKILLABLE process
   itself, it's not possible to implement the stop action in a custom
   SIGTSTP handler.
 * Many applications do not install handlers for these signals and
   thus, job control won't work properly with unmodified applications.

There are other scenarios besides job control in a shell where
applications rely on the default actions as described in signal(7) and
PID isolation may be useful. This new prctl makes the signal protection
for "init" processes optional, without breaking backward compatibility.

Signed-off-by: Jürg Billeter 
---
 v2: Hold siglock for PR_SET_KILLABLE, expand commit message.

 include/uapi/linux/prctl.h |  4 
 kernel/sys.c   | 13 +
 2 files changed, 17 insertions(+)

diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index c0d7ea0bf5b6..92afb63da727 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -219,4 +219,8 @@ struct prctl_mm_map {
 # define PR_SPEC_DISABLE   (1UL << 2)
 # define PR_SPEC_FORCE_DISABLE (1UL << 3)
 
+/* Control SIGNAL_UNKILLABLE */
+#define PR_GET_KILLABLE54
+#define PR_SET_KILLABLE55
+
 #endif /* _LINUX_PRCTL_H */
diff --git a/kernel/sys.c b/kernel/sys.c
index 38509dc1f77b..92c9322cfb98 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -2484,6 +2484,19 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, 
unsigned long, arg3,
return -EINVAL;
error = arch_prctl_spec_ctrl_set(me, arg2, arg3);
break;
+   case PR_GET_KILLABLE:
+   if (arg3 || arg4 || arg5)
+   return -EINVAL;
+   error = put_user(!(me->signal->flags & SIGNAL_UNKILLABLE),
+(int __user *)arg2);
+   break;
+   case PR_SET_KILLABLE:
+   if (arg2 != 1 || arg3 || arg4 || arg5)
+   return -EINVAL;
+   spin_lock_irq(&me->sighand->siglock);
+   me->signal->flags &= ~SIGNAL_UNKILLABLE;
+   spin_unlock_irq(&me->sighand->siglock);
+   break;
default:
error = -EINVAL;
break;
-- 
2.18.0

Re: [PATCH v2] prctl: add PR_[GS]ET_KILLABLE

2018-07-31 Thread Jürg Billeter

On Tue, 2018-07-31 at 16:39 +0200, Oleg Nesterov wrote:
> On 07/31, Jürg Billeter wrote:
> > SIGINT, SIGQUIT and SIGTSTP are used in job control for ^C, ^\, ^Z.
> > While a task with the SIGNAL_UNKILLABLE flag could install handlers for
> > these signals, this is not sufficient to implement a shell that uses
> > CLONE_NEWPID for child processes:
> 
> Ah. My question wasn't clear, sorry.
> 
> Could you explain your use-case? Why a shell wants to use
> CLONE_NEWPID?

To guarantee that there won't be any runaway processes, i.e., ensure
that no descendants (background helper daemons or misbehaving
processes) survive when the child process is terminated. And to prevent
children from killing their ancestors. This is not something that can
be always-on in all shells, but it could be an option for users that
want this control/isolation.

> And what do we actually want in, say, ^Z case? Just stop the child reaper
> or may be it would be better to stop the whole pid namespace?

Stopping the whole PID namespace would be interesting, however, I think
this should be discussed separately if and when there is a proposal to
support this. For now the process group is stopped, same as without PID
namespaces.

> >  * As SIGSTOP is ignored when raised from the SIGNAL_UNKILLABLE process
> >itself, it's not possible to implement the stop action in a custom
> >SIGTSTP handler.
> 
> Yes. So may be we actually want to change __isig() paths to use
> SEND_SIG_FORCED (this is not that simple), or perhaps we can change
> __send_signal() to not drop SIGSTOP sent to itself, or may be we can even
> introduce SIG_DFL_EVEN_IF_INIT, I dunno.

In my opinion, my patch is much simpler and also more general as it
covers all scenarios where regular signal handling is required or
desired for "init" processes, with minimal code changes (after
PR_SET_KILLABLE, binaries that expect SIG_DFL to work can be executed
without changes).

> >  * Many applications do not install handlers for these signals and
> >thus, job control won't work properly with unmodified applications.
> 
> I can't understand this. An application should be changed anyway to do
> PR_SET_KILLABLE?

PR_SET_KILLABLE can be called (e.g., by the shell) between clone() and
execve(). (Some applications may have issues running as subreaper but
that's a separate matter, signal handling will work as expected).

> > +   case PR_SET_KILLABLE:
> > +   if (arg2 != 1 || arg3 || arg4 || arg5)
> > +   return -EINVAL;
> > +   spin_lock_irq(&me->sighand->siglock);
> > +   me->signal->flags &= ~SIGNAL_UNKILLABLE;
> > +   spin_unlock_irq(&me->sighand->siglock);
> 
> OK, but then you need to change the CLONE_PARENT/SIGNAL_UNKILLABLE check
> in copy_process().

Good point, need a different check for the PID namespace root process
in copy_process().

Thanks,
Jürg

Re: [PATCH] prctl: add PR_[GS]ET_KILLABLE

2018-08-01 Thread Jürg Billeter

On Tue, 2018-07-31 at 18:26 +0200, Jann Horn wrote:
> On Mon, Jul 30, 2018 at 10:01 AM Jürg Billeter  wrote:
> 
> [...]
> > diff --git a/kernel/sys.c b/kernel/sys.c
> > index 38509dc1f77b..264de630d548 100644
> > --- a/kernel/sys.c
> > +++ b/kernel/sys.c
> 
> [...]
> > +   case PR_SET_KILLABLE:
> > +   if (arg2 != 1 || arg3 || arg4 || arg5)
> > +   return -EINVAL;
> > +   me->signal->flags &= ~SIGNAL_UNKILLABLE;
> > +   break;
> 
> I don't have an opinion on this patchset otherwise, but should this
> prctl maybe block PR_SET_KILLABLE if you're actually the real init
> process? This seems like it could potentially lead to weird things.

While I don't expect global init to use this, I can't think of a good
reason to disallow it in the kernel. Do you have specific concerns or
is the code in kernel/fork.c the only reason? I prefer avoiding special
cases unless really required.

> This code in kernel/fork.c seems to rely on the fact that global init
> is SIGNAL_UNKILLABLE, and probably also leads to weirdness if
> container init is non-SIGNAL_UNKILLABLE:

Yes, Oleg has mentioned this as well. I have to change copy_process()
to directly check for the PID namespace root process instead of
checking for SIGNAL_UNKILLABLE.

Jürg

[PATCH v2] readv.2, io_submit.2: Document RWF_APPEND added in Linux 4.16

2018-04-06 Thread Jürg Billeter

Signed-off-by: Jürg Billeter 
---
Changes since version 1:
- Explain offset handling

 man2/io_submit.2 | 13 +
 man2/readv.2 | 17 +
 2 files changed, 30 insertions(+)

diff --git a/man2/io_submit.2 b/man2/io_submit.2
index 397fd0b75..25961138a 100644
--- a/man2/io_submit.2
+++ b/man2/io_submit.2
@@ -111,6 +111,19 @@ field of the
 .I io_event
 structure (see
 .BR io_getevents (2)).
+.TP
+.BR RWF_APPEND " (since Linux 4.16)"
+.\" commit e1fc742e14e01d84d9693c4aca4ab23da65811fb
+Append data to the end of the file.
+See the description of the flag of the same name in
+.BR pwritev2 (2)
+as well as the description of
+.B O_APPEND
+in
+.BR open (2).
+The
+.I aio_offset
+field is ignored. The file offset is not changed.
 .RE
 .TP
 .I aio_lio_opcode
diff --git a/man2/readv.2 b/man2/readv.2
index b3b7b9658..9ef250e11 100644
--- a/man2/readv.2
+++ b/man2/readv.2
@@ -248,6 +248,23 @@ to
 .BR EAGAIN .
 Currently, this flag is meaningful only for
 .BR preadv2 ().
+.TP
+.BR RWF_APPEND " (since Linux 4.16)"
+.\" commit e1fc742e14e01d84d9693c4aca4ab23da65811fb
+Provide a per-write equivalent of the
+.B O_APPEND
+.BR open (2)
+flag.
+This flag is meaningful only for
+.BR pwritev2 (),
+and its effect applies only to the data range written by the system call.
+The
+.I offset
+argument does not affect the write operation, the data is always appended
+to the end of the file. However, if the
+.I offset
+argument is \-1, the current file offset is updated.
+This matches the behavior when the file is opened in append mode.
 .SH RETURN VALUE
 On success,
 .BR readv (),
-- 
2.16.3

Re: [PATCH v4 2/4] namei: O_BENEATH-style path resolution flags

2018-11-23 Thread Jürg Billeter

Hi Aleksa,

On Tue, 2018-11-13 at 01:26 +1100, Aleksa Sarai wrote:
> * O_BENEATH: Disallow "escapes" from the starting point of the
>   filesystem tree during resolution (you must stay "beneath" the
>   starting point at all times). Currently this is done by disallowing
>   ".." and absolute paths (either in the given path or found during
>   symlink resolution) entirely, as well as all "magic link" jumping.

With open_tree(2) and OPEN_TREE_CLONE, will O_BENEATH still be
necessary? As I understand it, O_BENEATH could be replaced by a much
simpler flag that only disallows absolute paths (incl. absolute
symlinks). And it would have the benefit that you can actually pass the
tree/directory fd to another process and escaping would not be possible
even if that other process doesn't use O_BENEATH (after calling
mount_setattr(2) to make sure it's locked down).

This approach would also make it easy to restrict writes via a cloned
tree/directory fd by marking it read-only via mount_setattr(2) (and
locking down the read-only flag). This would again be especially useful
when passing tree/directory fds across processes, or for voluntary
self-lockdown within a process for robustness against security bugs.

This wouldn't affect any of the other flags in this patch. And for full
equivalence to O_BENEATH you'd have to use O_NOMAGICLINKS in addition
to O_NOABSOLUTE, or whatever that new flag would be called.

Or is OPEN_TREE_CLONE too expensive for this use case? Or is there
anything else I'm missing?

Jürg

[PATCH] prctl: add PR_{GET,SET}_KILL_DESCENDANTS_ON_EXIT

2018-11-27 Thread Jürg Billeter

This introduces a new thread group flag that can be set by calling

prctl(PR_SET_KILL_DESCENDANTS_ON_EXIT, 1, 0, 0, 0)

When a thread group exits with this flag set, it will send SIGKILL to
all descendant processes.  This can be used to prevent stray child
processes.

This flag is cleared on privilege gaining execve(2) to ensure an
unprivileged process cannot get a privileged process to send SIGKILL.

Descendants that are orphaned and reparented to an ancestor of the
current process before the current process exits, will not be killed.
PR_SET_CHILD_SUBREAPER can be used to contain orphaned processes.

If a descendant gained privileges, the current process may not be
allowed to kill it, and the descendant process will survive.
PR_SET_NO_NEW_PRIVS can be used to prevent descendant processes from
gaining privileges.

Suggested-by: Oleg Nesterov 
Signed-off-by: Jürg Billeter 
---
 fs/exec.c|  6 ++
 include/linux/sched/signal.h |  3 +++
 include/uapi/linux/prctl.h   |  4 
 kernel/exit.c| 12 
 kernel/sys.c | 11 +++
 security/apparmor/lsm.c  |  1 +
 security/selinux/hooks.c |  3 +++
 7 files changed, 40 insertions(+)

diff --git a/fs/exec.c b/fs/exec.c
index 1ebf6e5a521d..f48ff493 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1339,6 +1339,12 @@ void setup_new_exec(struct linux_binprm * bprm)
/* Make sure parent cannot signal privileged process. */
current->pdeath_signal = 0;
 
+   /*
+* Do not send SIGKILL from privileged process as it may
+* have been requested by an unprivileged process.
+*/
+   current->signal->kill_descendants_on_exit = 0;
+
/*
 * For secureexec, reset the stack limit to sane default to
 * avoid bad behavior from the prior rlimits. This has to
diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
index 1be35729c2c5..3bfb71701488 100644
--- a/include/linux/sched/signal.h
+++ b/include/linux/sched/signal.h
@@ -124,6 +124,9 @@ struct signal_struct {
unsigned intis_child_subreaper:1;
unsigned inthas_child_subreaper:1;
 
+   /* Send SIGKILL to descendant processes on exit */
+   unsigned intkill_descendants_on_exit:1;
+
 #ifdef CONFIG_POSIX_TIMERS
 
/* POSIX.1b Interval Timers */
diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index c0d7ea0bf5b6..2ac4da1f282b 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -198,6 +198,10 @@ struct prctl_mm_map {
 # define PR_CAP_AMBIENT_LOWER  3
 # define PR_CAP_AMBIENT_CLEAR_ALL  4
 
+/* Send SIGKILL to descendant processes on exit */
+#define PR_SET_KILL_DESCENDANTS_ON_EXIT48
+#define PR_GET_KILL_DESCENDANTS_ON_EXIT49
+
 /* arm64 Scalable Vector Extension controls */
 /* Flag values must be kept in sync with ptrace NT_ARM_SVE interface */
 #define PR_SVE_SET_VL  50  /* set task vector length */
diff --git a/kernel/exit.c b/kernel/exit.c
index 0e21e6d21f35..7fe0c694685a 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -694,6 +694,15 @@ static void forget_original_parent(struct task_struct 
*father,
list_splice_tail_init(&father->children, &reaper->children);
 }
 
+static int kill_descendant_visitor(struct task_struct *p, void *data)
+{
+   /* This may fail, e.g., when a descendant process gained privileges. */
+   group_send_sig_info(SIGKILL, SEND_SIG_NOINFO, p, PIDTYPE_TGID);
+
+   /* Always continue walking the process tree. */
+   return 1;
+}
+
 /*
  * Send signals to all our closest relatives so that they know
  * to properly mourn us..
@@ -704,6 +713,9 @@ static void exit_notify(struct task_struct *tsk, int 
group_dead)
struct task_struct *p, *n;
LIST_HEAD(dead);
 
+   if (group_dead && tsk->signal->kill_descendants_on_exit)
+   walk_process_tree(tsk, kill_descendant_visitor, NULL);
+
write_lock_irq(&tasklist_lock);
forget_original_parent(tsk, &dead);
 
diff --git a/kernel/sys.c b/kernel/sys.c
index 123bd73046ec..8d9af81da093 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -2476,6 +2476,17 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, 
unsigned long, arg3,
return -EINVAL;
error = arch_prctl_spec_ctrl_set(me, arg2, arg3);
break;
+   case PR_SET_KILL_DESCENDANTS_ON_EXIT:
+   if (arg3 || arg4 || arg5)
+   return -EINVAL;
+   me->signal->kill_descendants_on_exit = !!arg2;
+   break;
+   case PR_GET_KILL_DESCENDANTS_ON_EXIT:
+   if (arg3 || arg4 || arg5)
+   return -EINVAL;
+   error = put_user(me->signal->kill_descendants_on_

[PATCH 0/1] Add prctl to kill descendants on exit

2018-11-27 Thread Jürg Billeter

This patch adds a new prctl to kill all descendant processes on exit.
See commit message for details of the prctl.

This is a replacement of PR_SET_PDEATHSIG_PROC I proposed last year [1].
In the following discussion, Oleg suggested this approach.

The motivation for this is to provide a lightweight mechanism to prevent
stray processes. There is also a related Bugzilla entry [2].

PID namespaces can also be used to prevent stray processes, of course.
However, they are not quite as lightweight as they typically also
require a new mount namespace to be able to mount a new /proc.  And they
require CAP_SYS_ADMIN.  User namespaces can help to gain CAP_SYS_ADMIN,
however, that further increases the overhead and the other effects of
the user namespace may not be desired.

PID 1 in PID namespaces also exhibits non-standard signal behavior
(SIGNAL_UNKILLABLE) [3].

[1] https://lkml.kernel.org/lkml/20170929123058.48924-...@bitron.ch/
[2] https://bugzilla.kernel.org/show_bug.cgi?id=43300
[3] https://lkml.kernel.org/lkml/20180803144021.56920-...@bitron.ch/

Jürg Billeter (1):
  prctl: add PR_{GET,SET}_KILL_DESCENDANTS_ON_EXIT

 fs/exec.c|  6 ++
 include/linux/sched/signal.h |  3 +++
 include/uapi/linux/prctl.h   |  4 
 kernel/exit.c| 12 
 kernel/sys.c | 11 +++
 security/apparmor/lsm.c  |  1 +
 security/selinux/hooks.c |  3 +++
 7 files changed, 40 insertions(+)

-- 
2.19.2

Re: [PATCH] prctl: add PR_{GET,SET}_KILL_DESCENDANTS_ON_EXIT

2018-11-29 Thread Jürg Billeter

Hi Oleg,

Thanks for the review.

On Thu, 2018-11-29 at 13:34 +0100, Oleg Nesterov wrote:
> To me it would be more clean to call 
> walk_process_tree(kill_descendant_visitor)
> unconditionally in find_new_reaper() right before "if (has_child_subreaper)", 
> but
> then we will need to shift read_lock(tasklist) from walk_process_tree().

Yes, that's the reason why I added the call before the tasklist lock.
Let me know if you want me to move the read lock from
walk_process_tree() to PR_SET_CHILD_SUBREAPER (the only caller)
instead.

> So I think the patch is mostly fine, the only problem I can see is that
> PR_SET_KILL_DESCENDANTS_ON_EXIT can race with PR_SET_CHILD_SUBREAPER, they 
> both
> need to update the bits in the same word.

Good point. I'll make it a regular bool instead of a bitfield for v2,
unless you have another approach in mind to fix this.

Jürg

[PATCH v2 1/1] prctl: add PR_{GET,SET}_KILL_DESCENDANTS_ON_EXIT

2018-11-30 Thread Jürg Billeter

This introduces a new thread group flag that can be set by calling

prctl(PR_SET_KILL_DESCENDANTS_ON_EXIT, 1, 0, 0, 0)

When a thread group exits with this flag set, it will send SIGKILL to
all descendant processes.  This can be used to prevent stray child
processes.

This flag is cleared on privilege gaining execve(2) to ensure an
unprivileged process cannot get a privileged process to send SIGKILL.

Descendants that are orphaned and reparented to an ancestor of the
current process before the current process exits, will not be killed.
PR_SET_CHILD_SUBREAPER can be used to contain orphaned processes.

If a descendant gained privileges, the current process may not be
allowed to kill it, and the descendant process will survive.
PR_SET_NO_NEW_PRIVS can be used to prevent descendant processes from
gaining privileges.

Suggested-by: Oleg Nesterov 
Signed-off-by: Jürg Billeter 
---
 fs/exec.c|  6 ++
 include/linux/sched/signal.h |  3 +++
 include/uapi/linux/prctl.h   |  4 
 kernel/exit.c| 12 
 kernel/sys.c | 11 +++
 security/apparmor/lsm.c  |  1 +
 security/selinux/hooks.c |  3 +++
 7 files changed, 40 insertions(+)

diff --git a/fs/exec.c b/fs/exec.c
index 044e296f2381..1c9520d83d6b 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1343,6 +1343,12 @@ void setup_new_exec(struct linux_binprm * bprm)
/* Make sure parent cannot signal privileged process. */
current->pdeath_signal = 0;
 
+   /*
+* Do not send SIGKILL from privileged process as it may
+* have been requested by an unprivileged process.
+*/
+   current->signal->kill_descendants_on_exit = false;
+
/*
 * For secureexec, reset the stack limit to sane default to
 * avoid bad behavior from the prior rlimits. This has to
diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
index 0c3e396dca04..91ed7f480b60 100644
--- a/include/linux/sched/signal.h
+++ b/include/linux/sched/signal.h
@@ -124,6 +124,9 @@ struct signal_struct {
unsigned intis_child_subreaper:1;
unsigned inthas_child_subreaper:1;
 
+   /* Send SIGKILL to descendant processes on exit */
+   boolkill_descendants_on_exit;
+
 #ifdef CONFIG_POSIX_TIMERS
 
/* POSIX.1b Interval Timers */
diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index b17201edfa09..a31141236064 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -198,6 +198,10 @@ struct prctl_mm_map {
 # define PR_CAP_AMBIENT_LOWER  3
 # define PR_CAP_AMBIENT_CLEAR_ALL  4
 
+/* Send SIGKILL to descendant processes on exit */
+#define PR_SET_KILL_DESCENDANTS_ON_EXIT48
+#define PR_GET_KILL_DESCENDANTS_ON_EXIT49
+
 /* arm64 Scalable Vector Extension controls */
 /* Flag values must be kept in sync with ptrace NT_ARM_SVE interface */
 #define PR_SVE_SET_VL  50  /* set task vector length */
diff --git a/kernel/exit.c b/kernel/exit.c
index 0e21e6d21f35..7fe0c694685a 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -694,6 +694,15 @@ static void forget_original_parent(struct task_struct 
*father,
list_splice_tail_init(&father->children, &reaper->children);
 }
 
+static int kill_descendant_visitor(struct task_struct *p, void *data)
+{
+   /* This may fail, e.g., when a descendant process gained privileges. */
+   group_send_sig_info(SIGKILL, SEND_SIG_NOINFO, p, PIDTYPE_TGID);
+
+   /* Always continue walking the process tree. */
+   return 1;
+}
+
 /*
  * Send signals to all our closest relatives so that they know
  * to properly mourn us..
@@ -704,6 +713,9 @@ static void exit_notify(struct task_struct *tsk, int 
group_dead)
struct task_struct *p, *n;
LIST_HEAD(dead);
 
+   if (group_dead && tsk->signal->kill_descendants_on_exit)
+   walk_process_tree(tsk, kill_descendant_visitor, NULL);
+
write_lock_irq(&tasklist_lock);
forget_original_parent(tsk, &dead);
 
diff --git a/kernel/sys.c b/kernel/sys.c
index 123bd73046ec..8d9af81da093 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -2476,6 +2476,17 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, 
unsigned long, arg3,
return -EINVAL;
error = arch_prctl_spec_ctrl_set(me, arg2, arg3);
break;
+   case PR_SET_KILL_DESCENDANTS_ON_EXIT:
+   if (arg3 || arg4 || arg5)
+   return -EINVAL;
+   me->signal->kill_descendants_on_exit = !!arg2;
+   break;
+   case PR_GET_KILL_DESCENDANTS_ON_EXIT:
+   if (arg3 || arg4 || arg5)
+   return -EINVAL;
+   error = put_user(me->signal->kill_descendants_on_

[PATCH v2 0/1] Add prctl to kill descendants on exit

2018-11-30 Thread Jürg Billeter

This patch adds a new prctl to kill all descendant processes on exit.
See commit message for details of the prctl.

This is a replacement of PR_SET_PDEATHSIG_PROC I proposed last year [1].
In the following discussion, Oleg suggested this approach.

The motivation for this is to provide a lightweight mechanism to prevent
stray processes. There is also a related Bugzilla entry [2].

PID namespaces can also be used to prevent stray processes, of course.
However, they are not quite as lightweight as they typically also
require a new mount namespace to be able to mount a new /proc.  And they
require CAP_SYS_ADMIN.  User namespaces can help to gain CAP_SYS_ADMIN,
however, that further increases the overhead and the other effects of
the user namespace may not be desired.

PID 1 in PID namespaces also exhibits non-standard signal behavior
(SIGNAL_UNKILLABLE) [3].

Changes in v2:
  - Use bool instead of bitfield to avoid race with
PR_SET_CHILD_SUBREAPER

[1] https://lkml.kernel.org/lkml/20170929123058.48924-...@bitron.ch/
[2] https://bugzilla.kernel.org/show_bug.cgi?id=43300
[3] https://lkml.kernel.org/lkml/20180803144021.56920-...@bitron.ch/

Jürg Billeter (1):
  prctl: add PR_{GET,SET}_KILL_DESCENDANTS_ON_EXIT

 fs/exec.c|  6 ++
 include/linux/sched/signal.h |  3 +++
 include/uapi/linux/prctl.h   |  4 
 kernel/exit.c| 12 
 kernel/sys.c | 11 +++
 security/apparmor/lsm.c  |  1 +
 security/selinux/hooks.c |  3 +++
 7 files changed, 40 insertions(+)

-- 
2.19.2

Re: [PATCH v4] signal: add taskfd_send_signal() syscall

2018-12-06 Thread Jürg Billeter

On Thu, 2018-12-06 at 13:30 +0100, Florian Weimer wrote:
> * Christian Brauner:
> 
> > /* zombies */
> > Zombies can be signaled just as any other process. No special error will be
> > reported since a zombie state is an unreliable state (cf. [3]).
> 
> I still disagree with this analysis.  If I know that the target process
> is still alive, and it is not, this is a persistent error condition
> which can be reliably reported.  Given that someone might send SIGKILL
> to the process behind my back, detecting this error condition could be
> useful.

As I understand it, kill() behaves the same way. I think it's good that
this new syscall keeps the behavior as close as possible to kill().
E.g., this would allow emulating kill() (or a higher level API
equivalent) on top of taskfds without subtle differences in behavior.

As the new syscall supports flags, we could consider introducing a flag
that changes the behavior in the zombie case. However, I think that
should be a separate discussion (after merge of the syscall) and the
default behavior makes sense as is.

Jürg

Re: [PATCH v4] signal: add taskfd_send_signal() syscall

2018-12-06 Thread Jürg Billeter

On Thu, 2018-12-06 at 14:12 +0100, Florian Weimer wrote:
> * Jürg Billeter:
> 
> > On Thu, 2018-12-06 at 13:30 +0100, Florian Weimer wrote:
> > > * Christian Brauner:
> > > 
> > > > /* zombies */
> > > > Zombies can be signaled just as any other process. No special error 
> > > > will be
> > > > reported since a zombie state is an unreliable state (cf. [3]).
> > > 
> > > I still disagree with this analysis.  If I know that the target process
> > > is still alive, and it is not, this is a persistent error condition
> > > which can be reliably reported.  Given that someone might send SIGKILL
> > > to the process behind my back, detecting this error condition could be
> > > useful.
> > 
> > As I understand it, kill() behaves the same way. I think it's good that
> > this new syscall keeps the behavior as close as possible to kill().
> 
> No, kill does not behave in this way because the PID can be reused.
> The error condition is not stable there.

The PID can't be reused as long as it's a zombie. It can only be reused
when it has been wait()ed for. Or am I misunderstanding something?

Jürg

Re: [PATCH v2 0/1] Add prctl to kill descendants on exit

2018-12-06 Thread Jürg Billeter

On Fri, 2018-11-30 at 08:00 +, Jürg Billeter wrote:
> This patch adds a new prctl to kill all descendant processes on exit.
> See commit message for details of the prctl.
> 
> This is a replacement of PR_SET_PDEATHSIG_PROC I proposed last year [1].
> In the following discussion, Oleg suggested this approach.
> 
> The motivation for this is to provide a lightweight mechanism to prevent
> stray processes. There is also a related Bugzilla entry [2].

Andrew, Eric, does this look good to you as well?

Jürg

Re: [PATCH v2 1/1] prctl: add PR_{GET,SET}_KILL_DESCENDANTS_ON_EXIT

2018-12-01 Thread Jürg Billeter

On Fri, 2018-11-30 at 14:40 +0100, Florian Weimer wrote:
> * Jürg Billeter:
> 
> > This introduces a new thread group flag that can be set by calling
> > 
> > prctl(PR_SET_KILL_DESCENDANTS_ON_EXIT, 1, 0, 0, 0)
> > 
> > When a thread group exits with this flag set, it will send SIGKILL to
> > all descendant processes.  This can be used to prevent stray child
> > processes.
> > 
> > This flag is cleared on privilege gaining execve(2) to ensure an
> > unprivileged process cannot get a privileged process to send SIGKILL.
> 
> So this is inherited across regular execve?  I'm not sure that's a good
> idea.

Yes, this matches PR_SET_CHILD_SUBREAPER (and other process
attributes). Besides consistency and allowing a parent to configure the
flag for a spawned process, this is also needed to prevent a process
from clearing the flag (in combination with a seccomp filter).

> 
> > Descendants that are orphaned and reparented to an ancestor of the
> > current process before the current process exits, will not be killed.
> > PR_SET_CHILD_SUBREAPER can be used to contain orphaned processes.
> 
> For double- or triple-forking daemons, the reparenting will be racy, if
> I understand things correctly.

Can you please elaborate, if you're concerned about a particular race?
As the commit message mentions, for containment this flag can be
combined with PR_SET_CHILD_SUBREAPER (and PR_SET_NO_NEW_PRIVS).

Jürg

Re: [PATCH v2 1/1] prctl: add PR_{GET,SET}_KILL_DESCENDANTS_ON_EXIT

2018-12-01 Thread Jürg Billeter

On Sat, 2018-12-01 at 13:28 +0100, Florian Weimer wrote:
> * Jürg Billeter:
> 
> > On Fri, 2018-11-30 at 14:40 +0100, Florian Weimer wrote:
> > > * Jürg Billeter:
> > > 
> > > > This introduces a new thread group flag that can be set by calling
> > > > 
> > > > prctl(PR_SET_KILL_DESCENDANTS_ON_EXIT, 1, 0, 0, 0)
> > > > 
> > > > When a thread group exits with this flag set, it will send SIGKILL to
> > > > all descendant processes.  This can be used to prevent stray child
> > > > processes.
> > > > 
> > > > This flag is cleared on privilege gaining execve(2) to ensure an
> > > > unprivileged process cannot get a privileged process to send SIGKILL.
> > > 
> > > So this is inherited across regular execve?  I'm not sure that's a good
> > > idea.
> > 
> > Yes, this matches PR_SET_CHILD_SUBREAPER (and other process
> > attributes). Besides consistency and allowing a parent to configure the
> > flag for a spawned process, this is also needed to prevent a process
> > from clearing the flag (in combination with a seccomp filter).
> 
> I think the semantics of PR_SET_CHILD_SUBREAPER are different, and the
> behavior makes more sense there.

In my opinion, introducing inconsistency by deviating from the common
behavior of retaining process attributes across execve would be more
confusing/surprising to users. I don't see why it makes sense for
PR_SET_CHILD_SUBREAPER but not for PR_SET_KILL_DESCENDANTS_ON_EXIT.

Also, the main motivation is to provide a subset of PID namespace
features to unprivileged processes with a lightweight mechanism.
Retaining kill_descendants_on_exit across execve allows very similar
usage to PID namespaces: E.g., the parent can set
PR_SET_KILL_DESCENDANTS_ON_EXIT and PR_SET_CHILD_SUBREAPER in the child
before execve and the spawned init-like executable doesn't need to know
about this flag itself, i.e., the same init-like program can function
as a leader of a PID namespace or as a subreaper with this extra flag
set without code changes.

If the flag was cleared by execve, the program would need to know about
this flag and it would be impossible for the parent to lock this down
using seccomp.

> 
> > > > Descendants that are orphaned and reparented to an ancestor of the
> > > > current process before the current process exits, will not be killed.
> > > > PR_SET_CHILD_SUBREAPER can be used to contain orphaned processes.
> > > 
> > > For double- or triple-forking daemons, the reparenting will be racy, if
> > > I understand things correctly.
> > 
> > Can you please elaborate, if you're concerned about a particular race?
> > As the commit message mentions, for containment this flag can be
> > combined with PR_SET_CHILD_SUBREAPER (and PR_SET_NO_NEW_PRIVS).
> 
> Without PR_SET_CHILD_SUBREAPER, if a newly execve'ed daemon performs
> double/triple forking to disentangle itself from the parent process
> session, and the parent process which set
> PR_SET_KILL_DESCENDANTS_ON_EXIT terminates, behavior depends on when
> exactly the parent process terminates.  The daemon process will leak if
> it has completed its reparenting.
> 
> I think this could be sufficiently common that solution is needed here.

I expect the common case to be that PR_SET_KILL_DESCENDANTS_ON_EXIT
will be used together with PR_SET_CHILD_SUBREAPER (and possibly
PR_SET_NO_NEW_PRIVS) to prevent stray children. And I don't see a race
condition in that case.

PR_SET_KILL_DESCENDANTS_ON_EXIT can be used for non-subreapers but I
expect this to be used in more specialized scenarios where the program
is designed/known to avoid such race conditions. We could theoretically
restrict PR_SET_KILL_DESCENDANTS_ON_EXIT to subreapers but I currently
don't see a strong enough reason for this.

Jürg

Re: [PATCH] dmaengine: rcar-dmac: Handle hardware descriptor allocation failure

2014-12-08 Thread Jürg Billeter

Hi Vinod,

On Mon, 2014-12-08 at 17:06 +0530, Vinod Koul wrote:
> Can you please rebase this on my next, this fails to apply for me

This is a patch for Laurent's new rcar-dmac driver, which doesn't appear
to be in your next yet. Laurent has already merged my patch into his
dma/next branch and included it in his pull request from November 27.

[GIT PULL FOR v3.19] R-Car DMA engine driver
http://www.spinics.net/lists/linux-sh/msg37764.html

Please let me know if there is still a rebase required from my side.

Regards,
Jürg

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] null_blk: fix blk_mq_init_queue() error handling

2015-01-08 Thread Jürg Billeter

blk_mq_init_queue() returns ERR_PTR() on failure, not NULL.

Signed-off-by: Jürg Billeter 
---
 drivers/block/null_blk.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/block/null_blk.c b/drivers/block/null_blk.c
index ae9f615..727309c 100644
--- a/drivers/block/null_blk.c
+++ b/drivers/block/null_blk.c
@@ -530,8 +530,8 @@ static int null_add_dev(void)
goto out_cleanup_queues;
 
nullb->q = blk_mq_init_queue(&nullb->tag_set);
-   if (!nullb->q) {
-   rv = -ENOMEM;
+   if (IS_ERR(nullb->q)) {
+   rv = PTR_ERR(nullb->q);
goto out_cleanup_tags;
}
} else if (queue_mode == NULL_Q_BIO) {
-- 
2.2.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] virtio_blk: fix blk_mq_init_queue() error handling

2015-01-08 Thread Jürg Billeter

blk_mq_init_queue() returns ERR_PTR() on failure, not NULL.

Signed-off-by: Jürg Billeter 
---
 drivers/block/virtio_blk.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 7ef7c09..b1b5c75 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -638,8 +638,8 @@ static int virtblk_probe(struct virtio_device *vdev)
goto out_put_disk;
 
q = vblk->disk->queue = blk_mq_init_queue(&vblk->tag_set);
-   if (!q) {
-   err = -ENOMEM;
+   if (IS_ERR(q)) {
+   err = PTR_ERR(q);
goto out_free_tags;
}
 
-- 
2.2.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 2/3] ti-st: st_kim: fix nshutdown_gpio in get_platform_data

2015-06-24 Thread Jürg Billeter

Use of_get_named_gpio instead of of_property_read_u32.

Signed-off-by: Jürg Billeter 
---
 drivers/misc/ti-st/st_kim.c | 13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/drivers/misc/ti-st/st_kim.c b/drivers/misc/ti-st/st_kim.c
index af71584..8df8faa 100644
--- a/drivers/misc/ti-st/st_kim.c
+++ b/drivers/misc/ti-st/st_kim.c
@@ -38,6 +38,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define MAX_ST_DEVICES 3   /* Imagine 1 on each UART for now */
 static struct platform_device *st_kim_devices[MAX_ST_DEVICES];
@@ -749,7 +750,7 @@ static struct ti_st_plat_data *get_platform_data(struct 
device *dev)
 {
struct device_node *np = dev->of_node;
const u32 *dt_property;
-   int len;
+   int len, gpio;
 
dt_pdata = kzalloc(sizeof(*dt_pdata), GFP_KERNEL);
 
@@ -761,8 +762,14 @@ static struct ti_st_plat_data *get_platform_data(struct 
device *dev)
dt_property = of_get_property(np, "dev_name", &len);
if (dt_property)
memcpy(&dt_pdata->dev_name, dt_property, len);
-   of_property_read_u32(np, "nshutdown_gpio",
-&dt_pdata->nshutdown_gpio);
+
+   gpio = of_get_named_gpio(np, "nshutdown_gpio", 0);
+   if (gpio < 0) {
+   kfree(dt_pdata);
+   return ERR_PTR(gpio);
+   }
+   dt_pdata->nshutdown_gpio = gpio;
+
of_property_read_u32(np, "flow_cntrl", &dt_pdata->flow_cntrl);
of_property_read_u32(np, "baud_rate", &dt_pdata->baud_rate);
 
-- 
2.4.3

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 3/3] ti-st: st_kim: use gpio_set_value_cansleep to fix warning

2015-06-24 Thread Jürg Billeter

GPIO accessor functions may sleep.

Signed-off-by: Jürg Billeter 
---
 drivers/misc/ti-st/st_kim.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/misc/ti-st/st_kim.c b/drivers/misc/ti-st/st_kim.c
index 8df8faa..0ab81d7 100644
--- a/drivers/misc/ti-st/st_kim.c
+++ b/drivers/misc/ti-st/st_kim.c
@@ -483,9 +483,9 @@ long st_kim_start(void *kim_data)
pdata->chip_enable(kim_gdata);
 
/* Configure BT nShutdown to HIGH state */
-   gpio_set_value(kim_gdata->nshutdown, GPIO_LOW);
+   gpio_set_value_cansleep(kim_gdata->nshutdown, GPIO_LOW);
mdelay(5);  /* FIXME: a proper toggle */
-   gpio_set_value(kim_gdata->nshutdown, GPIO_HIGH);
+   gpio_set_value_cansleep(kim_gdata->nshutdown, GPIO_HIGH);
mdelay(100);
/* re-initialize the completion */
reinit_completion(&kim_gdata->ldisc_installed);
@@ -567,11 +567,11 @@ long st_kim_stop(void *kim_data)
}
 
/* By default configure BT nShutdown to LOW state */
-   gpio_set_value(kim_gdata->nshutdown, GPIO_LOW);
+   gpio_set_value_cansleep(kim_gdata->nshutdown, GPIO_LOW);
mdelay(1);
-   gpio_set_value(kim_gdata->nshutdown, GPIO_HIGH);
+   gpio_set_value_cansleep(kim_gdata->nshutdown, GPIO_HIGH);
mdelay(1);
-   gpio_set_value(kim_gdata->nshutdown, GPIO_LOW);
+   gpio_set_value_cansleep(kim_gdata->nshutdown, GPIO_LOW);
 
/* platform specific disable */
if (pdata->chip_disable)
-- 
2.4.3

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 1/3] ti-st: st_kim: use ERR_PTR(-ENOMEM) instead of NULL

2015-06-24 Thread Jürg Billeter

This allows return of other error codes.

Signed-off-by: Jürg Billeter 
---
 drivers/misc/ti-st/st_kim.c | 14 ++
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/drivers/misc/ti-st/st_kim.c b/drivers/misc/ti-st/st_kim.c
index 5027b8f..af71584 100644
--- a/drivers/misc/ti-st/st_kim.c
+++ b/drivers/misc/ti-st/st_kim.c
@@ -752,8 +752,11 @@ static struct ti_st_plat_data *get_platform_data(struct 
device *dev)
int len;
 
dt_pdata = kzalloc(sizeof(*dt_pdata), GFP_KERNEL);
-   if (!dt_pdata)
-   return NULL;
+
+   if (!dt_pdata) {
+   pr_err("Can't allocate device_tree platform data\n");
+   return ERR_PTR(-ENOMEM);
+   }
 
dt_property = of_get_property(np, "dev_name", &len);
if (dt_property)
@@ -773,10 +776,13 @@ static int kim_probe(struct platform_device *pdev)
struct ti_st_plat_data  *pdata;
int err;
 
-   if (pdev->dev.of_node)
+   if (pdev->dev.of_node) {
pdata = get_platform_data(&pdev->dev);
-   else
+   if (IS_ERR(pdata))
+   return PTR_ERR(pdata);
+   } else {
pdata = pdev->dev.platform_data;
+   }
 
if (pdata == NULL) {
dev_err(&pdev->dev, "Platform Data is missing\n");
-- 
2.4.3

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 2/3] ti-st: st_kim: fix nshutdown_gpio in get_platform_data

2015-06-24 Thread Jürg Billeter

On Wed, 2015-06-24 at 11:31 -0400, Peter Hurley wrote:
> On 06/24/2015 06:24 AM, Jürg Billeter wrote:
> > @@ -761,8 +762,14 @@ static struct ti_st_plat_data
> > *get_platform_data(struct device *dev)
> > dt_property = of_get_property(np, "dev_name", &len);
> > if (dt_property)
> > memcpy(&dt_pdata->dev_name, dt_property, len);
> > -   of_property_read_u32(np, "nshutdown_gpio",
> > -&dt_pdata->nshutdown_gpio);
> > +
> > +   gpio = of_get_named_gpio(np, "nshutdown_gpio", 0);
> 
> NAK. This breaks existing dtbs, since the format is not the same.

Isn't the existing code completely broken as there is no predictable
GPIO numbering in general? There is also no documentation or use of
that device tree property in the kernel tree, as far as I can tell.

Do you have a suggestion how to fix this without breaking existing
dtbs? Do we need to introduce a second property and support both in the
driver?

Regards,
Jürg
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] readv.2, io_submit.2: Document RWF_APPEND added in Linux 4.16

2018-03-28 Thread Jürg Billeter

Signed-off-by: Jürg Billeter 
---
 man2/io_submit.2 | 10 ++
 man2/readv.2 | 10 ++
 2 files changed, 20 insertions(+)

diff --git a/man2/io_submit.2 b/man2/io_submit.2
index 397fd0b75..79fcdfee4 100644
--- a/man2/io_submit.2
+++ b/man2/io_submit.2
@@ -111,6 +111,16 @@ field of the
 .I io_event
 structure (see
 .BR io_getevents (2)).
+.TP
+.BR RWF_APPEND " (since Linux 4.16)"
+.\" commit e1fc742e14e01d84d9693c4aca4ab23da65811fb
+Append data to the end of the file.
+See the description of the flag of the same name in
+.BR pwritev2 (2)
+as well as the description of
+.B O_APPEND
+in
+.BR open (2).
 .RE
 .TP
 .I aio_lio_opcode
diff --git a/man2/readv.2 b/man2/readv.2
index b3b7b9658..668fba576 100644
--- a/man2/readv.2
+++ b/man2/readv.2
@@ -248,6 +248,16 @@ to
 .BR EAGAIN .
 Currently, this flag is meaningful only for
 .BR preadv2 ().
+.TP
+.BR RWF_APPEND " (since Linux 4.16)"
+.\" commit e1fc742e14e01d84d9693c4aca4ab23da65811fb
+Provide a per-write equivalent of the
+.B O_APPEND
+.BR open (2)
+flag.
+This flag is meaningful only for
+.BR pwritev2 (),
+and its effect applies only to the data range written by the system call.
 .SH RETURN VALUE
 On success,
 .BR readv (),
-- 
2.16.2

[PATCH] ARM: kernel: respect device tree status of cpu nodes

2014-02-24 Thread Jürg Billeter

Skip 'disabled' cpu nodes when building the cpu logical map. This avoids
booting cpus that have been disabled in the device tree.

Signed-off-by: Jürg Billeter 
Reviewed-by: Ben Dooks 
---
 arch/arm/kernel/devtree.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/arch/arm/kernel/devtree.c b/arch/arm/kernel/devtree.c
index 739c3df..9aed299 100644
--- a/arch/arm/kernel/devtree.c
+++ b/arch/arm/kernel/devtree.c
@@ -95,6 +95,10 @@ void __init arm_dt_init_cpu_maps(void)
if (of_node_cmp(cpu->type, "cpu"))
continue;
 
+   /* Check if CPU is enabled */
+   if (!of_device_is_available(cpu))
+   continue;
+
pr_debug(" * %s...\n", cpu->full_name);
/*
 * A device tree containing CPU nodes with missing "reg"
-- 
1.9.0

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] ASoC: rsnd: fixup index of SSI mod when SRC is used

2014-07-02 Thread Jürg Billeter

The default index 1 was used as the loop was terminated before the
following code could be reached:

if (mod[i] == this)
index = i;

Signed-off-by: Jürg Billeter 
---
 sound/soc/sh/rcar/core.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/sound/soc/sh/rcar/core.c b/sound/soc/sh/rcar/core.c
index 4e86265..d737fea 100644
--- a/sound/soc/sh/rcar/core.c
+++ b/sound/soc/sh/rcar/core.c
@@ -295,9 +295,11 @@ static void rsnd_dma_of_name(struct rsnd_dma *dma,
mod[0] = NULL; /* for "mem" */
index = 1;
for (i = 1; i < MOD_MAX; i++) {
-   if (!src) {
-   mod[i] = ssi;
+   if (!ssi) {
break;
+   } else if (!src) {
+   mod[i] = ssi;
+   ssi = NULL;
} else if (!dvc) {
mod[i] = src;
src = NULL;
-- 
2.0.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [alsa-devel] [PATCH] ASoC: rsnd: fixup index of SSI mod when SRC is used

2014-07-03 Thread Jürg Billeter

Hi,

On Thu, 2014-07-03 at 22:57 +0900, Kuninori Morimoto wrote:
> If you add this code,
>
> if (!ssi) {
>break;
> } else ...
>  
> then, I guess we can remove this code ?
> 
>   if (mod[i] == ssi)
>   break;

I checked the issue on mainline master where the above two lines are not
present yet. Looking at Mark's for-next tree reveals that you already
fixed the issue in a slightly different way (commit c08c3b08). You may
disregard my patch.

Jürg

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] spi: sh-sci: fix use-after-free in sh_sci_spi_remove()

2014-06-16 Thread Jürg Billeter

setbits() uses sp->membase.

Signed-off-by: Jürg Billeter 
---
 drivers/spi/spi-sh-sci.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/spi/spi-sh-sci.c b/drivers/spi/spi-sh-sci.c
index 1f56ef6..b83dd73 100644
--- a/drivers/spi/spi-sh-sci.c
+++ b/drivers/spi/spi-sh-sci.c
@@ -175,9 +175,9 @@ static int sh_sci_spi_remove(struct platform_device *dev)
 {
struct sh_sci_spi *sp = platform_get_drvdata(dev);
 
-   iounmap(sp->membase);
-   setbits(sp, PIN_INIT, 0);
spi_bitbang_stop(&sp->bitbang);
+   setbits(sp, PIN_INIT, 0);
+   iounmap(sp->membase);
spi_master_put(sp->bitbang.master);
return 0;
 }
-- 
2.0.0

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] gpio: rcar: clamp returned value to [0,1]

2014-06-23 Thread Jürg Billeter

While it will be clamped to bool by gpiolib, let's make this sane
in the driver as well.

Signed-off-by: Jürg Billeter 
---
 drivers/gpio/gpio-rcar.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpio/gpio-rcar.c b/drivers/gpio/gpio-rcar.c
index 0c9f803..7c62121 100644
--- a/drivers/gpio/gpio-rcar.c
+++ b/drivers/gpio/gpio-rcar.c
@@ -240,9 +240,9 @@ static int gpio_rcar_get(struct gpio_chip *chip, unsigned 
offset)
/* testing on r8a7790 shows that INDT does not show correct pin state
 * when configured as output, so use OUTDT in case of output pins */
if (gpio_rcar_read(gpio_to_priv(chip), INOUTSEL) & bit)
-   return (int)(gpio_rcar_read(gpio_to_priv(chip), OUTDT) & bit);
+   return !!(gpio_rcar_read(gpio_to_priv(chip), OUTDT) & bit);
else
-   return (int)(gpio_rcar_read(gpio_to_priv(chip), INDT) & bit);
+   return !!(gpio_rcar_read(gpio_to_priv(chip), INDT) & bit);
 }
 
 static void gpio_rcar_set(struct gpio_chip *chip, unsigned offset, int value)
-- 
2.0.0

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] dmaengine: rcar-dmac: Handle hardware descriptor allocation failure

2014-11-25 Thread Jürg Billeter

If the atomic DMA coherent pool is too small, disable use of hardware
descriptor lists instead of crashing the system:

ERROR: 256 KiB atomic DMA coherent pool is too small!
Please increase it with coherent_pool= kernel parameter!

Unable to handle kernel NULL pointer dereference at virtual address 0004
Internal error: Oops: a07 [#1] PREEMPT SMP ARM

PC is at rcar_dmac_chan_reinit+0x3c/0x160
LR is at _raw_spin_lock_irqsave+0x18/0x5c

[<802132c0>] (rcar_dmac_chan_reinit) from [<80214818>] 
(rcar_dmac_isr_error+0x84/0xa0)
[<80214818>] (rcar_dmac_isr_error) from [<80060484>] 
(handle_irq_event_percpu+0x50/0x150)
[<80060484>] (handle_irq_event_percpu) from [<800605c0>] 
(handle_irq_event+0x3c/0x5c)
[<800605c0>] (handle_irq_event) from [<8006350c>] 
(handle_fasteoi_irq+0xb8/0x198)
[<8006350c>] (handle_fasteoi_irq) from [<8005fdb0>] 
(generic_handle_irq+0x20/0x30)
[<8005fdb0>] (generic_handle_irq) from [<8000fcd0>] (handle_IRQ+0x50/0xc4)
[<8000fcd0>] (handle_IRQ) from [<800092cc>] (gic_handle_irq+0x28/0x5c)
[<800092cc>] (gic_handle_irq) from [<80012700>] (__irq_svc+0x40/0x70)

Kernel panic - not syncing: Fatal exception in interrupt

Signed-off-by: Jürg Billeter 
---
 drivers/dma/sh/rcar-dmac.c | 14 +-
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/drivers/dma/sh/rcar-dmac.c b/drivers/dma/sh/rcar-dmac.c
index f8f70eb..f18b1b6 100644
--- a/drivers/dma/sh/rcar-dmac.c
+++ b/drivers/dma/sh/rcar-dmac.c
@@ -679,8 +679,8 @@ static void rcar_dmac_realloc_hwdesc(struct rcar_dmac_chan 
*chan,
desc->hwdescs.size = size;
 }
 
-static void rcar_dmac_fill_hwdesc(struct rcar_dmac_chan *chan,
- struct rcar_dmac_desc *desc)
+static int rcar_dmac_fill_hwdesc(struct rcar_dmac_chan *chan,
+struct rcar_dmac_desc *desc)
 {
struct rcar_dmac_xfer_chunk *chunk;
struct rcar_dmac_hw_desc *hwdesc;
@@ -689,7 +689,7 @@ static void rcar_dmac_fill_hwdesc(struct rcar_dmac_chan 
*chan,
 
hwdesc = desc->hwdescs.mem;
if (!hwdesc)
-   return;
+   return -ENOMEM;
 
list_for_each_entry(chunk, &desc->chunks, node) {
hwdesc->sar = chunk->src_addr;
@@ -697,6 +697,8 @@ static void rcar_dmac_fill_hwdesc(struct rcar_dmac_chan 
*chan,
hwdesc->tcr = chunk->size >> desc->xfer_shift;
hwdesc++;
}
+
+   return 0;
 }
 
 /* 
-
@@ -933,8 +935,10 @@ rcar_dmac_chan_prep_sg(struct rcar_dmac_chan *chan, struct 
scatterlist *sgl,
 * additional complexity remains to be investigated.
 */
desc->hwdescs.use = !highmem && nchunks > 1;
-   if (desc->hwdescs.use)
-   rcar_dmac_fill_hwdesc(chan, desc);
+   if (desc->hwdescs.use) {
+   if (rcar_dmac_fill_hwdesc(chan, desc) < 0)
+   desc->hwdescs.use = false;
+   }
 
return &desc->async_tx;
 }
-- 
2.1.3

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[RESEND PATCH v2 1/1] prctl: add PR_{GET,SET}_KILL_DESCENDANTS_ON_EXIT

2019-01-18 Thread Jürg Billeter

This introduces a new thread group flag that can be set by calling

prctl(PR_SET_KILL_DESCENDANTS_ON_EXIT, 1, 0, 0, 0)

When a thread group exits with this flag set, it will send SIGKILL to
all descendant processes.  This can be used to prevent stray child
processes.

This flag is cleared on privilege gaining execve(2) to ensure an
unprivileged process cannot get a privileged process to send SIGKILL.

Descendants that are orphaned and reparented to an ancestor of the
current process before the current process exits, will not be killed.
PR_SET_CHILD_SUBREAPER can be used to contain orphaned processes.

If a descendant gained privileges, the current process may not be
allowed to kill it, and the descendant process will survive.
PR_SET_NO_NEW_PRIVS can be used to prevent descendant processes from
gaining privileges.

Suggested-by: Oleg Nesterov 
Signed-off-by: Jürg Billeter 
Reviewed-by: Oleg Nesterov 
---
 fs/exec.c|  6 ++
 include/linux/sched/signal.h |  3 +++
 include/uapi/linux/prctl.h   |  4 
 kernel/exit.c| 12 
 kernel/sys.c | 11 +++
 security/apparmor/lsm.c  |  1 +
 security/selinux/hooks.c |  3 +++
 7 files changed, 40 insertions(+)

diff --git a/fs/exec.c b/fs/exec.c
index fb72d36f7823..bbb5a0718223 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1342,6 +1342,12 @@ void setup_new_exec(struct linux_binprm * bprm)
/* Make sure parent cannot signal privileged process. */
current->pdeath_signal = 0;
 
+   /*
+* Do not send SIGKILL from privileged process as it may
+* have been requested by an unprivileged process.
+*/
+   current->signal->kill_descendants_on_exit = false;
+
/*
 * For secureexec, reset the stack limit to sane default to
 * avoid bad behavior from the prior rlimits. This has to
diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
index 13789d10a50e..2acf481951f6 100644
--- a/include/linux/sched/signal.h
+++ b/include/linux/sched/signal.h
@@ -124,6 +124,9 @@ struct signal_struct {
unsigned intis_child_subreaper:1;
unsigned inthas_child_subreaper:1;
 
+   /* Send SIGKILL to descendant processes on exit */
+   boolkill_descendants_on_exit;
+
 #ifdef CONFIG_POSIX_TIMERS
 
/* POSIX.1b Interval Timers */
diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index b4875a93363a..d5483ca63c2d 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -198,6 +198,10 @@ struct prctl_mm_map {
 # define PR_CAP_AMBIENT_LOWER  3
 # define PR_CAP_AMBIENT_CLEAR_ALL  4
 
+/* Send SIGKILL to descendant processes on exit */
+#define PR_SET_KILL_DESCENDANTS_ON_EXIT48
+#define PR_GET_KILL_DESCENDANTS_ON_EXIT49
+
 /* arm64 Scalable Vector Extension controls */
 /* Flag values must be kept in sync with ptrace NT_ARM_SVE interface */
 #define PR_SVE_SET_VL  50  /* set task vector length */
diff --git a/kernel/exit.c b/kernel/exit.c
index 2d14979577ee..93a812c1b670 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -694,6 +694,15 @@ static void forget_original_parent(struct task_struct 
*father,
list_splice_tail_init(&father->children, &reaper->children);
 }
 
+static int kill_descendant_visitor(struct task_struct *p, void *data)
+{
+   /* This may fail, e.g., when a descendant process gained privileges. */
+   group_send_sig_info(SIGKILL, SEND_SIG_NOINFO, p, PIDTYPE_TGID);
+
+   /* Always continue walking the process tree. */
+   return 1;
+}
+
 /*
  * Send signals to all our closest relatives so that they know
  * to properly mourn us..
@@ -704,6 +713,9 @@ static void exit_notify(struct task_struct *tsk, int 
group_dead)
struct task_struct *p, *n;
LIST_HEAD(dead);
 
+   if (group_dead && tsk->signal->kill_descendants_on_exit)
+   walk_process_tree(tsk, kill_descendant_visitor, NULL);
+
write_lock_irq(&tasklist_lock);
forget_original_parent(tsk, &dead);
 
diff --git a/kernel/sys.c b/kernel/sys.c
index f7eb62eceb24..f6dba0ba9b77 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -2485,6 +2485,17 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, 
unsigned long, arg3,
return -EINVAL;
error = PAC_RESET_KEYS(me, arg2);
break;
+   case PR_SET_KILL_DESCENDANTS_ON_EXIT:
+   if (arg3 || arg4 || arg5)
+   return -EINVAL;
+   me->signal->kill_descendants_on_exit = !!arg2;
+   break;
+   case PR_GET_KILL_DESCENDANTS_ON_EXIT:
+   if (arg3 || arg4 || arg5)
+   return -EINVAL;
+   error = put_user(me->signal->kill_descenda

[RESEND PATCH v2 0/1] Add prctl to kill descendants on exit

2019-01-18 Thread Jürg Billeter

This patch adds a new prctl to kill all descendant processes on exit.
See commit message for details of the prctl.

This is a replacement of PR_SET_PDEATHSIG_PROC I proposed last year [1].
In the following discussion, Oleg suggested this approach.

The motivation for this is to provide a lightweight mechanism to prevent
stray processes. There is also a related Bugzilla entry [2].

PID namespaces can also be used to prevent stray processes, of course.
However, they are not quite as lightweight as they typically also
require a new mount namespace to be able to mount a new /proc.  And they
require CAP_SYS_ADMIN.  User namespaces can help to gain CAP_SYS_ADMIN,
however, that further increases the overhead and the other effects of
the user namespace may not be desired.

PID 1 in PID namespaces also exhibits non-standard signal behavior
(SIGNAL_UNKILLABLE) [3].

Changes in v2:
  - Use bool instead of bitfield to avoid race with
PR_SET_CHILD_SUBREAPER

[1] https://lkml.kernel.org/lkml/20170929123058.48924-...@bitron.ch/
[2] https://bugzilla.kernel.org/show_bug.cgi?id=43300
[3] https://lkml.kernel.org/lkml/20180803144021.56920-...@bitron.ch/

Jürg Billeter (1):
  prctl: add PR_{GET,SET}_KILL_DESCENDANTS_ON_EXIT

 fs/exec.c|  6 ++
 include/linux/sched/signal.h |  3 +++
 include/uapi/linux/prctl.h   |  4 
 kernel/exit.c| 12 
 kernel/sys.c | 11 +++
 security/apparmor/lsm.c  |  1 +
 security/selinux/hooks.c |  3 +++
 7 files changed, 40 insertions(+)

-- 
2.20.1

Re: [PATCH v2 2/5] pid: add pidfd_open()

2019-03-30 Thread Jürg Billeter

On Fri, 2019-03-29 at 16:54 +0100, Christian Brauner wrote:
> diff --git a/include/uapi/linux/wait.h b/include/uapi/linux/wait.h
> index ac49a220cf2a..d6c7c0701997 100644
> --- a/include/uapi/linux/wait.h
> +++ b/include/uapi/linux/wait.h
> @@ -18,5 +18,7 @@
>  #define P_PID1
>  #define P_PGID   2
>  
> +/* Get a file descriptor for /proc/ of the corresponding pidfd
> */
> +#define PIDFD_GET_PROCFD _IOR('p', 1, int)
>  
>  #endif /* _UAPI_LINUX_WAIT_H */

This is missing an entry in Documentation/ioctl/ioctl-number.txt and is
actually conflicting with existing entries.

However, I'd actually prefer a syscall to allow strict whitelisting via
seccomp and avoid the other ioctl disadvantages that Daniel has already
mentioned.

Cheers,
Jürg

[PATCH 2/2] iwlwifi: fix double hyphen in MODULE_FIRMWARE for 8000

2016-10-10 Thread Jürg Billeter

There is only a single hyphen between iwlwifi-8000C and the API version
in the firmware filename.

Fixes: e1ba684f762b ("iwlwifi: 8000: fix MODULE_FIRMWARE input")
Signed-off-by: Jürg Billeter 
---
 drivers/net/wireless/intel/iwlwifi/iwl-8000.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/wireless/intel/iwlwifi/iwl-8000.c 
b/drivers/net/wireless/intel/iwlwifi/iwl-8000.c
index d02ca14..8d3e53f 100644
--- a/drivers/net/wireless/intel/iwlwifi/iwl-8000.c
+++ b/drivers/net/wireless/intel/iwlwifi/iwl-8000.c
@@ -91,7 +91,7 @@
 
 #define IWL8000_FW_PRE "iwlwifi-8000C-"
 #define IWL8000_MODULE_FIRMWARE(api) \
-   IWL8000_FW_PRE "-" __stringify(api) ".ucode"
+   IWL8000_FW_PRE __stringify(api) ".ucode"
 
 #define IWL8265_FW_PRE "iwlwifi-8265-"
 #define IWL8265_MODULE_FIRMWARE(api) \
-- 
2.10.1

[PATCH 1/2] iwlwifi: fix MODULE_FIRMWARE for 6030

2016-10-10 Thread Jürg Billeter

IWL6000G2B_UCODE_API_MAX is not defined. ucode_api_max of
IWL_DEVICE_6030 uses IWL6000G2_UCODE_API_MAX. Use this also for
MODULE_FIRMWARE.

Fixes: 9d9b21d1b616 ("iwlwifi: remove IWL_*_UCODE_API_OK")
Signed-off-by: Jürg Billeter 
---
 drivers/net/wireless/intel/iwlwifi/iwl-6000.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/wireless/intel/iwlwifi/iwl-6000.c 
b/drivers/net/wireless/intel/iwlwifi/iwl-6000.c
index 0b9f6a7..39335b7 100644
--- a/drivers/net/wireless/intel/iwlwifi/iwl-6000.c
+++ b/drivers/net/wireless/intel/iwlwifi/iwl-6000.c
@@ -371,4 +371,4 @@ const struct iwl_cfg iwl6000_3agn_cfg = {
 MODULE_FIRMWARE(IWL6000_MODULE_FIRMWARE(IWL6000_UCODE_API_MAX));
 MODULE_FIRMWARE(IWL6050_MODULE_FIRMWARE(IWL6050_UCODE_API_MAX));
 MODULE_FIRMWARE(IWL6005_MODULE_FIRMWARE(IWL6000G2_UCODE_API_MAX));
-MODULE_FIRMWARE(IWL6030_MODULE_FIRMWARE(IWL6000G2B_UCODE_API_MAX));
+MODULE_FIRMWARE(IWL6030_MODULE_FIRMWARE(IWL6000G2_UCODE_API_MAX));
-- 
2.10.1

Re: [PATCH] prctl: add PR_[GS]ET_PDEATHSIG_PROC

2017-09-12 Thread Jürg Billeter

Hi Oleg,

Thanks for the review.

On Tue, 2017-09-12 at 19:05 +0200, Oleg Nesterov wrote:
> On 09/09, Jürg Billeter wrote:
> > Unlike
> > PR_SET_PDEATHSIG, this is inherited across fork to allow killing a whole
> > subtree without race conditions.
> 
> but I am still not sure this is right... at least I can't understand the
> "without race conditions" above.
> 
> IOW, the child can do prctl(PR_SET_PDEATHSIG_PROC, SIGKILL) right after 
> fork(),
> why this is not enough to kill a whole subtree without race conditions?

What if the parent dies between fork() and prctl()? Besides avoiding
this race condition, it also makes it relatively easy to enforce
PDEATHSIG_PROC for all descendants of a process. You simply set
PDEATHSIG_PROC and then block further changes using seccomp (and set
no_new_privs) to avoid runaway children.

> OTOH. If you want to kill a whole sub-tree then perhaps the exiting process
> should simply send the ->pdeath_signal_proc to the whole sub-tree? Not that
> I really think this makes more sense, but if we add the new API we should
> discuss everything we can.

While this would likely work for my use case of avoiding runaway
processes, I don't think it would make sense for non-SIGKILL use cases
of cooperating processes. Inheritance across fork still allows
resetting PDEATHSIG_PROC in the child after fork and I don't expect the
parent death race to be a significant issue in the case of cooperating
processes.

> Say, CLONE_PARENT. Should it succeed if ->pdeath_signal_proc != 0 ?

Yes, I don't see an issue with that. The new process will be a sibling
and inheriting pdeath_signal_proc seems sensible to me for this.

Jürg

Re: [PATCH] prctl: add PR_[GS]ET_PDEATHSIG_PROC

2017-09-13 Thread Jürg Billeter

On Wed, 2017-09-13 at 19:11 +0200, Oleg Nesterov wrote:
> On 09/12, Jürg Billeter wrote:
> > 
> > On Tue, 2017-09-12 at 19:05 +0200, Oleg Nesterov wrote:
> > > On 09/09, Jürg Billeter wrote:
> > > > Unlike
> > > > PR_SET_PDEATHSIG, this is inherited across fork to allow killing a whole
> > > > subtree without race conditions.
> > > 
> > > but I am still not sure this is right... at least I can't understand the
> > > "without race conditions" above.
> > > 
> > > IOW, the child can do prctl(PR_SET_PDEATHSIG_PROC, SIGKILL) right after 
> > > fork(),
> > > why this is not enough to kill a whole subtree without race conditions?
> > 
> > What if the parent dies between fork() and prctl()?
> 
> The child will be killed? Sorry, can't understand...

If PR_SET_PDEATHSIG_PROC was not inherited across fork and the parent
died between fork() and prctl(PR_SET_PDEATHSIG_PROC, SIGKILL) in the
child, the child would not be killed. It would be reparented to init(1)
or a subreaper, i.e., you end up with a runaway process. It would be
possible to safe guard against this race condition in other ways but
inheriting the setting avoids it nicely, and makes it easy to
apply/enforce PDEATHSIG_PROC for all descendants.

> > > Say, CLONE_PARENT. Should it succeed if ->pdeath_signal_proc != 0 ?
> > 
> > Yes, I don't see an issue with that. The new process will be a sibling
> > and inheriting pdeath_signal_proc seems sensible to me for this.
> 
> I meant, the process created by clone(CLONE_PARENT) won't be killed by
> pdeath_signal if the creator process exits, exactly because it won't be
> its child. Not that I think this is wrong.

Right, creator and parent won't be the same.

Jürg

[PATCH] fs: add RWF_APPEND

2017-09-09 Thread Jürg Billeter

This is the per-I/O equivalent of O_APPEND to support atomic append
operations on any open file.

On Linux, if a file is opened with O_APPEND, pwrite() ignores the offset
and always appends data to the end of the file. RWF_APPEND enables
atomic append and pwrite() with offset on a single file descriptor.

Signed-off-by: Jürg Billeter 
---
 include/linux/fs.h  | 2 ++
 include/uapi/linux/fs.h | 6 +-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index e723b62c4b81..5f27ef9cce17 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3187,6 +3187,8 @@ static inline int kiocb_set_rw_flags(struct kiocb *ki, 
rwf_t flags)
ki->ki_flags |= IOCB_DSYNC;
if (flags & RWF_SYNC)
ki->ki_flags |= (IOCB_DSYNC | IOCB_SYNC);
+   if (flags & RWF_APPEND)
+   ki->ki_flags |= IOCB_APPEND;
return 0;
 }
 
diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
index ea84938da405..5477480d0502 100644
--- a/include/uapi/linux/fs.h
+++ b/include/uapi/linux/fs.h
@@ -330,7 +330,11 @@ typedef int __bitwise __kernel_rwf_t;
 /* per-IO, return -EAGAIN if operation would block */
 #define RWF_NOWAIT ((__force __kernel_rwf_t)0x0008)
 
+/* per-IO O_APPEND */
+#define RWF_APPEND ((__force __kernel_rwf_t)0x0010)
+
 /* mask of flags supported by the kernel */
-#define RWF_SUPPORTED  (RWF_HIPRI | RWF_DSYNC | RWF_SYNC | RWF_NOWAIT)
+#define RWF_SUPPORTED \
+   (RWF_HIPRI | RWF_DSYNC | RWF_SYNC | RWF_NOWAIT | RWF_APPEND)
 
 #endif /* _UAPI_LINUX_FS_H */
-- 
2.14.1

[PATCH] prctl: add PR_[GS]ET_PDEATHSIG_PROC

2017-09-09 Thread Jürg Billeter

PR_SET_PDEATHSIG sets a parent death signal that the calling process
will get when its parent thread dies, even when the result of getppid()
doesn't change because the calling process is reparented to a different
thread in the same parent process. When managing multiple processes, a
process-based parent death signal is much more useful. E.g., to avoid
stray child processes.

PR_SET_PDEATHSIG_PROC sets a process-based death signal. Unlike
PR_SET_PDEATHSIG, this is inherited across fork to allow killing a whole
subtree without race conditions.

This can be used for sandboxing when combined with a seccomp filter.

There have been previous attempts to support this by changing the
behavior of PR_SET_PDEATHSIG. However, that would break existing
applications. See https://marc.info/?l=linux-kernel&m=117621804801689
and https://bugzilla.kernel.org/show_bug.cgi?id=43300

Signed-off-by: Jürg Billeter 
---
 fs/exec.c|  1 +
 include/linux/sched/signal.h |  3 +++
 include/uapi/linux/prctl.h   |  4 
 kernel/cred.c|  1 +
 kernel/exit.c|  4 
 kernel/fork.c|  2 ++
 kernel/sys.c | 11 +++
 security/apparmor/lsm.c  |  1 +
 security/selinux/hooks.c |  1 +
 9 files changed, 28 insertions(+)

diff --git a/fs/exec.c b/fs/exec.c
index 01a9fb9d8ac3..bb389c3c596d 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1353,6 +1353,7 @@ void setup_new_exec(struct linux_binprm * bprm)
if (bprm->secureexec) {
/* Make sure parent cannot signal privileged process. */
current->pdeath_signal = 0;
+   current->signal->pdeath_signal_proc = 0;
 
/*
 * For secureexec, reset the stack limit to sane default to
diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
index 2a0dd40b15db..c5c137e5ef39 100644
--- a/include/linux/sched/signal.h
+++ b/include/linux/sched/signal.h
@@ -103,6 +103,9 @@ struct signal_struct {
int group_stop_count;
unsigned intflags; /* see SIGNAL_* flags below */
 
+   /* The signal sent when the parent dies: */
+   int pdeath_signal_proc;
+
/*
 * PR_SET_CHILD_SUBREAPER marks a process, like a service
 * manager, to re-parent orphan (double-forking) child processes
diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index a8d0759a9e40..04508e81d4f2 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -197,4 +197,8 @@ struct prctl_mm_map {
 # define PR_CAP_AMBIENT_LOWER  3
 # define PR_CAP_AMBIENT_CLEAR_ALL  4
 
+/* Process-based variant of PDEATHSIG */
+#define PR_SET_PDEATHSIG_PROC  48
+#define PR_GET_PDEATHSIG_PROC  49
+
 #endif /* _LINUX_PRCTL_H */
diff --git a/kernel/cred.c b/kernel/cred.c
index ecf03657e71c..0192a94670e1 100644
--- a/kernel/cred.c
+++ b/kernel/cred.c
@@ -448,6 +448,7 @@ int commit_creds(struct cred *new)
if (task->mm)
set_dumpable(task->mm, suid_dumpable);
task->pdeath_signal = 0;
+   task->signal->pdeath_signal_proc = 0;
smp_wmb();
}
 
diff --git a/kernel/exit.c b/kernel/exit.c
index a35d8a17e01f..1be0616239e0 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -635,6 +635,10 @@ static void reparent_leader(struct task_struct *father, 
struct task_struct *p,
if (unlikely(p->exit_state == EXIT_DEAD))
return;
 
+   if (p->signal->pdeath_signal_proc)
+   group_send_sig_info(p->signal->pdeath_signal_proc,
+   SEND_SIG_NOINFO, p);
+
/* We don't want people slaying init. */
p->exit_signal = SIGCHLD;
 
diff --git a/kernel/fork.c b/kernel/fork.c
index 24a4c0be80d5..f6482392ece9 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1412,6 +1412,8 @@ static int copy_signal(unsigned long clone_flags, struct 
task_struct *tsk)
 
mutex_init(&sig->cred_guard_mutex);
 
+   sig->pdeath_signal_proc = current->signal->pdeath_signal_proc;
+
return 0;
 }
 
diff --git a/kernel/sys.c b/kernel/sys.c
index 2855ee73acd0..c47e92fa5370 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -2210,6 +2210,17 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, 
unsigned long, arg3,
case PR_GET_PDEATHSIG:
error = put_user(me->pdeath_signal, (int __user *)arg2);
break;
+   case PR_SET_PDEATHSIG_PROC:
+   if (!valid_signal(arg2)) {
+   error = -EINVAL;
+   break;
+   }
+   me->signal->pdeath_signal_proc = arg2;
+   break;
+   case PR_GET_PDEATHSIG_PROC:
+   error = put_user(me->signal->pdeath_signal_proc,
+(int

Re: new ...at() flag: AT_NO_JUMPS

2017-09-10 Thread Jürg Billeter

Hi Al,

Might it make sense to specify these lookup restrictions when opening
the directory (O_ROOT?) instead of specifying it for each lookup with
AT_* (or supporting both)? This might make it more useful when passing
directory fds between processes that do not use seccomp (where
AT_BENEATH could be enforced).

For my sandboxing use case, I'd be happy with either solution, though.
Is there anything I can do to help move this forward?

Best regards,
Jürg

[PATCH] Bluetooth: btintel: Add MODULE_FIRMWARE entries for iBT 3.5 controllers

2017-05-23 Thread Jürg Billeter

The iBT 3.5 controllers (Intel 8265, Windstorm Peak) need
intel/ibt-12-16.sfi and intel/ibt-12-16.ddc firmware files from
linux-firmware repository.

Signed-off-by: Jürg Billeter 
---
 drivers/bluetooth/btintel.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/bluetooth/btintel.c b/drivers/bluetooth/btintel.c
index fce154855718..d32e109bd5cb 100644
--- a/drivers/bluetooth/btintel.c
+++ b/drivers/bluetooth/btintel.c
@@ -575,3 +575,5 @@ MODULE_VERSION(VERSION);
 MODULE_LICENSE("GPL");
 MODULE_FIRMWARE("intel/ibt-11-5.sfi");
 MODULE_FIRMWARE("intel/ibt-11-5.ddc");
+MODULE_FIRMWARE("intel/ibt-12-16.sfi");
+MODULE_FIRMWARE("intel/ibt-12-16.ddc");
-- 
2.13.0

Re: [RESEND PATCH] prctl: add PR_[GS]ET_PDEATHSIG_PROC

2017-10-02 Thread Jürg Billeter

On Mon, 2017-10-02 at 22:25 -0500, Eric W. Biederman wrote:
> The code where it calls group_send_sig_info is buggy for pdeath_signal.
> And it no less buggy for this new case.  There is no point to check
> permissions when sending a signal to yourself.  Especially this signal
> gets cleared during exec with a change of permissions.
> 
> 
> I would recommend using:
>  do_send_sig_info(p->signal->pdeath_signal_proc, SEND_SIG_NOINFO, p, true);
> 
> Perhaps with a comment saying that no permission check is needed when
> sending a signal to yourself.

Depending on how you look at it, one could also argue that the dying
parent sends the signal. However, I'm fine with dropping the permission
check in v2. I'll also send a patch to change this for the existing
pdeath_signal.

> I don't know what I think about inherit over fork, and the whole tree
> killing thing.  Except when the signal is SIGKILL I don't know if that
> code does what is intended.  So I am a little leary of it.

I agree that inheritance across fork is mainly useful for SIGKILL.
While non-SIGKILL users could clear the setting after fork(), another
option would be to allow the caller to specify whether the setting
should be inherited using prctl arg3.

This would allow both, the exact process-based equivalent to
pdeath_signal (no inheritance) as well as the interesting SIGKILL case
for killing a process tree. Does this sound sensible? I'd be happy to
add this to v2.

Jürg

Re: [RESEND PATCH] prctl: add PR_[GS]ET_PDEATHSIG_PROC

2017-10-03 Thread Jürg Billeter

On Tue, 2017-10-03 at 09:46 -0500, Eric W. Biederman wrote:
> There is a general need to find out about the death of other processes,
> if you are not the parent of the process.   I would be inclined to call
> it waitfd.  Something that you give a pid.  It performs a permission
> check and the pid becomes readable when the process dies.  With poll
> working on the fd, and the fd returning wstatus of the dead child.
> 
> Support SIGIO on the fd and you have a signal delivery mechanism,
> if you want it.

File descriptors for processes (waitfd/clonefd) are definitely
interesting.  Especially if reaping the process (and reparenting its
children) is delayed until the last process file descriptor is closed. 
However, this would be a much larger addition and also less intuitive
to use if all you want is killing the process tree.

> For the kill all children when the parent dies the mechanism you are
> proposing is escapable.  We already have an inescapable version of it
> with init in a pid namespace.  We already have an escapable version of
> it with orphaned process groups and SIGHUP.
> 
> So I would really appreciate a very clear use case for what we are
> building here.  As it appears the killing of children can already be
> done another way, and that the waiting for the parent can be done better
> another way.

My use case is to provide a way for a process to spawn a child and
ensure that no descendants survive when that child dies.  Avoiding
runaway processes is desirable in many situations.  My motivation is
very lightweight (nested) sandboxing (every process is potentially
sandboxed).

I.e., pid namespaces would be a pretty good fit (assuming they are
sufficiently lightweight) but CLONE_NEWPID requires CAP_SYS_ADMIN. 
User namespaces can help here, but creating tons of user namespaces
just for this doesn't sound sensible. MAX_PID_NS_LEVEL could be an
issue as well at some point but 32 levels are likely fine in practice.

For my particular scenario I may actually be able to create a single
user namespace, run all processes with (namespaced) CAP_SYS_ADMIN and
use CLONE_NEWPID for every process.  However, I would prefer not
requiring CAP_SYS_ADMIN and a regular application that wants to avoid
runaway processes for a spawned helper process cannot rely on
CAP_SYS_ADMIN.

My plan was to use PR_SET_PDEATHSIG_PROC with PR_NO_NEW_PRIVS and a
suitable seccomp filter to prevent changes to pdeath_signal_proc.  For
my SIGKILL use case it would be even better to simply require
PR_NO_NEW_PRIVS and make pdeath_signal_proc sticky, avoiding the need
for seccomp.  I wanted to keep the differences to the existing
PR_SET_PDEATHSIG minimal but if we argue that the non-SIGKILL use case
is better solved with waitfd (or maybe the process events connector),
we could tailor the prctl for the SIGKILL use case (or support both via
prctl arg3).

I have another small patch locally that adds a prctl that restricts
kill(2) to direct children of the current thread group for lightweight
sandboxing.  That would also be redundant if it was possible to use
CLONE_NEWPID for every process.

What's actually the reason that CLONE_NEWPID requires CAP_SYS_ADMIN? 
Does CLONE_NEWPID pose any risks that don't exist for
CLONE_NEWUSER|CLONE_NEWPID?  Assuming we can't simply drop the
CAP_SYS_ADMIN requirement, do you see a better solution for this use
case?

Jürg

Re: [RESEND PATCH] prctl: add PR_[GS]ET_PDEATHSIG_PROC

2017-10-03 Thread Jürg Billeter

On Tue, 2017-10-03 at 12:40 -0500, Eric W. Biederman wrote:
> Jürg Billeter  writes:
> > What's actually the reason that CLONE_NEWPID requires CAP_SYS_ADMIN? 
> > Does CLONE_NEWPID pose any risks that don't exist for
> > CLONE_NEWUSER|CLONE_NEWPID?  Assuming we can't simply drop the
> > CAP_SYS_ADMIN requirement, do you see a better solution for this use
> > case?
> 
> CLONE_NEWPID without a permission check would allow runing a setuid root
> application in a pid namespace.  Off the top of my head I can't think of
> a really good exploit.  But when you mess up pid files, and hide
> information from a privileged application I can completely imagine
> forcing that application to misbehave in ways the attacker can control.
> Leading to bad things.

Could we allow unprivileged CLONE_NEWPID if the no_new_privs bit is
set?

Jürg

[RESEND PATCH] fs: add RWF_APPEND

2017-09-29 Thread Jürg Billeter

This is the per-I/O equivalent of O_APPEND to support atomic append
operations on any open file.

If a file is opened with O_APPEND, pwrite() ignores the offset and
always appends data to the end of the file. RWF_APPEND enables atomic
append and pwrite() with offset on a single file descriptor.

Signed-off-by: Jürg Billeter 
---
 include/linux/fs.h  | 2 ++
 include/uapi/linux/fs.h | 6 +-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 339e73742e73..fee24eae7523 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3204,6 +3204,8 @@ static inline int kiocb_set_rw_flags(struct kiocb *ki, 
rwf_t flags)
ki->ki_flags |= IOCB_DSYNC;
if (flags & RWF_SYNC)
ki->ki_flags |= (IOCB_DSYNC | IOCB_SYNC);
+   if (flags & RWF_APPEND)
+   ki->ki_flags |= IOCB_APPEND;
return 0;
 }
 
diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
index 56235dddea7d..ac145430bcd8 100644
--- a/include/uapi/linux/fs.h
+++ b/include/uapi/linux/fs.h
@@ -376,7 +376,11 @@ typedef int __bitwise __kernel_rwf_t;
 /* per-IO, return -EAGAIN if operation would block */
 #define RWF_NOWAIT ((__force __kernel_rwf_t)0x0008)
 
+/* per-IO O_APPEND */
+#define RWF_APPEND ((__force __kernel_rwf_t)0x0010)
+
 /* mask of flags supported by the kernel */
-#define RWF_SUPPORTED  (RWF_HIPRI | RWF_DSYNC | RWF_SYNC | RWF_NOWAIT)
+#define RWF_SUPPORTED  (RWF_HIPRI | RWF_DSYNC | RWF_SYNC | RWF_NOWAIT |\
+RWF_APPEND)
 
 #endif /* _UAPI_LINUX_FS_H */
-- 
2.14.1

[RESEND PATCH] prctl: add PR_[GS]ET_PDEATHSIG_PROC

2017-09-29 Thread Jürg Billeter

PR_SET_PDEATHSIG sets a parent death signal that the calling process
will get when its parent thread dies, even when the result of getppid()
doesn't change because the calling process is reparented to a different
thread in the same parent process. When managing multiple processes, a
process-based parent death signal is much more useful. E.g., to avoid
stray child processes.

PR_SET_PDEATHSIG_PROC sets a process-based death signal. Unlike
PR_SET_PDEATHSIG, this is inherited across fork to allow killing a whole
subtree without race conditions.

This can be used for sandboxing when combined with a seccomp filter.

There have been previous attempts to support this by changing the
behavior of PR_SET_PDEATHSIG. However, that would break existing
applications. See https://marc.info/?l=linux-kernel&m=117621804801689
and https://bugzilla.kernel.org/show_bug.cgi?id=43300

Signed-off-by: Jürg Billeter 
---

Previous discussion: https://patchwork.kernel.org/patch/9945315/

 fs/exec.c|  1 +
 include/linux/sched/signal.h |  3 +++
 include/uapi/linux/prctl.h   |  4 
 kernel/cred.c|  1 +
 kernel/exit.c|  4 
 kernel/fork.c|  2 ++
 kernel/sys.c | 11 +++
 security/apparmor/lsm.c  |  1 +
 security/selinux/hooks.c |  1 +
 9 files changed, 28 insertions(+)

diff --git a/fs/exec.c b/fs/exec.c
index ac34d9724684..7045f0223140 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1334,6 +1334,7 @@ void setup_new_exec(struct linux_binprm * bprm)
if (bprm->secureexec) {
/* Make sure parent cannot signal privileged process. */
current->pdeath_signal = 0;
+   current->signal->pdeath_signal_proc = 0;
 
/*
 * For secureexec, reset the stack limit to sane default to
diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
index 2a0dd40b15db..c5c137e5ef39 100644
--- a/include/linux/sched/signal.h
+++ b/include/linux/sched/signal.h
@@ -103,6 +103,9 @@ struct signal_struct {
int group_stop_count;
unsigned intflags; /* see SIGNAL_* flags below */
 
+   /* The signal sent when the parent dies: */
+   int pdeath_signal_proc;
+
/*
 * PR_SET_CHILD_SUBREAPER marks a process, like a service
 * manager, to re-parent orphan (double-forking) child processes
diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index a8d0759a9e40..04508e81d4f2 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -197,4 +197,8 @@ struct prctl_mm_map {
 # define PR_CAP_AMBIENT_LOWER  3
 # define PR_CAP_AMBIENT_CLEAR_ALL  4
 
+/* Process-based variant of PDEATHSIG */
+#define PR_SET_PDEATHSIG_PROC  48
+#define PR_GET_PDEATHSIG_PROC  49
+
 #endif /* _LINUX_PRCTL_H */
diff --git a/kernel/cred.c b/kernel/cred.c
index ecf03657e71c..0192a94670e1 100644
--- a/kernel/cred.c
+++ b/kernel/cred.c
@@ -448,6 +448,7 @@ int commit_creds(struct cred *new)
if (task->mm)
set_dumpable(task->mm, suid_dumpable);
task->pdeath_signal = 0;
+   task->signal->pdeath_signal_proc = 0;
smp_wmb();
}
 
diff --git a/kernel/exit.c b/kernel/exit.c
index 3481ababd06a..9b6fbb0128d7 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -635,6 +635,10 @@ static void reparent_leader(struct task_struct *father, 
struct task_struct *p,
if (unlikely(p->exit_state == EXIT_DEAD))
return;
 
+   if (p->signal->pdeath_signal_proc)
+   group_send_sig_info(p->signal->pdeath_signal_proc,
+   SEND_SIG_NOINFO, p);
+
/* We don't want people slaying init. */
p->exit_signal = SIGCHLD;
 
diff --git a/kernel/fork.c b/kernel/fork.c
index 10646182440f..264936c367e3 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1415,6 +1415,8 @@ static int copy_signal(unsigned long clone_flags, struct 
task_struct *tsk)
 
mutex_init(&sig->cred_guard_mutex);
 
+   sig->pdeath_signal_proc = current->signal->pdeath_signal_proc;
+
return 0;
 }
 
diff --git a/kernel/sys.c b/kernel/sys.c
index 9aebc2935013..dcb9a535404e 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -2206,6 +2206,17 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, 
unsigned long, arg3,
case PR_GET_PDEATHSIG:
error = put_user(me->pdeath_signal, (int __user *)arg2);
break;
+   case PR_SET_PDEATHSIG_PROC:
+   if (!valid_signal(arg2)) {
+   error = -EINVAL;
+   break;
+   }
+   me->signal->pdeath_signal_proc = arg2;
+   break;
+   case PR_GET_PDEATHSIG_PROC:
+   error = put_user

Re: [RESEND PATCH] prctl: add PR_[GS]ET_PDEATHSIG_PROC

2017-10-08 Thread Jürg Billeter

On Thu, 2017-10-05 at 18:27 +0200, Oleg Nesterov wrote:
> On 10/03, Jürg Billeter wrote:
> > 
> > My use case is to provide a way for a process to spawn a child and
> > ensure that no descendants survive when that child dies.  Avoiding
> > runaway processes is desirable in many situations.  My motivation is
> > very lightweight (nested) sandboxing (every process is potentially
> > sandboxed).
> > 
> > I.e., pid namespaces would be a pretty good fit (assuming they are
> > sufficiently lightweight) but CLONE_NEWPID
> 
> sorry if this was already discussed, I didn't read this thread yet...
> 
> if CLONE_NEWPID is not suitable for any reason. We already have
> PR_SET_CHILD_SUBREAPER. Perhaps we can simply add another
> PR_SET_KILL_ALL_DESCEDANTS_ON_EXIT? we can use walk_process_tree()
> to send SIGKILL.

Yes, this is an option.  However, after the discussion in this thread I
believe it would be better to drop the CAP_SYS_ADMIN requirement for
CLONE_NEWPID (when no_new_privs is set) as this would avoid adding
another API and code path for a similar effect.  I'm interested in
possible security concerns about such a change.  Adding Andy Lutomirski
to cc.

Jürg

52 matches

Mail list logo