On Mon, Jun 23, 2014 at 03:16:53PM +0200, Mateusz Guzik wrote: > On Mon, Jun 23, 2014 at 11:18:23AM +0300, Konstantin Belousov wrote: > > On Mon, Jun 23, 2014 at 10:05:01AM +0200, Mateusz Guzik wrote: > > > On Mon, Jun 23, 2014 at 10:25:19AM +0300, Konstantin Belousov wrote: > > > > On Mon, Jun 23, 2014 at 09:06:52AM +0200, Mateusz Guzik wrote: > > > > > The table is modified in these functions and is reachable from the > > > > > rest > > > > > of the kernel (can be found by e.g. sysctl_kern_proc_filedesc), thus > > > > > XLOCK is needed to ensure consistency for readers. It can also be > > > > > altered by mountcheckdirs, although not in a way which disrupts any of > > > > > these functions. > > > > I would think that such cases should be avoided by testing for P_INEXEC, > > > > but I do not insist on this. > > > > > > > > > > proc lock has to be dropped before filedesc lock is taken and I don't > > > see any way to prevent the proc from transitioning to P_INEXEC afterwards. > > > Since sysctl_kern_proc_filedesc et al can take a long time it does not > > > seem feasible to pursue this. > > > > > After a second look this problem has to be dealt with. > > If traversal while transition to P_INEXEC is allowed, execve dealing > with a setuid binary is problematic. This is more of hypothetical nature, > but with sufficienly long delay it could finish the syscall and start > opening some files, which paths would now be visible for an unprivileged > reader. > > That said, I propose adding a counter to struct proc which would which > would block execve. It would be quite similar to p_lock. I thought about this too. In fact, I considered using PHOLD for this.
> > iow execve would: > > PROC_LOCK(p); > p->p_flag |= P_INEXEC; > while (p->p_execlock > 0) > msleep(&p->p_execlock, &p->p_mtx, PWAIT, "execlock", 0); > PROC_UNLOCK(p); > > And it would be mandatory for external fdp consumers to grab the counter. > > I'm tempted to add P_GETPIN which would both increase p_lock and p_execlock, > that way the process is guaranteed not to exit and not to execve even > after proc lock is dropped. See above about PHOLD. > > There is a separate question if p_execlock should be renamed and > extended to also block any kind of credential changes. > > Then the guarantee is even stronger since we know that credentials we > checked against are not going to change for the duration of our > operations, but it is unclear if we need this. If doing separate execlock/p_lock, I think that it could be possible to use per-process sx lock instead of hand-rolling the counter. The accessors would lock sx shared, while kern_execve would take it in exclusive mode.
pgpCuY9nMjzhd.pgp
Description: PGP signature