On 08/20/11 19:19, Steven Hartland wrote:
----- Original Message ----- From: "Andriy Gapon" <a...@freebsd.org>

on 20/08/2011 23:24 Steven Hartland said the following:
----- Original Message ----- From: "Steven Hartland"
Looking through the code I believe I may have noticed a scenario
which could
trigger the problem.

Given the following code:-

static void
prison_deref(struct prison *pr, int flags)
{
struct prison *ppr, *tpr;
int vfslocked;

if (!(flags & PD_LOCKED))
mtx_lock(&pr->pr_mtx);
/* Decrement the user references in a separate loop. */
if (flags & PD_DEUREF) {
for (tpr = pr;; tpr = tpr->pr_parent) {
if (tpr != pr)
mtx_lock(&tpr->pr_mtx);
if (--tpr->pr_uref > 0)
break;
KASSERT(tpr != &prison0, ("prison0 pr_uref=0"));
mtx_unlock(&tpr->pr_mtx);
}
/* Done if there were only user references to remove. */
if (!(flags & PD_DEREF)) {
mtx_unlock(&tpr->pr_mtx);
if (flags & PD_LIST_SLOCKED)
sx_sunlock(&allprison_lock);
else if (flags & PD_LIST_XLOCKED)
sx_xunlock(&allprison_lock);
return;
}
if (tpr != pr) {
mtx_unlock(&tpr->pr_mtx);
mtx_lock(&pr->pr_mtx);
}
}

If you take a scenario of a simple one level prison setup running a
single
process
where a prison has just been stopped.

In the above code pr_uref of the processes prison is decremented. As
this is the
last process then pr_uref will hit 0 and the loop continues instead
of breaking
early.

Now at the end of the loop iteration the mtx is unlocked so other
process can
now manipulate the jail, this is where I think the problem may be.

If we now have another process come in and attach to the jail but
then instantly
exit, this process may allow another kernel thread to hit this same
bit of code
and so two process for the same prison get into the section which
decrements
prison0's pr_uref, instead of only one.

In essence I think we can get the following flow where 1# = process1
and 2# = process2
1#1. prison1.pr_uref = 1 (single process jail)
1#2. prison_deref( prison1,...
1#3. prison1.pr_uref-- (prison1.pr_uref = 0)
1#3. prison1.mtx_unlock <-- this now allows others to change
prison1.pr_uref
1#3. prison0.pr_uref--
2#1. process1.attach( prison1 ) (prison1.pr_uref = 1)
2#2. process1.exit
2#3. prison_deref( prison1,...
2#4. prison1.pr_uref-- (prison1.pr_uref = 0)
2#5. prison1.mtx_unlock <-- this now allows others to change
prison1.pr_uref
2#5. prison0.pr_uref-- (prison1.pr_ref has now been decremented
twice by prison1)

It seems like the action on the parent prison to decrement the
pr_uref is
happening too early, while the jail can still be used and without
the lock on
the child jails mtx, so causing a race condition.

I think the fix is to the move the decrement of parent prison
pr_uref's down
so it only takes place if the jail is "really" being removed. Either
that or
to change the locking semantics so that once the lock is aquired in
this
prison_deref its not unlocked until the function completes.

What do people think?

After reviewing the changes to prison_deref in commit which added
hierarchical
jails, the removal of the lock by the inital loop on the passed in
prison may
be unintentional.
http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/kern/kern_jail.c.diff?r1=1.101;r2=1.102;f=h



If so the following may be all that's needed to fix this issue:-

diff -u sys/kern/kern_jail.c.orig sys/kern/kern_jail.c
--- sys/kern/kern_jail.c.orig 2011-08-20 21:17:14.856618854 +0100
+++ sys/kern/kern_jail.c 2011-08-20 21:18:35.307201425 +0100
@@ -2455,7 +2455,8 @@
if (--tpr->pr_uref > 0)
break;
KASSERT(tpr != &prison0, ("prison0 pr_uref=0"));
- mtx_unlock(&tpr->pr_mtx);
+ if (tpr != pr)
+ mtx_unlock(&tpr->pr_mtx);
}
/* Done if there were only user references to remove. */
if (!(flags & PD_DEREF)) {

Not sure if this would fly as is - please double check the later block
where
pr->pr_mtx is re-locked.

Your right, and its actually more complex than that. Although changing
it to
not unlock in the middle of prison_deref fixes that race condition it
doesn't
prevent pr_uref being incorrectly decremented each time the jail gets into
the dying state, which is really the problem we are seeing.

If hierarchical prisons are used there seems to be an additional problem
where the counter of all prisons in the hierarchy are decremented, but as
far as I can tell only the immediate parent is ever incremented, so another
reference problem there as well I think.

The following patch I believe fixes both of these issues.

I've testing with debug added and confirmed prison0's pr_uref is maintained
correctly even when a jail hits dying state multiple times.

It essentially reverts the changes to the "if (flags & PD_DEUREF)" by
192895 and moves it to after the jail has been actually removed.

diff -u sys/kern/kern_jail.c.orig sys/kern/kern_jail.c
--- sys/kern/kern_jail.c.orig 2011-08-20 21:17:14.856618854 +0100
+++ sys/kern/kern_jail.c 2011-08-21 01:56:58.429894825 +0100
@@ -2449,27 +2449,16 @@
mtx_lock(&pr->pr_mtx);
/* Decrement the user references in a separate loop. */
if (flags & PD_DEUREF) {
- for (tpr = pr;; tpr = tpr->pr_parent) {
- if (tpr != pr)
- mtx_lock(&tpr->pr_mtx);
- if (--tpr->pr_uref > 0)
- break;
- KASSERT(tpr != &prison0, ("prison0 pr_uref=0"));
- mtx_unlock(&tpr->pr_mtx);
- }
+ pr->pr_uref--;
/* Done if there were only user references to remove. */
if (!(flags & PD_DEREF)) {
- mtx_unlock(&tpr->pr_mtx);
+ mtx_unlock(&pr->pr_mtx);
if (flags & PD_LIST_SLOCKED)
sx_sunlock(&allprison_lock);
else if (flags & PD_LIST_XLOCKED)
sx_xunlock(&allprison_lock);
return;
}
- if (tpr != pr) {
- mtx_unlock(&tpr->pr_mtx);
- mtx_lock(&pr->pr_mtx);
- }
}

for (;;) {
@@ -2525,6 +2514,8 @@
/* Removing a prison frees a reference on its parent. */
pr = ppr;
mtx_lock(&pr->pr_mtx);
+ /* Ensure user reference added on create is removed */
+ pr->pr_uref--;
flags = PD_DEREF;
}
}

Jamie from what I can tell you where the original committer of hierarchical
prisons and this in the following svn change set so would really appreciate
your feedback on this.
http://svnweb.freebsd.org/base?view=revision&revision=192895

The problem isn't with the conditional locking of tpr in prison_deref.
That locking is actually correct, and there's no race condition. The
trouble lies in the resurrection of dead jails, as Andriy has noted
(though not just attaching, but also by setting its persist flag causes
the same problem).

There are two possible fixes to this. One is the patch you've given,
which only decrements a parent jail's pr_uref when the child jail
completely goes away (as opposed to when it loses its last uref). This
provides symmetry with the current way pr_uref is incremented on the
parent, which is only when a jail is created.

The other fix is to increment a parent's pr_uref when a jail is
resurrected, which will match the current logic in prison_deref. I like
the external semantics of this solution: a jail isn't visible if it is
not persistent and has no processes and no *visible* sub-jails, as
opposed to having no sub-jails at all. But this solution ends up pretty
complicated - there are a few places where pr_uref is incremented, where
I might need to increment parent jails' pr_uref as well, much like the
current tpr loop in prison_deref decrements them.

Your solution removes code instead of adding it, which is generally a
good thing. While it does change the semantics of pr_uref in the
hierarchical case at least from what I thought it was, those semantics
haven't been working properly anyway.

Bjoern, I'm adding you to the CC list for this because the whole pr_uref
thing was your idea (though it was pr_nprocs at the time), so you might
care about the hierarchical semantics of it - or you may not. Also, this
is a panic-inducing bug in current and may interest you for that reason.

- Jamie
_______________________________________________
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"

Reply via email to