----- Original Message -----
From: "Andriy Gapon" <a...@freebsd.org>
on 20/08/2011 23:24 Steven Hartland said the following:
----- Original Message ----- From: "Steven Hartland"
Looking through the code I believe I may have noticed a scenario which could
trigger the problem.
Given the following code:-
static void
prison_deref(struct prison *pr, int flags)
{
struct prison *ppr, *tpr;
int vfslocked;
if (!(flags & PD_LOCKED))
mtx_lock(&pr->pr_mtx);
/* Decrement the user references in a separate loop. */
if (flags & PD_DEUREF) {
for (tpr = pr;; tpr = tpr->pr_parent) {
if (tpr != pr)
mtx_lock(&tpr->pr_mtx);
if (--tpr->pr_uref > 0)
break;
KASSERT(tpr != &prison0, ("prison0 pr_uref=0"));
mtx_unlock(&tpr->pr_mtx);
}
/* Done if there were only user references to remove. */
if (!(flags & PD_DEREF)) {
mtx_unlock(&tpr->pr_mtx);
if (flags & PD_LIST_SLOCKED)
sx_sunlock(&allprison_lock);
else if (flags & PD_LIST_XLOCKED)
sx_xunlock(&allprison_lock);
return;
}
if (tpr != pr) {
mtx_unlock(&tpr->pr_mtx);
mtx_lock(&pr->pr_mtx);
}
}
If you take a scenario of a simple one level prison setup running a single
process
where a prison has just been stopped.
In the above code pr_uref of the processes prison is decremented. As this is the
last process then pr_uref will hit 0 and the loop continues instead of breaking
early.
Now at the end of the loop iteration the mtx is unlocked so other process can
now manipulate the jail, this is where I think the problem may be.
If we now have another process come in and attach to the jail but then instantly
exit, this process may allow another kernel thread to hit this same bit of code
and so two process for the same prison get into the section which decrements
prison0's pr_uref, instead of only one.
In essence I think we can get the following flow where 1# = process1
and 2# = process2
1#1. prison1.pr_uref = 1 (single process jail)
1#2. prison_deref( prison1,...
1#3. prison1.pr_uref-- (prison1.pr_uref = 0)
1#3. prison1.mtx_unlock <-- this now allows others to change prison1.pr_uref
1#3. prison0.pr_uref--
2#1. process1.attach( prison1 ) (prison1.pr_uref = 1)
2#2. process1.exit
2#3. prison_deref( prison1,...
2#4. prison1.pr_uref-- (prison1.pr_uref = 0)
2#5. prison1.mtx_unlock <-- this now allows others to change prison1.pr_uref
2#5. prison0.pr_uref-- (prison1.pr_ref has now been decremented twice by
prison1)
It seems like the action on the parent prison to decrement the pr_uref is
happening too early, while the jail can still be used and without the lock on
the child jails mtx, so causing a race condition.
I think the fix is to the move the decrement of parent prison pr_uref's down
so it only takes place if the jail is "really" being removed. Either that or
to change the locking semantics so that once the lock is aquired in this
prison_deref its not unlocked until the function completes.
What do people think?
After reviewing the changes to prison_deref in commit which added hierarchical
jails, the removal of the lock by the inital loop on the passed in prison may
be unintentional.
http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/kern/kern_jail.c.diff?r1=1.101;r2=1.102;f=h
If so the following may be all that's needed to fix this issue:-
diff -u sys/kern/kern_jail.c.orig sys/kern/kern_jail.c
--- sys/kern/kern_jail.c.orig 2011-08-20 21:17:14.856618854 +0100
+++ sys/kern/kern_jail.c 2011-08-20 21:18:35.307201425 +0100
@@ -2455,7 +2455,8 @@
if (--tpr->pr_uref > 0)
break;
KASSERT(tpr != &prison0, ("prison0 pr_uref=0"));
- mtx_unlock(&tpr->pr_mtx);
+ if (tpr != pr)
+ mtx_unlock(&tpr->pr_mtx);
}
/* Done if there were only user references to remove. */
if (!(flags & PD_DEREF)) {
Not sure if this would fly as is - please double check the later block where
pr->pr_mtx is re-locked.
Your right, and its actually more complex than that. Although changing it to
not unlock in the middle of prison_deref fixes that race condition it doesn't
prevent pr_uref being incorrectly decremented each time the jail gets into
the dying state, which is really the problem we are seeing.
If hierarchical prisons are used there seems to be an additional problem
where the counter of all prisons in the hierarchy are decremented, but as
far as I can tell only the immediate parent is ever incremented, so another
reference problem there as well I think.
The following patch I believe fixes both of these issues.
I've testing with debug added and confirmed prison0's pr_uref is maintained
correctly even when a jail hits dying state multiple times.
It essentially reverts the changes to the "if (flags & PD_DEUREF)" by
192895 and moves it to after the jail has been actually removed.
diff -u sys/kern/kern_jail.c.orig sys/kern/kern_jail.c
--- sys/kern/kern_jail.c.orig 2011-08-20 21:17:14.856618854 +0100
+++ sys/kern/kern_jail.c 2011-08-21 01:56:58.429894825 +0100
@@ -2449,27 +2449,16 @@
mtx_lock(&pr->pr_mtx);
/* Decrement the user references in a separate loop. */
if (flags & PD_DEUREF) {
- for (tpr = pr;; tpr = tpr->pr_parent) {
- if (tpr != pr)
- mtx_lock(&tpr->pr_mtx);
- if (--tpr->pr_uref > 0)
- break;
- KASSERT(tpr != &prison0, ("prison0 pr_uref=0"));
- mtx_unlock(&tpr->pr_mtx);
- }
+ pr->pr_uref--;
/* Done if there were only user references to remove. */
if (!(flags & PD_DEREF)) {
- mtx_unlock(&tpr->pr_mtx);
+ mtx_unlock(&pr->pr_mtx);
if (flags & PD_LIST_SLOCKED)
sx_sunlock(&allprison_lock);
else if (flags & PD_LIST_XLOCKED)
sx_xunlock(&allprison_lock);
return;
}
- if (tpr != pr) {
- mtx_unlock(&tpr->pr_mtx);
- mtx_lock(&pr->pr_mtx);
- }
}
for (;;) {
@@ -2525,6 +2514,8 @@
/* Removing a prison frees a reference on its parent. */
pr = ppr;
mtx_lock(&pr->pr_mtx);
+ /* Ensure user reference added on create is removed */
+ pr->pr_uref--;
flags = PD_DEREF;
}
}
Jamie from what I can tell you where the original committer of hierarchical
prisons and this in the following svn change set so would really appreciate
your feedback on this.
http://svnweb.freebsd.org/base?view=revision&revision=192895
Regards
Steve
================================================
This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it.
In the event of misdirection, illegible or incomplete transmission please
telephone +44 845 868 1337
or return the E.mail to postmas...@multiplay.co.uk.
_______________________________________________
freebsd-jail@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-jail
To unsubscribe, send any mail to "freebsd-jail-unsubscr...@freebsd.org"