On Thu, 9 May 2013, Marcel Moolenaar wrote:
Author: marcel
Date: Thu May 9 16:28:18 2013
New Revision: 250411
URL: http://svnweb.freebsd.org/changeset/base/250411
Log:
Add option WITNESS_NO_VNODE to suppress printing LORs between VNODE
locks. To support this, VNODE locks are created with the LK_IS_VNODE
flag. This flag is propagated down using the LO_IS_VNODE flag.
Note that WITNESS still records the LOR. Only the printing and the
optional entering into the kernel debugger is bypassed with the
WITNESS_NO_VNODE option.
I'm replying to the original commit because the resulting thread got way
out of hand. We need to all take a deep breath and take a pragmatic
approach to solving the problem at hand.
Let me first say I understand the utility here as this is also coming up
in my organization. Test, and users, do not want to see erroneous warning
messages. I understand that. Let's find a solution.
Secondly, I think this project has grown too far for us to commit changes
like this without some focused discussion. We need to be more mindful of
the size of the impact and the number of people who are interested in a
particular area. I'm not picking on you Marcel because this sort of thing
has been coming up lately and we have all been guilty of it from time to
time. There are more companies and individuals than ever trying to push
work into the repository and we're having some growing pains.
I am intimately familiar with the problems that lead to these erroneous
witness messages as I have tracked down many of them and am even
responsible for the code that generates them in some cases. Let me first
outline a handful of generic problems. The root cause is that witness can
not determine the real order between two locks due to relationships too
complex to describe with a pair of strings.
One example, which has been brought up, is the hierarchical nature of
vnode locks. This impacts vnodes within one filesystem but it also
involves vnodes between two different filesystems as you cross mount
points. We can construct perfectly valid and deadlock free chains of
mount points that have two different filesystem types in different orders
which will LOR at the boundaries. We already skip duplicates to avoid
this problem within each filesystem. We need to skip cross-filesystem
duplicates, most desirably at the few specific places where this happens.
This problem comes up especially for devfs because we lock devvps while
file vnodes are locked but we lock devfs directories after the rootfs lock
when crossing mountpoints in lookup.
A second example, is locks of a fundamentally different type that have a
complex ordering relationship. For example, a vnode lock may be acquired
after a buf lock belonging to the parent's directory block. A cg buf lock
may be acquired after any file buf lock. Here we want to ignore
interactions between these two specific types at this particular location
but not others as they may be unsafe.
The third example, is a complex locking pattern with shared locks as
presented by dirhash. We are seeing a similar pattern develop in the vm
where we are going to use an exclusive object lock to protect pages or a
shared object lock + a page lock. The semantics only get more complex as
we push for more scalability. I expect to see more of these patterns
develop.
None of these problems can be solved with names alone. So far we've
just lived with the warnings and we're no longer willing to accept that.
What we need is a solution that blesses the specific instances and the
specific lock classes involved without silencing legitimate warnings that
may only occur after new code is added. For example, it may be safe to
add a sx lock around some vnode code but you may not notice that you LOR
if you silence all witness warnings related to the vnode lock site.
I believe that the perfect solution would be a mechanism that could teach
witness about and enforce these specific relationships. However, that may
be computationally prohibitive and too complex to code. A more reasonable
option would be to bless the specific relationships at the specific call
sites. Turning all witness off at particular sites or with particular
types renders important infrastructure useless for very large functional
areas. It's also important to distinguish between squelching the error
message from eliminating the other state that is saved at lock sites.
We already have lock names and types. What I would propose we do is make
the type 'vnode' for all vnodes and 'buf' for all bufs with the names used
for the specific filesystems. Then you could specify a DUPOK that
automatically blesses any filesystem to filesystem related LORs. In this
way witness still records the call sites and unrelated LORs or panics
still have the acquisition information. You could eventually unwind this
to only DUPOK at the specific currently known places that we anticipate
multiple vnodes.
To solve the buf and other complex LORs involving multiple types you would
make lock variants that accept a blessed list of the specific locks you
are blessing. We already have support for a blessed list in witness. We
just want something like this per-call.
For example; When acquiring a child vnode lock after a parent lock when
a parent's buf is held I would do something like this:
vn_lock_flags(vp, LK_EXCLUSIVE, { "vnode", "bufwait", NULL });
Written properly dead code elimination will take care of it for non-debug
builds. You could come up with other ways of writing this that don't
involve passing pointers down the stack. For example, making a unique
token for this blessed pair and passing it in the high bits of the flags.
I don't really care about the particular implementation but I think this
is the right model.
Thanks,
Jeff
Modified:
head/sys/conf/options
head/sys/kern/kern_lock.c
head/sys/kern/subr_witness.c
head/sys/kern/vfs_subr.c
head/sys/sys/lock.h
head/sys/sys/lockmgr.h
Modified: head/sys/conf/options
==============================================================================
--- head/sys/conf/options Thu May 9 16:09:39 2013 (r250410)
+++ head/sys/conf/options Thu May 9 16:28:18 2013 (r250411)
@@ -672,6 +672,7 @@ KTR_ENTRIES opt_global.h
KTR_VERBOSE opt_ktr.h
WITNESS opt_global.h
WITNESS_KDB opt_witness.h
+WITNESS_NO_VNODE opt_witness.h
WITNESS_SKIPSPIN opt_witness.h
# options for ACPI support
Modified: head/sys/kern/kern_lock.c
==============================================================================
--- head/sys/kern/kern_lock.c Thu May 9 16:09:39 2013 (r250410)
+++ head/sys/kern/kern_lock.c Thu May 9 16:28:18 2013 (r250411)
@@ -393,6 +393,8 @@ lockinit(struct lock *lk, int pri, const
iflags |= LO_WITNESS;
if (flags & LK_QUIET)
iflags |= LO_QUIET;
+ if (flags & LK_IS_VNODE)
+ iflags |= LO_IS_VNODE;
iflags |= flags & (LK_ADAPTIVE | LK_NOSHARE);
lk->lk_lock = LK_UNLOCKED;
Modified: head/sys/kern/subr_witness.c
==============================================================================
--- head/sys/kern/subr_witness.c Thu May 9 16:09:39 2013
(r250410)
+++ head/sys/kern/subr_witness.c Thu May 9 16:28:18 2013
(r250411)
@@ -1289,7 +1289,19 @@ witness_checkorder(struct lock_object *l
w->w_reversed = w1->w_reversed = 1;
witness_increment_graph_generation();
mtx_unlock_spin(&w_mtx);
-
+
+#ifdef WITNESS_NO_VNODE
+ /*
+ * There are known LORs between VNODE locks. They are
+ * not an indication of a bug. VNODE locks are flagged
+ * as such (LO_IS_VNODE) and we don't yell if the LOR
+ * is between 2 VNODE locks.
+ */
+ if ((lock->lo_flags & LO_IS_VNODE) != 0 &&
+ (lock1->li_lock->lo_flags & LO_IS_VNODE) != 0)
+ return;
+#endif
+
/*
* Ok, yell about it.
*/
Modified: head/sys/kern/vfs_subr.c
==============================================================================
--- head/sys/kern/vfs_subr.c Thu May 9 16:09:39 2013 (r250410)
+++ head/sys/kern/vfs_subr.c Thu May 9 16:28:18 2013 (r250411)
@@ -1037,7 +1037,7 @@ alloc:
* By default, don't allow shared locks unless filesystems
* opt-in.
*/
- lockinit(vp->v_vnlock, PVFS, tag, VLKTIMEOUT, LK_NOSHARE);
+ lockinit(vp->v_vnlock, PVFS, tag, VLKTIMEOUT, LK_NOSHARE | LK_IS_VNODE);
/*
* Initialize bufobj.
*/
Modified: head/sys/sys/lock.h
==============================================================================
--- head/sys/sys/lock.h Thu May 9 16:09:39 2013 (r250410)
+++ head/sys/sys/lock.h Thu May 9 16:28:18 2013 (r250411)
@@ -79,6 +79,7 @@ struct lock_class {
#define LO_SLEEPABLE 0x00100000 /* Lock may be held while sleeping. */
#define LO_UPGRADABLE 0x00200000 /* Lock may be upgraded/downgraded. */
#define LO_DUPOK 0x00400000 /* Don't check for duplicate acquires */
+#define LO_IS_VNODE 0x00800000 /* Tell WITNESS about a VNODE
lock */
#define LO_CLASSMASK 0x0f000000 /* Class index bitmask. */
#define LO_NOPROFILE 0x10000000 /* Don't profile this lock */
Modified: head/sys/sys/lockmgr.h
==============================================================================
--- head/sys/sys/lockmgr.h Thu May 9 16:09:39 2013 (r250410)
+++ head/sys/sys/lockmgr.h Thu May 9 16:28:18 2013 (r250411)
@@ -146,6 +146,7 @@ _lockmgr_args_rw(struct lock *lk, u_int
#define LK_NOWITNESS 0x000010
#define LK_QUIET 0x000020
#define LK_ADAPTIVE 0x000040
+#define LK_IS_VNODE 0x000080 /* Tell WITNESS about a VNODE
lock */
/*
* Additional attributes to be used in lockmgr().
_______________________________________________
svn-src-all@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-all
To unsubscribe, send any mail to "svn-src-all-unsubscr...@freebsd.org"