Hello, As Alfred suggests below, there may be a potential problem with the mbuf wait code calling m_reclaim() [the drain routines] with the mmbfree (mbuf freelist mutex) lock held. Presently, to avoid a race condition, we call m_reclaim() with the lock held and recurse into it from the drain routines which call m_free{m}() to free mbufs. This was initially written this way; it was partially adopted from BSD/OS (it's done the same way). This way, the mbuf wait code is sure to be the first one to get the chance to allocate after the drain routines run. Unfortunately, as mentionned, there appears to be a potential lock order problem. Using as an example the BSD/OS code, something like this happens: - In ip_slowtimo(), IPQ_LOCK is obtained and with the lock held, ip_freef() is called. ip_freef() calls m_freem() which grabs the MBUF_LOCK. The lock order here is IPQ -> MBUF. - In ip_drain(), which is called from m_reclaim() with the MBUF lock already held, IPQ lock is acquired. Then ip_freef() is called which does m_freem() [which grabs MBUF lock]. The lock order here, since ip_drain() is called with MBUF already held is MBUF -> IPQ. Are we correct to assume that there is a potential for deadlock here? Clearly, if one thread in ip_slowtimo() has IPQ and another thread presently draining has MBUF, there is deadlock. The proposed patch (for FreeBSD) is at the address immediately below. For now, what it does is avoid the lock recursion by calling m_reclaim() without the mbuf lock held. We deal with the race condition. In the future, if we can, we may want to weed out the drain routines and make sure that there are no lock ordering problems; but for now, until we're done with all the locking, I think it's better to do it this way. Thanks, Bosko. P.S.: Please feel free to snip -arch from the CC list, if judged appropriate. Alfred Perlstein wrote: > * Bosko Milekic <[EMAIL PROTECTED]> [010108 19:13] wrote: > > Hi Alfred, > > > > Can you please review this: > > > > http://people.freebsd.org/~bmilekic/code/no_rec_mbsleep.patch > > > > All it does is in m_mballoc_wait() [wait routine for mbuf allocation] > > drop the mbuf freelist lock before calling the drain routines. It avoids > > recursing into the lock from within the drain routines. We discussed this > > some time ago and you brought up the fact that it should probably be changed; > > although I don't recall exactly why (besides for generally just getting rid > > of the lock recursion and relatively large lock contention)... I recall you > > having an even more important reason, can you remind me, if applicable? > > It's a lock ordering problem, you _can_ unwind the lock in the protocol > drain routines before locking down a socketbuffer or whatnot, but the > fact of the matter is if you have to lock _anything_ you pretty much > have to unwind the lock or have a reversal problem as the mbuf system > should be a leaf-lock. > > You're also blocking the natural free'ing of the mbufs by other code > running. > > So the comment is sorta lacking, it's to avoid: > 1) lock recursion (as you stated) > 2) lock reversal > 3) needing to to explicitly unwind the lock to avoid #2 > > At least I think. :) > > It looks like BSD/os holds the lock during the protocol drain routines, > but I think that's a mistake. > > Would you mind reposting this to -net/-smp, I think I'm right about > this, but some peer review couldn't hurt, I wasn't sure if reposting > was ok considering this came across in private. > > -- > -Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]] > "I have the heart of a child; I keep it in a jar on my desk." To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-net" in the body of the message