Hello,

    As Alfred suggests below, there may be a potential problem with the mbuf
wait code calling m_reclaim() [the drain routines] with the mmbfree (mbuf
freelist mutex) lock held. Presently, to avoid a race condition, we call
m_reclaim() with the lock held and recurse into it from the drain routines
which call m_free{m}() to free mbufs. This was initially written this way; it
was partially adopted from BSD/OS (it's done the same way). This way, the
mbuf wait code is sure to be the first one to get the chance to allocate
after the drain routines run. Unfortunately, as mentionned, there appears to
be a potential lock order problem. Using as an example the BSD/OS code,
something like this happens:

- In ip_slowtimo(), IPQ_LOCK is obtained and with the lock held, ip_freef()
is called. ip_freef() calls m_freem() which grabs the MBUF_LOCK. The lock
order here is IPQ -> MBUF.

- In ip_drain(), which is called from m_reclaim() with the MBUF lock already
held, IPQ lock is acquired. Then ip_freef() is called which does m_freem()
[which grabs MBUF lock]. The lock order here, since ip_drain() is called with
MBUF already held is MBUF -> IPQ.

Are we correct to assume that there is a potential for deadlock here?
Clearly, if one thread in ip_slowtimo() has IPQ and another thread presently
draining has MBUF, there is deadlock.

The proposed patch (for FreeBSD) is at the address immediately below. For
now, what it does is avoid the lock recursion by calling m_reclaim() without
the mbuf lock held. We deal with the race condition. In the future, if we
can, we may want to weed out the drain routines and make sure that there are
no lock ordering problems; but for now, until we're done with all the
locking, I think it's better to do it this way.

Thanks,
Bosko.

P.S.: Please feel free to snip -arch from the CC list, if judged appropriate.

Alfred Perlstein wrote:

> * Bosko Milekic <[EMAIL PROTECTED]> [010108 19:13] wrote:
> > Hi Alfred,
> >
> >     Can you please review this:
> >
> >     http://people.freebsd.org/~bmilekic/code/no_rec_mbsleep.patch
> >
> >     All it does is in m_mballoc_wait() [wait routine for mbuf allocation]
> > drop the mbuf freelist lock before calling the drain routines. It avoids
> > recursing into the lock from within the drain routines. We discussed this
> > some time ago and you brought up the fact that it should probably be
changed;
> > although I don't recall exactly why (besides for generally just getting
rid
> > of the lock recursion and relatively large lock contention)... I recall
you
> > having an even more important reason, can you remind me, if applicable?
>
> It's a lock ordering problem, you _can_ unwind the lock in the protocol
> drain routines before locking down a socketbuffer or whatnot, but the
> fact of the matter is if you have to lock _anything_ you pretty much
> have to unwind the lock or have a reversal problem as the mbuf system
> should be a leaf-lock.
>
> You're also blocking the natural free'ing of the mbufs by other code
> running.
>
> So the comment is sorta lacking, it's to avoid:
> 1) lock recursion (as you stated)
> 2) lock reversal
> 3) needing to to explicitly unwind the lock to avoid #2
>
> At least I think. :)
>
> It looks like BSD/os holds the lock during the protocol drain routines,
> but I think that's a mistake.
>
> Would you mind reposting this to -net/-smp, I think I'm right about
> this, but some peer review couldn't hurt, I wasn't sure if reposting
> was ok considering this came across in private.
>
> --
> -Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
> "I have the heart of a child; I keep it in a jar on my desk."




To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-net" in the body of the message

Reply via email to