On Sun, 11 Oct 2009, Harsha Srinath wrote:
I'm running an updated HEAD kernel and got a page fault in
ifindex_alloc_locked() in if.c. I figured that the problem was caused by the
(pluggable) network card of my laptop and found that during the
initialization of the interface, cb_event_thread() takes the giant lock and
up the call chain in if_alloc(), we call IFNET_WLOCK() and assert on the RW
locks in ifindex_alloc_locked(). It is in the asset macro
IFNET_WLOCK_ASSERT() I see the page fault.
I looked up some recent related changes and noticed the following comment in
one of the check-ins in-
http://svn.freebsd.org/viewvc/base/head/sys/net/if.c
"Break out allocation of new ifindex values from if_alloc() and if_vmove(),
and centralize in a single function ifindex_alloc(). Assert the IFNET_WLOCK,
and add missing IFNET_WLOCK in if_alloc(). This does not close all known
races in this code."
So I think I have hit one of those fault conditions.
Apparently the giant lock code was removed and added back recently -
http://svn.freebsd.org/viewvc/base/head/sys/dev/pccbb/pccbb.c
I believe that the root cause is that ifnet_rw is a non sleepable exclusive
RW lock and we have taken the exclusive sleep mutex Giant before that.
Any pointers and suggestions are welcome.
Hi Harsha--
Giant is a bit special in that the long-term sleep code in the kernel knows to
drop it when sleeping, and re-acquire when waking up. So, unlike all other
mutexes, it should be OK to hold it in this case, as Giant will simply get
dropped if the kernel has to sleep waiting on a sleepable lock. This is
because, historically in FreeBSD 3.x/4.x, the kernel was protected by a single
spinlock, which would get released whenever the kernel stopped executing, such
as during an I/O sleep. On the whole, Giant has disappeared from the modern
kernel, but where it is used, it retains those curious historic properties.
To break things down a bit further, IFNET_WLOCK is, itself, a bit special:
notice that in FreeBSD 8, it's actually two locks, a sleep lock, and a mutex,
which must both be acquired exclusively to ensure mutual exclusion.
if_alloc() and associated calls are also sleepable because they perform
potentially sleeping memory allocation (M_WAITOK), so it's an invariant of any
code calling interface allocation that it must be able to tolerate a sleep.
Do you have a copy of the stack trace and fault information handy? In my
experience, a NULL pointer deref or other page fault in the locking code for a
global lock is almost always corrupted thread state, perhaps due to tripping
over another thread having locked a corrupted/freed/uninitialized lock. We
might be able to track that down by tracing other threads that were in
execution at the time of the panic.
Robert
_______________________________________________
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"