On Mon, 20 Jun 2005, Eirik Øverby wrote:
Hmm. Looks like a bug in dummynet. ipfw should not be directly
re-injecting UDP traffic back into the input path from an outbound path,
or it risks re-entering, generating lock order problems, etc. It should
be getting dropped into the netisr queue to be processed from the
netisr context.
Would this problem exist across all 5.4 installations, both i386 and
amd64? Would it depend on heavy load, or could it theoretically happen
at any time when there's traffic? All three of my FreeBSD 5 servers (dual
Opteron, dual P3 1 GHz, dual P3 700 MHz) are experiencing random hangs
a few weeks apart; my impression is that they are all stable when running
in single-CPU mode. All use dummynet in a comparable manner. Any ideas?
Yes. Basically, the network stack avoids recursion in processing for
"complicated" packets by deferring processing an offending packet to a
thread called the 'netisr'. Whenever the stack reaches a possible
recursion point on a packet, it's supposed to queue the packet for
processing 'later' in a per-protocol queue, unwind, and then when the
netisr runs, pick up and continue processing. In the stack trace you
provided, dummynet appears to immediately invoke the in-bound
network path from the out-bound network path, walking back into the
network stack from the outbound path. This is generally forbidden, for a
variety of reasons:
- We do allow the in-bound path to call the out-bound path, so that
protocols like TCP and services like NFS can turn around packets
without a context switch. If further recursion is permitted, the stack
may overflow.
- Both paths may hold network stack locks over calls in either direction
-- specifically, we allow protocol locks to be held over calls into the
socket layer, as the protocol layer drives operation; if a recursive
call is made, deadlocks can occur due to violating the lock order. This
is what is happening in your case.
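
To make the difference between deferring and re-injecting concrete, here is a
rough user-space sketch in Python. It is a toy model, not FreeBSD code: the
class ToyStack, its single non-recursive "protocol lock", and the names
input_path, output_path and run_netisr are all invented for illustration, and
the real failure involves lock ordering between several locks rather than one
lock being taken twice. A non-blocking acquire is used so that the bad case
raises an error instead of hanging the way a real deadlock would.

```python
import threading
from collections import deque


class ToyStack:
    """Toy model of deferral versus direct re-injection (hypothetical names)."""

    def __init__(self):
        # Non-recursive lock, standing in for a protocol-layer mutex.
        self.proto_lock = threading.Lock()
        # Stands in for a per-protocol netisr queue.
        self.netisr_queue = deque()
        # Packets that completed in-bound processing.
        self.delivered = []

    def input_path(self, pkt):
        # In-bound processing needs the protocol lock. If the caller already
        # holds it, a real kernel could deadlock; here we raise instead.
        if not self.proto_lock.acquire(blocking=False):
            raise RuntimeError("input path re-entered with protocol lock held")
        try:
            self.delivered.append(pkt)
        finally:
            self.proto_lock.release()

    def output_path(self, pkt, defer):
        # Out-bound processing runs with the protocol lock held and decides
        # that the packet must go back through the input path.
        with self.proto_lock:
            if defer:
                # Correct: queue the packet and unwind.
                self.netisr_queue.append(pkt)
            else:
                # Incorrect: walk straight back into the input path.
                self.input_path(pkt)

    def run_netisr(self):
        # Later, from a clean context with no locks held, drain the queue.
        while self.netisr_queue:
            self.input_path(self.netisr_queue.popleft())
```

Calling output_path(pkt, defer=True) followed by run_netisr() delivers the
packet; calling output_path(pkt, defer=False) raises, which is the toy
equivalent of the hang described above.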
Pretty much all network code is entirely architecture-independent, so bugs
typically span architectures, although race conditions can sometimes be
hard to reproduce if they require precise timing and multiple processors.
Is it possible to configure dummynet out of your configuration, and see
if the problem goes away?
I'm running a test right now, will let you know in the morning.
Thanks.
Robert N M Watson
_______________________________________________
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable