On Mon, 20 Jun 2005, Eirik Øverby wrote:

Hmm. Looks like a bug in dummynet. ipfw should not be directly re-injecting UDP traffic back into the input path from an outbound path, or it risks re-entering, generating lock order problems, etc. It should be getting dropped into the netisr queue to be processed from the netisr context.

Would this problem exist across all 5.4 installations, both i386 and amd64? Would it depend on heavy load, or could it theoretically happen at any time when there's traffic? All three of my FreeBSD 5 servers (dual Opteron, dual P3 1 GHz, dual P3 700 MHz) are experiencing random hangs a few weeks apart; my impression is that they are all stable when running in single-CPU mode. All of them use dummynet in a comparable manner. Ideas?

Yes. Basically, the network stack avoids recursion in processing for "complicated" packets by deferring processing of an offending packet to a thread called the 'netisr'. Whenever the stack reaches a possible recursion point on a packet, it's supposed to queue the packet for processing 'later' in a per-protocol queue, unwind, and then, when the netisr runs, pick up and continue processing. In the stack trace you provide, dummynet appears to immediately invoke the in-bound network path from the out-bound network path, walking back into the network stack from the out-bound path. This is generally forbidden, for a variety of reasons:

- We do allow the in-bound path to call the out-bound path, so that
  protocols like TCP, and services like NFS can turn around packets
  without a context switch.  If further recursion is permitted, the stack
  may overflow.

- Both paths may hold network stack locks over calls in either direction
  -- specifically, we allow protocol locks to be held over calls into the
  socket layer, as the protocol layer drives operation; if a recursive
  call is made, deadlocks can occur due to violating the lock order.  This
  is what is happening in your case.
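
To make the "queue, unwind, process later" idea concrete, here is a minimal userspace sketch in C. It is only a model under stated assumptions, not FreeBSD kernel code: every name in it (struct packet, struct deferred_queue, queue_put, queue_get, input_path, output_path, run_deferred) is hypothetical and is not a netisr, ipfw, or dummynet interface. The out-bound path never calls the in-bound path directly; it parks the packet in a queue, and a separate drain step re-enters the in-bound path only after the stack has unwound.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are FreeBSD kernel APIs. */
#define QUEUE_CAP 16

struct packet {
    int id;
    int needs_reinject;   /* set when the out-bound path must hand it back */
};

struct deferred_queue {
    struct packet *slots[QUEUE_CAP];
    size_t head, tail, count;
};

static int depth;         /* current nesting of input_path() calls */
static int max_depth;     /* deepest nesting observed so far */

/* Append a packet; returns -1 if the queue is full (caller would drop it). */
static int queue_put(struct deferred_queue *q, struct packet *p)
{
    if (q->count == QUEUE_CAP)
        return -1;
    q->slots[q->tail] = p;
    q->tail = (q->tail + 1) % QUEUE_CAP;
    q->count++;
    return 0;
}

/* Remove the oldest packet, or return NULL if the queue is empty. */
static struct packet *queue_get(struct deferred_queue *q)
{
    struct packet *p;

    if (q->count == 0)
        return NULL;
    p = q->slots[q->head];
    q->head = (q->head + 1) % QUEUE_CAP;
    q->count--;
    return p;
}

/* Out-bound path: defer re-injection instead of calling input_path(). */
static void output_path(struct deferred_queue *q, struct packet *p)
{
    if (p->needs_reinject) {
        p->needs_reinject = 0;
        (void)queue_put(q, p);   /* queue and unwind; no recursion here */
    }
}

/* In-bound path: calling the out-bound path from here is allowed. */
static void input_path(struct deferred_queue *q, struct packet *p)
{
    depth++;
    if (depth > max_depth)
        max_depth = depth;
    output_path(q, p);
    depth--;
}

/* Stand-in for the deferred context: drain the queue from a fresh stack. */
static int run_deferred(struct deferred_queue *q)
{
    struct packet *p;
    int handled = 0;

    assert(depth == 0);          /* we must not already be inside the stack */
    while ((p = queue_get(q)) != NULL) {
        input_path(q, p);
        handled++;
    }
    return handled;
}
```

In this model, feeding a packet with needs_reinject set through input_path() leaves it sitting in the queue, and a later run_deferred() call processes it with the in-bound path never nested inside itself. A direct call from output_path() to input_path() would instead nest the two paths, which is the pattern described above as forbidden.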

Pretty much all network code is entirely architecture-independent, so bugs typically span architectures, although race conditions can sometimes be hard to reproduce if they require precise timing and multiple processors.

Is it possible to configure dummynet out of your configuration, and see if the problem goes away?

I'm running a test right now, will let you know in the morning.

Thanks.

Robert N M Watson
_______________________________________________
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable