On Wed, Oct 04, 2006 at 03:53:54PM -0400, Vivek Khera wrote:
>
> On Oct 4, 2006, at 3:41 PM, Kris Kennaway wrote:
>
> >>from what i read in the output from kgdb, it seems that something
> >>locked the kernel and we broke to debugger from the watchdog timeout
> >>(I enable software watchdog).
> >
> >Hmm, be careful with that - if you set the timeout too low (and note
> >that for some workloads O(minutes) may even be too low) then you'll
> >get a lot of false positives.
>
> hmmm... the man page for watchdogd doesn't specify what the default
> timeout is, but that's what we've got running.  [tappity-tapptity-
> tap...] source seems to indicate 16seconds timeout.  interesting.

Yes, that's probably way too low.  e.g. when creating a snapshot (as
in your workload) your machine may be unresponsive for up to a few
minutes depending on your filesystem size and I/O load.

> so we could be getting hit with a bge interrupt storm and timing
> out.  i'll turn off fido and see what happens.
>
> at this point, though, i think i have two separate issues.  one with
> bge and watchdog timeout, and one with locking of the filesystem with
> mksnap_ffs, as the symptoms are different.

That sounds plausible.  Many people are reporting issues involving NIC
interrupts, but they're proving elusive to characterize so far (there
may be multiple problems).

kris