On Wed, Oct 04, 2006 at 03:53:54PM -0400, Vivek Khera wrote:
> On Oct 4, 2006, at 3:41 PM, Kris Kennaway wrote:
> >>from what i read in the output from kgdb, it seems that something
> >>locked the kernel and we broke to debugger from the watchdog timeout
> >>(I enable software watchdog).
> >
> >Hmm, be careful with that - if you set the timeout too low (and note
> >that for some workloads O(minutes) may even be too low) then you'll
> >get a lot of false positives.
> hmmm... the man page for watchdogd doesn't specify what the default
> timeout is, but that's what we've got running. [tappity-tappity-tap...]
> source seems to indicate a 16-second timeout.  interesting.
Yes, that's probably way too low.  For example, when creating a
snapshot (as in your workload), your machine may be unresponsive for
up to a few minutes, depending on your filesystem size and I/O load.

> so we could be getting hit with a bge interrupt storm and timing  
> out.  i'll turn off fido and see what happens.
> at this point, though, i think i have two separate issues.  one with  
> bge and watchdog timeout, and one with locking of the filesystem with  
> mksnap_ffs, as the symptoms are different.

That sounds plausible.  Many people are reporting issues involving NIC
interrupts, but they're proving elusive to characterize so far (there
may be multiple problems).

