Re: [zfs-discuss] file not persistent after node bounce when there is a bad disk?

Mark Maybee Wed, 31 Jan 2007 16:22:25 -0800

Peter Buckingham wrote:

Hi Eric,
eric kustarz wrote:
The first thing I would do is see if any I/O is happening ('zpool iostat 1'). If there's none, then perhaps the machine is hung (in which case you would want to grab a couple of '::threadlist -v 10's from mdb to figure out if there are hung threads).
There seems to be no I/O after the initial I/O, according to zpool iostat. When we run zpool status, it hangs:
HON hcb116 ~ $ zpool status
    pool: tank
   state: ONLINE
   scrub: none requested
   <hang>

I'll send you the mdb output privately since it's quite big.
60 seconds should be plenty of time for the async write(s) to complete. We try to push out txgs (transaction groups) every 5 seconds. However, if the system is overloaded, then the txgs could take longer.
That's what I would have thought.
The 'sync' hanging is intriguing. Perhaps the system is just overloaded and the sync command is making it worse. Seeing what 'fsync' would do would be interesting.
I've not tried this yet.
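For anyone wanting to try the 'fsync' experiment, a minimal probe could look like the sketch below. It is an assumption-laden illustration, not ZFS tooling: the function name `timed_fsync` and the example path `/tank/fsync-probe` are hypothetical. Unlike 'sync', which applies system-wide, fsync(2) targets a single file, so timing it on a file in the pool shows whether a narrowly scoped flush also stalls. On a hung pool the call would likely block rather than return, which is itself the answer.

```python
import os
import time

# Hypothetical probe (not part of ZFS tooling): write a small file on the
# pool, fsync just that file, and time how long the whole operation takes.


def timed_fsync(path, payload=b"fsync probe\n"):
    """Write payload to path, fsync that one file, return elapsed seconds."""
    start = time.monotonic()
    with open(path, "wb") as f:
        f.write(payload)
        f.flush()             # move Python's userspace buffer into the kernel
        os.fsync(f.fileno())  # ask the filesystem to commit this file's data
    return time.monotonic() - start


# Example (hypothetical path on the 'tank' pool):
# print(f"fsync took {timed_fsync('/tank/fsync-probe'):.3f}s")
```

If this returns promptly while 'sync' still hangs, that would point away from the whole pool being wedged; if it also hangs, the '::threadlist -v 10' output from mdb is the next thing to look at.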
What else is the machine doing?
We are running the honeycomb environment (you'll see it when I send you the mdb output).
Is there some issue with zpool mirrors if one of the slices
disappears or becomes unresponsive after the pool has been brought online?

This can be a problem if an IO issued to the device never completes
(i.e., hangs).  This can hang up the pool.  A well-behaved device/driver
should eventually time out the IO, but we have seen instances where
this never seems to happen.
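To make the timeout point concrete, here is a toy model in which a mirrored write is sent to every child and the caller waits for acknowledgements. It is a sketch under loud assumptions: this is not how ZFS implements mirrors or its I/O pipeline, and `mirror_write`, `healthy_disk`, `failing_disk`, and `make_hung_disk` are hypothetical names. A child that fails with an error (a well-behaved driver) or that is bounded by a timeout gets reported, while a child whose I/O never returns would, with no timeout, block the caller indefinitely, much like the hang described above.

```python
import concurrent.futures as cf
import threading

# Toy model only (hypothetical, not ZFS code): send a write to every child
# of a "mirror" and collect a per-child status.


def healthy_disk(data):
    """Hypothetical child that acknowledges the write immediately."""
    return "ok"


def failing_disk(data):
    """Hypothetical child whose driver reports an error (well-behaved failure)."""
    raise OSError("EIO")


def make_hung_disk(release):
    """Hypothetical child whose I/O never completes until `release` is set."""
    def hung_disk(data):
        release.wait()
        return "ok"
    return hung_disk


def mirror_write(children, data, timeout_s):
    """Issue `data` to every child and return a per-child status list."""
    pool = cf.ThreadPoolExecutor(max_workers=len(children))
    futures = [pool.submit(child, data) for child in children]
    # With timeout=None this wait would block forever on a hung child.
    done, _ = cf.wait(futures, timeout=timeout_s)
    statuses = []
    for fut in futures:
        if fut not in done:
            statuses.append("timeout")   # I/O still outstanding
        elif fut.exception() is not None:
            statuses.append("error")     # driver returned an error
        else:
            statuses.append(fut.result())
    pool.shutdown(wait=False)            # don't wait on stuck workers
    return statuses


# Usage: one healthy side, one side whose I/O hangs.
release = threading.Event()
print(mirror_write([healthy_disk, make_hung_disk(release)], b"blk", 0.5))
# ['ok', 'timeout']
release.set()  # unblock the stuck worker so the interpreter can exit
```

The design choice being illustrated is simply that someone has to bound the wait; in the real stack that responsibility is expected to fall on the device driver, which is why a driver that never times out its I/O can leave the pool hung.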

-Mark
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
