Peter Buckingham wrote:
Hi Eric,
eric kustarz wrote:
The first thing I would do is see if any I/O is happening ('zpool
iostat 1'). If there's none, then perhaps the machine is hung, in which
case you would want to grab a couple of '::threadlist -v 10' outputs from
mdb to figure out whether there are hung threads.
After the initial I/O, zpool iostat shows no further I/O at all.
When we run 'zpool status', it hangs:
HON hcb116 ~ $ zpool status
  pool: tank
 state: ONLINE
 scrub: none requested
<hang>
I'll send you the mdb output privately since it's quite big.
60 seconds should be plenty of time for the async write(s) to
complete. We try to push out a txg (transaction group) every 5
seconds. However, if the system is overloaded, then the txgs could
take longer.
That's what I would have thought.
The 'sync' hanging is intriguing. Perhaps the system is just
overloaded and the sync command is making it worse. It would be
interesting to see what 'fsync' does.
I've not tried this yet.
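One way to try the fsync idea without a global 'sync' is to time a single fsync() on one file in the pool. The following is only a minimal sketch, assuming Python is available on the box; the function name timed_write_fsync and the path /tank/fsync-test are hypothetical. If fsync() stalls the same way 'sync' does, the call simply will not return.

```python
import os
import time


def timed_write_fsync(path: str, payload: bytes = b"x" * 4096) -> float:
    """Write `payload` to `path`, fsync it, and return seconds spent in fsync().

    Hypothetical diagnostic helper, not part of any ZFS tooling.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        # os.write may write fewer bytes than asked, so loop until done.
        view = memoryview(payload)
        while view:
            written = os.write(fd, view)
            view = view[written:]
        # Time only the fsync() itself, which forces the data to stable storage.
        start = time.monotonic()
        os.fsync(fd)
        return time.monotonic() - start
    finally:
        os.close(fd)
```

For example, `timed_write_fsync("/tank/fsync-test")` on the affected pool, compared with the same call on a UFS file system, would show whether a single file's flush is also stuck or only the global sync.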
What else is the machine doing?
We are running the Honeycomb environment (you will see it in the
mdb output I send you).
Is there an issue with zpool mirrors if one of the slices
disappears or becomes unresponsive after the pool has been brought online?
This can be a problem if an IO issued to the device never completes
(i.e., hangs). This can hang up the pool. A well-behaved device/driver
should eventually time out the IO, but we have seen instances where
this never seems to happen.
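To check from user space whether a given slice is answering I/O at all, a read can be issued from a worker thread with a watchdog timeout. This is a minimal sketch under stated assumptions: the helper name read_completes_within and the device path /dev/rdsk/c1t0d0s0 are hypothetical, and the check only reports a stuck read; it cannot cancel it, which still depends on the driver timing the I/O out.

```python
import os
import threading


def read_completes_within(path: str, timeout_s: float, nbytes: int = 8192) -> bool:
    """Return True if reading `nbytes` from `path` finishes within `timeout_s`.

    Hypothetical diagnostic helper. The read runs in a daemon thread; if it
    has not finished by the deadline we stop waiting and return False. The
    stuck I/O itself is NOT cancelled.
    """
    done = threading.Event()
    result = {}

    def worker() -> None:
        try:
            fd = os.open(path, os.O_RDONLY)
            try:
                os.read(fd, nbytes)
            finally:
                os.close(fd)
        except OSError as exc:
            # Record errors (e.g. missing device) so they are not mistaken for a hang.
            result["error"] = exc
        finally:
            done.set()

    threading.Thread(target=worker, daemon=True).start()
    if not done.wait(timeout_s):
        return False  # Read still outstanding: the device looks hung.
    if "error" in result:
        raise result["error"]
    return True
```

Running something like `read_completes_within("/dev/rdsk/c1t0d0s0", 30.0)` against each side of the mirror would suggest whether one slice is the one leaving I/O outstanding.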
-Mark
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss