Well, to be fair, there were some special cases.
I know we had 3 separate occasions with broken HDDs, when we were using
UFS. 2 of these appeared to hang, and the 3rd only hung once we replaced
the disk. This is most likely due to us using UFS in zvol (for quotas).
We got an IDR patch, and eventually this was released as "UFS 3-way
deadlock writing log with zvol". I forget the number right now, but the
patch is out.
This is the very first time we have lost a disk in a purely-ZFS system,
and I was somewhat hoping that this would be the time everything went
smoothly. But it did not.
However, I have also experienced (once) a disk dying in such a way that
it took out the chain in a NetApp, so perhaps the disk died like this
here too (it is really dead).
But still disappointing.
Power cycling the x4540 takes about 7 minutes (service to service), but
with Solaris snv_116(?) and up it can do quiesce-reboots, which take about
57 seconds. In this case, we had to power cycle.
Ross wrote:
Whoah!
"We have yet to experience losing a
disk that didn't force a reboot"
Do you have any notes on how many times this has happened Jorgen, or what steps
you've taken each time?
I appreciate you're probably more concerned with getting an answer to your
question, but if ZFS needs a reboot to cope with failures on even an x4540,
that's an absolute deal breaker for everything we want to do with ZFS.
Ross
--
Jorgen Lundman | <lund...@lundman.net>
Unix Administrator | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo | +81 (0)90-5578-8500 (cell)
Japan | +81 (0)3 -3375-1767 (home)
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss