Well, to be fair, there were some special cases.

I know we had 3 separate occasions with broken HDDs when we were using UFS. 2 of these appeared to hang, and the 3rd only hung once we replaced the disk. This is most likely due to us using UFS in a zvol (for quotas). We got an IDR patch, and eventually this was released as "UFS 3-way deadlock writing log with zvol". I forget the number right now, but the patch is out.

This is the very first time we have lost a disk in a purely-ZFS system, and I was somewhat hoping that this would be the time everything went smoothly. But it did not.

However, I have also experienced (once) a disk dying in such a way that it took out the chain in a NetApp, so perhaps the disk died like that here too (it is really dead).

But still disappointing.

Power cycling the x4540 takes about 7 minutes (service to service), but with Sol snv_116(?) and up it can do quiesce-reboots, which take about 57 seconds. In this case, we had to power cycle.



Ross wrote:
Whoah!

"We have yet to experience losing a disk that didn't force a reboot"

Do you have any notes on how many times this has happened Jorgen, or what steps 
you've taken each time?

I appreciate you're probably more concerned with getting an answer to your 
question, but if ZFS needs a reboot to cope with failures on even an x4540, 
that's an absolute deal breaker for everything we want to do with ZFS.

Ross

--
Jorgen Lundman       | <lund...@lundman.net>
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500          (cell)
Japan                | +81 (0)3 -3375-1767          (home)
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss