On Jun 11, 2011, at 9:26 AM, Jim Klimov wrote:

> 2011-06-11 19:15, Pasi Kärkkäinen wrote:
>> On Sat, Jun 11, 2011 at 08:35:19AM -0500, Edmund White wrote:
>>> I've had two incidents where performance tanked suddenly, leaving the VM
>>> guests and Nexenta SSH/Web consoles inaccessible and requiring a full
>>> reboot of the array to restore functionality. In both cases, it was the
>>> Intel X-25M L2ARC SSD that failed or was "offlined". NexentaStor failed
>>> to alert me on the cache failure, however the general ZFS FMA alert was
>>> visible on the (unresponsive) console screen.
>>>
>>> The "zpool status" output showed:
>>>
>>>     cache
>>>       c6t5001517959467B45d0  FAULTED  2  542  0  too many errors
>>>
>>> This did not trigger any alerts from within Nexenta.
>>>
>>> I was under the impression that an L2ARC failure would not impact the
>>> system. But in this case, it was the culprit. I've never seen any
>>> recommendations to RAID L2ARC for resiliency. Removing the bad SSD
>>> entirely from the server got me back running, but I'm concerned about the
>>> impact of the device failure and the lack of notification from
>>> NexentaStor.
>>
>> IIRC recently there was discussion on this list about firmware bug
>> on the Intel X25 SSDs causing them to fail under high disk IO with "reset
>> storms".
>
> Even if so, this does not forgive ZFS hanging - especially
> if it detected the drive failure, and especially if this drive
> is not required for redundant operation.

How long should it wait? Before you answer, read through the thread:
  http://lists.illumos.org/pipermail/developer/2011-April/001996.html
Then add your comments :-)
 -- richard