On Jun 11, 2011, at 9:26 AM, Jim Klimov wrote:

> 2011-06-11 19:15, Pasi Kärkkäinen wrote:
>> On Sat, Jun 11, 2011 at 08:35:19AM -0500, Edmund White wrote:
>>>    I've had two incidents where performance tanked suddenly, leaving the VM
>>>    guests and Nexenta SSH/Web consoles inaccessible and requiring a full
>>>    reboot of the array to restore functionality. In both cases, it was the
>>>    Intel X25-M L2ARC SSD that failed or was "offlined". NexentaStor failed to
>>>    alert me to the cache failure; however, the general ZFS FMA alert was
>>>    visible on the (unresponsive) console screen.
>>> 
>>>    The "zpool status" output showed:
>>> 
>>>  cache
>>>  c6t5001517959467B45d0     FAULTED      2   542     0  too many errors
>>> 
>>>    This did not trigger any alerts from within Nexenta.
>>> 
>>>    I was under the impression that an L2ARC failure would not impact the
>>>    system. But in this case, it was the culprit. I've never seen any
>>>    recommendations to RAID L2ARC for resiliency. Removing the bad SSD
>>>    entirely from the server got me back running, but I'm concerned about the
>>>    impact of the device failure and the lack of notification from
>>>    NexentaStor.
>> IIRC there was a recent discussion on this list about a firmware bug
>> in the Intel X25 SSDs that causes them to fail under high disk IO with
>> "reset storms".
> Even if so, this does not excuse ZFS hanging, especially
> if it detected the drive failure, and especially if this drive
> is not required for redundant operation.

How long should it wait? Before you answer, read through the thread:
        http://lists.illumos.org/pipermail/developer/2011-April/001996.html
Then add your comments :-)
 -- richard

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss