So, can this be fixed with a firmware update? And how can I determine
whether the drive is actually bad?
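
Here is roughly what I plan to run before condemning the drive. A minimal
sketch, assuming smartmontools is installed and using the device name from
my zpool status output below; the exact smartctl device path and type flag
may differ for a SATA SSD sitting behind the LSI SAS HBA:

  # Solaris cumulative error counters (soft/hard/transport) per device
  iostat -En

  # FMA error telemetry behind the fault that appeared on the console
  fmdump -eV | less

  # SMART identity and health for the SSD; -i also shows the firmware
  # revision (device path assumed; may need -d sat behind a SAS HBA)
  smartctl -i /dev/rdsk/c6t5001517959467B45d0
  smartctl -a /dev/rdsk/c6t5001517959467B45d0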

-- 
Edmund White
ewwh...@mac.com



On 6/11/11 10:15 AM, "Pasi Kärkkäinen" <pa...@iki.fi> wrote:

>On Sat, Jun 11, 2011 at 08:35:19AM -0500, Edmund White wrote:
>>    Posted in greater detail at Server Fault
>>    - [1]http://serverfault.com/q/277966/13325
>>
>>    I have an HP ProLiant DL380 G7 system running NexentaStor. The
>>    server has 36GB RAM, 2 LSI 9211-8i SAS controllers (no SAS
>>    expanders), 2 SAS system drives, 12 SAS data drives, a hot-spare
>>    disk, an Intel X25-M L2ARC cache and a DDRdrive PCI ZIL
>>    accelerator. This system serves NFS to multiple VMware hosts. I
>>    also have about 90-100GB of deduplicated data on the array.
>>
>>    I've had two incidents where performance tanked suddenly, leaving
>>    the VM guests and the Nexenta SSH/web consoles inaccessible and
>>    requiring a full reboot of the array to restore functionality. In
>>    both cases, it was the Intel X25-M L2ARC SSD that failed or was
>>    "offlined". NexentaStor failed to alert me about the cache
>>    failure, though the general ZFS FMA alert was visible on the
>>    (unresponsive) console screen.
>>
>>    The "zpool status" output showed:
>>
>>  cache
>>  c6t5001517959467B45d0     FAULTED      2   542     0  too many errors
>>
>>    This did not trigger any alerts from within Nexenta.
>>
>>    I was under the impression that an L2ARC failure would not impact
>>    the system, but in this case it was the culprit. I've never seen
>>    any recommendation to RAID L2ARC for resiliency. Removing the bad
>>    SSD entirely from the server got me back up and running, but I'm
>>    concerned about the impact of the device failure and the lack of
>>    notification from NexentaStor.
>>
>>    What's the current best-choice SSD for L2ARC cache applications
>>    these days? It seems the Intel units are no longer well-regarded.
>>
>
>IIRC, there was a recent discussion on this list about a firmware bug
>in the Intel X25 SSDs that causes them to fail under heavy disk I/O
>with "reset storms".
>
>Maybe you're hitting that firmware bug.
>
>-- Pasi
>
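
For anyone hitting the same thing: since L2ARC is only a cache and holds
no unique data, the faulted device can be dropped from the pool online and
re-added after a firmware update or replacement. A minimal sketch; "tank"
is a placeholder for the actual pool name, and the device name is the one
from my zpool status output above:

  # drop the faulted cache device (L2ARC contents are disposable)
  zpool remove tank c6t5001517959467B45d0

  # after reflashing or swapping the SSD, re-add it as cache
  zpool add tank cache c6t5001517959467B45d0

  # confirm the cache vdev is back online
  zpool status tank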

