On Sat, Jun 11, 2011 at 08:35:19AM -0500, Edmund White wrote:
> Posted in greater detail at Server Fault
> - http://serverfault.com/q/277966/13325
>
> I have an HP ProLiant DL380 G7 system running NexentaStor. The server has
> 36GB RAM, 2 LSI 9211-8i SAS controllers (no SAS expanders), 2 SAS system
> drives, 12 SAS data drives, a hot-spare disk, an Intel X25-M L2ARC cache
> and a DDRdrive PCI ZIL accelerator. This system serves NFS to multiple
> VMWare hosts. I also have about 90-100GB of deduplicated data on the
> array.
>
> I've had two incidents where performance tanked suddenly, leaving the VM
> guests and Nexenta SSH/Web consoles inaccessible and requiring a full
> reboot of the array to restore functionality. In both cases, it was the
> Intel X-25M L2ARC SSD that failed or was "offlined". NexentaStor failed to
> alert me on the cache failure, however the general ZFS FMA alert was
> visible on the (unresponsive) console screen.
>
> The "zpool status" output showed:
>
>     cache
>       c6t5001517959467B45d0  FAULTED  2  542  0  too many errors
>
> This did not trigger any alerts from within Nexenta.
>
> I was under the impression that an L2ARC failure would not impact the
> system. But in this case, it was the culprit. I've never seen any
> recommendations to RAID L2ARC for resiliency. Removing the bad SSD
> entirely from the server got me back running, but I'm concerned about the
> impact of the device failure and the lack of notification from
> NexentaStor.
>
> What's the current best-choice SSD for L2ARC cache applications these
> days? It seems as though the Intel units are no longer well-regarded.
>
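
Regarding the missing alert: until NexentaStor reports this itself, an external check on the "zpool status" text could catch a faulted cache device. Below is a rough sketch in Python, not a tested monitoring tool. The function name find_bad_devices, the BAD_STATES set and the SAMPLE text are made up for illustration, and the parsing assumes the usual NAME/STATE/READ/WRITE/CKSUM column layout as in the line quoted above.

```python
import re

# Device states treated as "needs attention" (an assumption; adjust as needed).
BAD_STATES = {"FAULTED", "DEGRADED", "OFFLINE", "UNAVAIL", "REMOVED"}

# One device row: name, state, read/write/checksum counters, optional note.
# Counters are matched as generic tokens because they may be abbreviated.
ROW = re.compile(r"^\s*(\S+)\s+([A-Z]+)\s+(\S+)\s+(\S+)\s+(\S+)(?:\s+(.*))?$")


def find_bad_devices(status_text):
    """Return one dict per device row whose state is in BAD_STATES."""
    bad = []
    section = "data"
    for line in status_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("pool:"):
            section = "data"  # a new pool starts with its data vdevs
            continue
        if stripped in ("cache", "logs", "spares"):
            section = stripped  # remember which group the next rows belong to
            continue
        m = ROW.match(line)
        if m and m.group(2) in BAD_STATES:
            bad.append({
                "section": section,
                "device": m.group(1),
                "state": m.group(2),
                "read": m.group(3),
                "write": m.group(4),
                "cksum": m.group(5),
                "note": (m.group(6) or "").strip(),
            })
    return bad


# Sample input built from the line quoted in the original post.
SAMPLE = """\
        cache
          c6t5001517959467B45d0  FAULTED      2   542     0  too many errors
"""

if __name__ == "__main__":
    for dev in find_bad_devices(SAMPLE):
        print("%(section)s device %(device)s is %(state)s (%(note)s)" % dev)
```

In practice the text would come from running "zpool status" (for example from cron) and any non-empty result would be mailed to the admin; that wiring is left out here.
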

IIRC there was a recent discussion on this list about a firmware bug in the
Intel X25 SSDs that causes them to fail under high disk I/O with "reset storms".
Maybe you're hitting that firmware bug.

-- Pasi

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss