On Sat, Jun 11, 2011 at 08:26:34PM +0400, Jim Klimov wrote:
> 2011-06-11 19:15, Pasi Kärkkäinen wrote:
>> On Sat, Jun 11, 2011 at 08:35:19AM -0500, Edmund White wrote:
>>> I've had two incidents where performance tanked suddenly, leaving the VM
>>> guests and Nexenta SSH/Web consoles inaccessible and requiring a full
>>> reboot of the array to restore functionality. In both cases, it was the
>>> Intel X-25M L2ARC SSD that failed or was "offlined". NexentaStor failed
>>> to alert me on the cache failure; however, the general ZFS FMA alert was
>>> visible on the (unresponsive) console screen.
>>>
>>> The "zpool status" output showed:
>>>
>>>   cache
>>>     c6t5001517959467B45d0  FAULTED  2  542  0  too many errors
>>>
>>> This did not trigger any alerts from within Nexenta.
>>>
>>> I was under the impression that an L2ARC failure would not impact the
>>> system. But in this case, it was the culprit. I've never seen any
>>> recommendations to RAID L2ARC for resiliency. Removing the bad SSD
>>> entirely from the server got me back running, but I'm concerned about
>>> the impact of the device failure and the lack of notification from
>>> NexentaStor.
>>
>> IIRC, there was a recent discussion on this list about a firmware bug
>> on the Intel X25 SSDs causing them to fail under high disk IO with
>> "reset storms".
>
> Even if so, this does not excuse ZFS hanging - especially
> if it detected the drive failure, and especially if this drive
> is not required for redundant operation.
> I've seen similar bad behaviour on my oi_148a box when
> I tested USB flash devices as L2ARC caches and
> occasionally they died by slightly moving out of the
> USB socket due to vibration or whatever reason ;)
>
> Similarly, this oi_148a box hung upon loss of the SATA
> connection to a drive in the raidz2 disk set due to
> unreliable cable connectors, when it should have
> stalled IOs to that pool but otherwise remained
> responsive (tested failmode=continue and
> failmode=wait on different occasions).
>
> So I can relate - these things happen, they do annoy,
> and I hope they will be fixed sometime soon so that
> ZFS matches its docs and promises ;)
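For anyone hitting the same thing, the recovery steps mentioned above (dropping the faulted cache device and checking the pool's failmode) can be sketched roughly like this - a minimal example, assuming a pool named "tank" and the device name from the zpool status output quoted earlier:

```shell
# Remove the faulted L2ARC cache device from the pool.
# Cache devices carry no pool data, so this is safe to do online.
# "tank" is a placeholder pool name - substitute your own.
zpool remove tank c6t5001517959467B45d0

# Inspect how the pool reacts to catastrophic device failure.
zpool get failmode tank

# failmode=wait (the default) blocks IO until the device returns;
# failmode=continue returns EIO on new writes but keeps serving
# reads from the remaining healthy devices.
zpool set failmode=continue tank
```

Note that failmode only governs top-level vdev failures in the pool itself; as the reports above show, a misbehaving cache device can apparently still wedge the box regardless.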
True, definitely sounds like a bug in ZFS as well..

-- Pasi

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss