2011-06-11 19:15, Pasi Kärkkäinen writes:
On Sat, Jun 11, 2011 at 08:35:19AM -0500, Edmund White wrote:
    I've had two incidents where performance tanked suddenly, leaving the VM
    guests and Nexenta SSH/Web consoles inaccessible and requiring a full
    reboot of the array to restore functionality. In both cases, it was the
    Intel X-25M L2ARC SSD that failed or was "offlined". NexentaStor failed to
    alert me on the cache failure, however the general ZFS FMA alert was
    visible on the (unresponsive) console screen.

    The "zpool status" output showed:

  cache
  c6t5001517959467B45d0     FAULTED      2   542     0  too many errors

    This did not trigger any alerts from within Nexenta.

    I was under the impression that an L2ARC failure would not impact the
    system. But in this case, it was the culprit. I've never seen any
    recommendations to RAID L2ARC for resiliency. Removing the bad SSD
    entirely from the server got me back running, but I'm concerned about the
    impact of the device failure and the lack of notification from
    NexentaStor.
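[For reference, and not part of the original post: since L2ARC is not needed for pool redundancy, a faulted cache device can normally be removed from the pool online rather than pulling the disk. A sketch, using the device name from the status output above and a hypothetical pool name "tank":]

```shell
# Identify the faulted cache device
zpool status tank

# Cache (L2ARC) devices can be removed online without
# affecting pool redundancy
zpool remove tank c6t5001517959467B45d0

# Once a replacement SSD is installed, add it back as cache
# (device name here is a placeholder)
zpool add tank cache c6tNEWDEVICEd0
```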
IIRC there was a recent discussion on this list about a firmware bug
on the Intel X25 SSDs causing them to fail under high disk IO with
"reset storms".
Even so, this does not excuse ZFS hanging - especially
if it detected the drive failure, and especially if the drive
is not required for redundant operation.

I've seen similar bad behaviour on my oi_148a box when
I tested USB flash devices as L2ARC caches and
occasionally they died by working slightly out of the
USB socket due to vibration or whatever reason ;)

Similarly, this oi_148a box hung upon loss of the SATA
connection to a drive in the raidz2 disk set due to
unreliable cable connectors; it should have
stalled IOs to that pool, but otherwise the system
should have remained responsive (tested with
failmode=continue and failmode=wait on different
occasions).
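[For readers unfamiliar with the property: failmode controls how ZFS reacts to catastrophic pool I/O failure. A sketch of checking and changing it, with a hypothetical pool name "tank":]

```shell
# Show the current failmode setting (default is "wait")
zpool get failmode tank

# "wait" blocks I/O until device connectivity is restored;
# "continue" returns EIO to new synchronous writes instead
# of blocking; "panic" halts the system
zpool set failmode=continue tank
```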

So I can relate - these things happen, they do annoy,
and I hope they will be fixed sometime soon so that
ZFS matches its docs and promises ;)

//Jim Klimov


_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss