On 6/12/11 6:18 PM, "Jim Klimov" <jimkli...@cos.ru> wrote:
>2011-06-12 23:57, Richard Elling wrote:
>>
>> How long should it wait? Before you answer, read through the thread:
>> http://lists.illumos.org/pipermail/developer/2011-April/001996.html
>> Then add your comments :-)
>>  -- richard
>
>But the point of my previous comment was that, according
>to the original poster, after a while his disk did get
>marked as "faulted" or "offlined". IF this happened
>during the system's initial uptime, but it froze anyway,
>it is a problem.
>
>What I do not know is if he rebooted the box within the
>5 minutes set aside for the timeout, or if some other
>processes gave up during the 5 minutes of no IO and
>effectively hung the system.
>
>If it is somehow the latter - that the inaccessible drive
>did (lead to) hang(ing) the system past any set IO retry
>timeouts - that is a bug, I think.
>

Here's the timeline:

- The Intel X25-M was marked "FAULTED" Monday evening, 6pm. This was not
  detected by NexentaStor.
- The storage system performance diminished at 9am the next morning.
  Intermittent spikes in system load (of the VMs hosted on the unit).
- By 11am, the Nexenta interface and console were unresponsive and the
  virtual machines dependent on the underlying storage stalled completely.
- At 12pm, I gained physical access to the server, but I could not acquire
  console access (shell or otherwise). I did see the FMA error output on
  the screen indicating the actual device FAULT time.
- I powered the system off, removed the Intel X25-M, and powered back on.
  The VMs picked up where they left off and the system stabilized.

The total impact to end-users was 3 hours of either poor performance or
straight downtime.

--
Edmund White
ewwh...@mac.com

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss