On Tue, May 28, 2013 at 5:38 PM, Jeremy Linton <jlin...@tributary.com> wrote:
> This is another part of what formed my opinions about error isolation.
> If one of your devices goes out to lunch and isn't recovering via
> abort/lun reset, it's done! Wrecking the rest of the SAN doing "bus
> resets" and HBA resets is a good way to take a serious problem and turn
> it into a full-blown catastrophe.

This is the gist of the issue: once you get to an abort you are already
screwed. You need the abort, but anything beyond it should be reserved for
when things are really dead (the HBA might still recover on a host reset,
but only do that if the host is really unresponsive).

That's why I prefer a long timeout for the command and a long timeout for
the abort. The application above should handle itself with its own timeout
once the abort has been sent (the buffer remains locked until the abort
returns). The device itself is likely stuck in error recovery and will come
out of it when its own internal timeouts are exhausted; those can be
infinite and are generally very large.
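
To make the policy concrete, here is a rough sketch of the decision logic I
have in mind. It is not kernel code and does not match the actual SCSI error
handler; the names (Step, next_step, ISOLATE_DEVICE) and the timeout values
are made up for illustration. Treating "abort timed out on a responsive
host" as "fail only that device" is my reading of the discussion above, not
an existing interface.

```python
from enum import Enum


class Step(Enum):
    WAIT = "wait"                      # keep waiting, buffer stays locked
    ABORT = "abort"                    # send an abort for the command
    ISOLATE_DEVICE = "isolate_device"  # fail this device only, no bus reset
    HOST_RESET = "host_reset"          # last resort, HBA itself is dead


def next_step(elapsed, cmd_timeout, abort_timeout,
              abort_sent_at=None, host_responsive=True):
    """Pick the next recovery step for one outstanding command.

    All times are seconds since the command was issued (hypothetical units).
    """
    if abort_sent_at is None:
        # Long command timeout: give the device time to recover by itself.
        return Step.WAIT if elapsed < cmd_timeout else Step.ABORT

    if elapsed - abort_sent_at < abort_timeout:
        # Long abort timeout: the device may still be in error recovery.
        return Step.WAIT

    # The abort never came back. Only reset the host if the host itself is
    # unresponsive; otherwise confine the damage to the one device.
    return Step.ISOLATE_DEVICE if host_responsive else Step.HOST_RESET
```

For example, with a 60 second command timeout and a 120 second abort
timeout, a command that is still stuck 200 seconds in (abort sent at 60)
gets its device isolated rather than triggering a bus or host reset, unless
the HBA has stopped responding.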

Baruch