[LSF/MM TOPIC] SCSI Error Handling and HBA Recovery

Bart Van Assche Wed, 23 Jan 2019 16:46:40 -0800

Several SCSI low-level drivers (LLDs) need to suspend .queuecommand() calls
while HBA or transport layer recovery is in progress. The iSCSI and SRP
initiator drivers use scsi_target_block() for this purpose.
scsi_target_block() prevents the block layer core from triggering new
.queuecommand() calls but does not prevent the SCSI error handler from
calling .queuecommand(). SCSI LLD authors can either hope that the SCSI error
handler won't call .queuecommand() while transport layer recovery is in
progress, or add code to the .queuecommand() function that detects the
calling context and delays calls that come from the error handler. In the SRP
initiator driver that code looks as follows:


        const bool in_scsi_eh = !in_interrupt() && current == shost->ehandler;

        /*
         * The SCSI EH thread is the only context from which srp_queuecommand()
         * can get invoked for blocked devices (SDEV_BLOCK /
         * SDEV_CREATED_BLOCK). Avoid racing with srp_reconnect_rport() by
         * locking the rport mutex if invoked from inside the SCSI EH.
         */
        if (in_scsi_eh)
                mutex_lock(&rport->mutex);
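
For readers who want to experiment with the pattern outside the kernel, here
is a minimal user-space sketch using POSIX threads. It is only a model and
not the ib_srp code: struct model_host, model_queuecommand(),
model_eh_thread() and run_model() are hypothetical names, pthread_equal()
stands in for the "current == shost->ehandler" test, and a pthread mutex
stands in for rport->mutex. The "recovery" side holds the mutex, so a
submission from the model EH thread cannot proceed until recovery is marked
as finished.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical user-space stand-ins; none of these are kernel structures. */
struct model_host {
	pthread_t ehandler;          /* plays the role of shost->ehandler */
	pthread_mutex_t rport_mutex; /* plays the role of rport->mutex */
	bool recovery_done;          /* protected by rport_mutex */
	bool eh_detected;            /* written by the EH thread, read after join */
	bool eh_saw_recovery_done;   /* written by the EH thread, read after join */
};

/* Models the context check: is the caller the error handler thread? */
static bool in_eh_context(const struct model_host *h)
{
	return pthread_equal(pthread_self(), h->ehandler) != 0;
}

/* Returns true if the call came from the model EH thread. */
static bool model_queuecommand(struct model_host *h)
{
	const bool in_eh = in_eh_context(h);

	if (in_eh) {
		/* Blocks for as long as the recovery side holds the mutex. */
		pthread_mutex_lock(&h->rport_mutex);
		h->eh_saw_recovery_done = h->recovery_done;
	}

	/* A real driver would hand the command to the HBA here. */

	if (in_eh)
		pthread_mutex_unlock(&h->rport_mutex);
	return in_eh;
}

static void *model_eh_thread(void *arg)
{
	struct model_host *h = arg;

	h->ehandler = pthread_self();
	assert(in_eh_context(h));
	h->eh_detected = model_queuecommand(h);
	return NULL;
}

/* Returns 0 if the EH submission was delayed until recovery had finished. */
static int run_model(void)
{
	struct model_host h = { .recovery_done = false };
	pthread_t tid;

	if (pthread_mutex_init(&h.rport_mutex, NULL) != 0)
		return -1;

	pthread_mutex_lock(&h.rport_mutex);     /* recovery starts */
	if (pthread_create(&tid, NULL, model_eh_thread, &h) != 0) {
		pthread_mutex_unlock(&h.rport_mutex);
		pthread_mutex_destroy(&h.rport_mutex);
		return -1;
	}
	h.recovery_done = true;                 /* recovery finishes */
	pthread_mutex_unlock(&h.rport_mutex);

	pthread_join(tid, NULL);
	pthread_mutex_destroy(&h.rport_mutex);
	return (h.eh_detected && h.eh_saw_recovery_done) ? 0 : 1;
}
```

The model also shows why this is unattractive: the submission path has to
know which thread is the error handler, and it sleeps on a mutex inside what
is otherwise a non-blocking path.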

In my opinion the SCSI core should make it easy for LLD authors to prevent the
error handler from calling .queuecommand() while transport layer recovery is
in progress. A considerable time ago I posted several patches that modify the
SCSI error handler so that SCSI LLDs no longer have to detect the context a
.queuecommand() call comes from. None of these patches were accepted and no
alternative approach was proposed. Hence the proposal to discuss this topic in
person during LSF/MM.

See also "[PATCH 1/2] RDMA/srp: Avoid calling mutex_lock() from inside
scsi_queue_rq()" (https://www.spinics.net/lists/linux-rdma/msg73842.html).

Thanks,

Bart.
