On Thu, 2017-04-20 at 21:59 +0000, Bart Van Assche wrote:
> On Tue, 2017-04-18 at 16:56 -0700, James Bottomley wrote:
> > diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> > index e5a2d590a104..31171204cfd1 100644
> > --- a/drivers/scsi/scsi_lib.c
> > +++ b/drivers/scsi/scsi_lib.c
> > @@ -2611,7 +2611,6 @@ scsi_device_set_state(struct scsi_device *sdev, enum scsi_device_state state)
> >             case SDEV_QUIESCE:
> >             case SDEV_OFFLINE:
> >             case SDEV_TRANSPORT_OFFLINE:
> > -           case SDEV_BLOCK:
> >                     break;
> >             default:
> >                     goto illegal;
> > @@ -2625,6 +2624,7 @@ scsi_device_set_state(struct scsi_device *sdev, enum scsi_device_state state)
> >             case SDEV_OFFLINE:
> >             case SDEV_TRANSPORT_OFFLINE:
> >             case SDEV_CANCEL:
> > +           case SDEV_BLOCK:
> >             case SDEV_CREATED_BLOCK:
> >                     break;
> >             default:
> > diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
> > index 82dfe07b1d47..e477f95bf169 100644
> > --- a/drivers/scsi/scsi_sysfs.c
> > +++ b/drivers/scsi/scsi_sysfs.c
> > @@ -1282,8 +1282,17 @@ void __scsi_remove_device(struct scsi_device *sdev)
> >             return;
> >  
> >     if (sdev->is_visible) {
> > -           if (scsi_device_set_state(sdev, SDEV_CANCEL) != 0)
> > -                   return;
> > +           /*
> > +            * If blocked, we go straight to DEL so any commands
> > +            * issued during the driver shutdown (like sync cache)
> > +            * are errored
> > +            */
> > +           if (scsi_device_set_state(sdev, SDEV_CANCEL) != 0) {
> > +                   if (scsi_device_set_state(sdev, SDEV_DEL) != 0)
> > +                           return;
> > +                   else
> > +                           scsi_start_queue(sdev);
> > +           }
> >  
> >             bsg_unregister_queue(sdev->request_queue);
> >             device_unregister(&sdev->sdev_dev);
> 
> Hello James,
> 
> This approach cannot work. A scsi_target_block() call by the 
> transport layer can happen concurrently with the 
> __scsi_remove_device() call and hence can occur at any time between 
> the scsi_start_queue() call by __scsi_remove_device() and the 
> sd_shutdown() call, resulting in a deadlock.

How is that possible?  Once the device goes into the CANCEL state, it
no longer can be found by starget_for_each_device() because
scsi_device_get() returns NULL ... unless you also have a patch
altering that?
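The argument above can be illustrated with a minimal userspace model. It is a sketch under stated assumptions, not kernel code. All `model_*` names and the `M_*` states are hypothetical.

The model encodes three things:
- only the two transition-table hunks of the patch (BLOCK may no longer go to CANCEL, but may now go to DEL);
- the reply's point that scsi_device_get() refuses a device being removed, with DEL assumed to be treated like CANCEL;
- a stand-in for the scsi_target_block() path that only blocks devices the iterator can take a reference on.

It does not model concurrency or locking. It therefore illustrates the ordering argument rather than settling whether a race remains.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of the SCSI device states touched by the patch. */
enum model_state { M_RUNNING, M_BLOCK, M_CANCEL, M_DEL };

struct model_sdev {
	enum model_state state;
};

/* Transition check mirroring only the patched hunks (simplified). */
static int model_set_state(struct model_sdev *sdev, enum model_state new_state)
{
	enum model_state old = sdev->state;
	bool ok = false;

	switch (new_state) {
	case M_BLOCK:
		ok = (old == M_RUNNING);
		break;
	case M_RUNNING:
		ok = (old == M_BLOCK);
		break;
	case M_CANCEL:
		ok = (old == M_RUNNING);	/* M_BLOCK removed by the patch */
		break;
	case M_DEL:
		ok = (old == M_RUNNING || old == M_CANCEL ||
		      old == M_BLOCK);		/* M_BLOCK added by the patch */
		break;
	}
	if (!ok)
		return -1;
	sdev->state = new_state;
	return 0;
}

/* Assumed analogue of scsi_device_get()'s state check. */
static bool model_device_get(const struct model_sdev *sdev)
{
	return sdev->state != M_CANCEL && sdev->state != M_DEL;
}

/*
 * Stand-in for a target-wide block that walks the devices the way
 * starget_for_each_device() would: devices it cannot get are skipped.
 * Returns how many devices were moved to M_BLOCK.
 */
static size_t model_target_block(struct model_sdev *devs, size_t n)
{
	size_t blocked = 0;

	assert(devs != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (!model_device_get(&devs[i]))
			continue;
		if (model_set_state(&devs[i], M_BLOCK) == 0)
			blocked++;
	}
	return blocked;
}

/* Patched removal order: try CANCEL, fall back to DEL when blocked. */
static int model_remove(struct model_sdev *sdev)
{
	if (model_set_state(sdev, M_CANCEL) != 0 &&
	    model_set_state(sdev, M_DEL) != 0)
		return -1;
	return 0;
}
```

The model walks through the following sequence:
1. A running device is blocked by a simulated cable pull.
2. Removal moves it from BLOCK straight to DEL.
3. A later model_target_block() call no longer reaches it.

A running device, by contrast, is removed via CANCEL as before.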

James


> I have been able to trigger this with my tests by simulating a cable 
> pull shortly before running "rmmod ib_srp".
> 
> That deadlock did not occur with the patch series that makes 
> synchronize cache upon shutdown asynchronous. I'm going to resubmit 
> that patch series.
> 
> Bart.
