Re: [PATCH v4 1/3] cxlflash: Base error recovery support

Matthew R. Ochs Tue, 11 Aug 2015 14:45:34 -0700

> On Aug 10, 2015, at 10:41 PM, Benjamin Herrenschmidt 
> <b...@kernel.crashing.org> wrote:
> 
> On Mon, 2015-08-10 at 12:09 -0500, Matthew R. Ochs wrote:
>> Introduce support for enhanced I/O error handling.
>> 
>> Signed-off-by: Matthew R. Ochs <mro...@linux.vnet.ibm.com>
>> Signed-off-by: Manoj N. Kumar <ma...@linux.vnet.ibm.com>
>> ---
> 
> So I'm not necessarily very qualified to review SCSI bits as I haven't
> done anything close to the Linux SCSI code for almost a decade but I
> have a couple of questions and nits…


Thanks for reviewing Ben!

> 
>>      wait_queue_head_t tmf_waitq;
>>      bool tmf_active;
>> -    u8 err_recovery_active:1;
>> +    wait_queue_head_t limbo_waitq;
>> +    enum cxlflash_state state;
>> };
>> 
>> struct afu_cmd {
>> diff --git a/drivers/scsi/cxlflash/main.c b/drivers/scsi/cxlflash/main.c
>> index 76a7286..18359d4 100644
>> --- a/drivers/scsi/cxlflash/main.c
>> +++ b/drivers/scsi/cxlflash/main.c
>> @@ -380,6 +380,18 @@ static int cxlflash_queuecommand(struct Scsi_Host 
>> *host, struct scsi_cmnd *scp)
>>      }
>>      spin_unlock_irqrestore(&cfg->tmf_waitq.lock, lock_flags);
>> 
>> +    switch (cfg->state) {
>> +    case STATE_LIMBO:
>> +            pr_debug_ratelimited("%s: device in limbo!\n", __func__);
>> +            rc = SCSI_MLQUEUE_HOST_BUSY;
>> +            goto out;
>> +    case STATE_FAILTERM:
>> +            pr_debug_ratelimited("%s: device has failed!\n", __func__);
>> +            goto error;
>> +    default:
>> +            break;
>> +    }
> 
> I noticed that you mostly read and write that new state outside of your
> tmf_waitq.lock. Is there any other lock or mutex protecting you ? In
> this specific case ?

No there isn’t.

> 
> I know in the old day queuecommand was called with a host lock, is that
> still the case ?

No, it’s no longer the case.

> Or you just don't care about an occasional spurrious
> SCSI_MLQUEUE_HOST_BUSY ?

This is pretty much true. =)

> 
>>      cmd = cxlflash_cmd_checkout(afu);
>>      if (unlikely(!cmd)) {
>>              pr_err("%s: could not get a free command\n", __func__);
>> @@ -428,6 +440,10 @@ static int cxlflash_queuecommand(struct Scsi_Host 
>> *host, struct scsi_cmnd *scp)
>> 
>> out:
>>      return rc;
>> +error:
>> +    scp->result = (DID_NO_CONNECT << 16);
>> +    scp->scsi_done(scp);
>> +    return 0;
>> }
>> 
>> /**
>> @@ -455,9 +471,21 @@ static int cxlflash_eh_device_reset_handler(struct 
>> scsi_cmnd *scp)
>>               get_unaligned_be32(&((u32 *)scp->cmnd)[2]),
>>               get_unaligned_be32(&((u32 *)scp->cmnd)[3]));
>> 
>> -    rcr = send_tmf(afu, scp, TMF_LUN_RESET);
>> -    if (unlikely(rcr))
>> +    switch (cfg->state) {
>> +    case STATE_NORMAL:
>> +            rcr = send_tmf(afu, scp, TMF_LUN_RESET);
>> +            if (unlikely(rcr))
>> +                    rc = FAILED;
>> +            break;
>> +    case STATE_LIMBO:
>> +            wait_event(cfg->limbo_waitq, cfg->state != STATE_LIMBO);
>> +            if (cfg->state == STATE_NORMAL)
>> +                    break;
>> +            /* fall through */
>> +    default:
>>              rc = FAILED;
>> +            break;
>> +    }
> 
> Same here, since you are doing wait_event, I assume no lock is held
> (unless it's a mutex) and in subsequent places I am not listing.
> 
> As I said, it could well be that it all works fine but it would be worth
> mentioning in this case because it's not obvious from simply reading the
> code.

It has been working well, and yes, your assumption is correct. We can look at
adding more/better serialization as a future enhancement.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v4 1/3] cxlflash: Base error recovery support

Reply via email to