Re: [PATCH 1/7] scsi: ufs: amend the ocs handling with fatal error
On 7/30/2013 6:32 PM, Seungwon Jeon wrote: On Mon, July 29, 2013, Sujit Reddy Thumma wrote: On 7/29/2013 11:47 AM, Subhash Jadavani wrote: On 7/26/2013 7:15 PM, Seungwon Jeon wrote: Fatal error in OCS(overall command status) field indicates error conditions which is not covered by UFSHCI. It means that host cannot define the result of command status and therefore host may need to check transfer response UPIU's response and status field. It was actually found that 'CHECK CONDITION' is stored in status field of UPIU where OCS is 'FATAL ERROR'. It looks like you are assuming that there will be some kind of response from the device. But the spec. mentions - "OCS_FATAL ERROR: within host controller that is not covered by the error conditions described above in this table." Yes, error interrupt doesn't happen actually. So spec. left everything to implementers on how to trigger this error. Also, it says "within host controller" so ufshcd_is_valid_req_rsp() might fail as well. I couldn't understand why there is a need for a host controller to interpret the SCSI command status and raise an OCS_FATAL_ERROR? I feel like OCS values are related to syntax problem except OCS_FATAL_ERROR. Basically, host controller may need to refer the Response field[Target Success/Target Failure] of response UPIU. It's a overall result from response UPIU. When Response field is 'Target Failure', if host controller updates the OCS field with 'SUCCESS', it's not proper behavior. In this case host may refer the Status field of response UPIU to decide the OCS result. Of course, it's not clear thing and could depends on host's implementation, because there is no obvious description for that. But if we consider the way to report UTP error from UFSHCI 8.2.4, we can check the Response UPIU for more detailed error condition regardless OCS value. Could you check your host side? This is what our host documentation says: " SW may check the contents of Response UPIU when OCS !=0 for debug purposes. It may (or may not) help to understand the error scenario better. This is needed only for debug. When system is stable, OCS should be 0." So this clearly means that we shouldn't rely on the response UPIU if the OCS is non-zero hence simply ignore it. > "When Response field is 'Target Failure', if host controller updates the OCS field with 'SUCCESS', it's not proper behavior." I doubt whether your above understanding is true from Host controller specification point of view. Host controller would not decode the “Response” field of the “response UPIU”, it would only decode the “Transaction Type” & “Task Tag” (and may be “LUN”) from the response UPIU in order to find out the correct slot in the transfer/task request list. Even 7.3.2.3 in UFSHCI spec also mentions the same. I guess with this i don't see a need for this patch unless you feel otherwise. Regards, Subhash Thanks, Seungwon Jeon If it is clarified by the spec. then we can have generic implementation otherwise I would prefer to make this specific to those host controllers that raise OCS_FATAL_ERROR for CHECK_CONDITION. Signed-off-by: Seungwon Jeon --- drivers/scsi/ufs/ufshcd.c |3 +-- 1 files changed, 1 insertions(+), 2 deletions(-) diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c index b743bd6..4cf3a2d 100644 --- a/drivers/scsi/ufs/ufshcd.c +++ b/drivers/scsi/ufs/ufshcd.c @@ -1199,7 +1199,7 @@ ufshcd_transfer_rsp_status(struct ufs_hba *hba, struct ufshcd_lrb *lrbp) switch (ocs) { case OCS_SUCCESS: - +case OCS_FATAL_ERROR: /* check if the returned transfer response is valid */ result = ufshcd_is_valid_req_rsp(lrbp->ucd_rsp_ptr); I don't see the response UPIU data of the last command response is cleared anywhere in the driver. This means its quite possible that if the current command failed (and if it is using the same tag as the last succeeded command) with the OCS_FATAL_ERROR then response UPIU (pointed by lrbp->ucd_rsp_ptr) content may be still the same as the last one. If we ensure to clear the response UPIU data after every command completion then only we can rely on the response UPIU content in case of fatal errors. Regards, Subhash if (result) { @@ -1229,7 +1229,6 @@ ufshcd_transfer_rsp_status(struct ufs_hba *hba, struct ufshcd_lrb *lrbp) case OCS_MISMATCH_DATA_BUF_SIZE: case OCS_MISMATCH_RESP_UPIU_SIZE: case OCS_PEER_COMM_FAILURE: -case OCS_FATAL_ERROR: default: result |= DID_ERROR << 16; dev_err(hba->dev, -- Regards, Sujit -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH V5 0/4] scsi: ufs: Improve UFS error handling
Hi, I tested the new set of patches (V5 1-4) and it works. Thanks, Dolev > The first patch fixes many issues with current task management handling > in UFSHCD driver. Others improve error handling in various scenarios. > > These patches are rebased on: > [PATCH 9/9] drivers/scsi/ufs: don't check resource with > devm_ioremap_resource > > Changes from v4: > - Addressed comments from Seungwon Jeon on V3 > - Updated with proper locking while ufshcd state changes > - Retained LUN reset instead of device reset with DME_END_POINT_RESET > - Removed error handling decisions dependent on OCS value > - Simplified fatal error handling to abort the requests first > and then carry out reset. > Changes from v3: > - Rebased. > Changes from v2: > - [PATCH V3 1/4]: Make the task management command task tag unique > across SCSI/NOP/QUERY request tags. > - [PATCH V3 3/4]: While handling device/host reset, wait for > pending fatal handler to return if running. > Changes from v1: > - [PATCH V2 1/4]: Fix a race condition because of overloading > outstanding_tasks variable to lock the slots. A new variable > tm_slots_in_use will track which slots are in use by the driver. > - [PATCH V2 2/4]: Commit text update to clarify the hardware race > with more details. > - [PATCH V2 3/4]: Minor cleanup and rebase > - [PATCH V2 4/4]: Fix a bug - sleeping in atomic context > > Sujit Reddy Thumma (4): > scsi: ufs: Fix broken task management command implementation > scsi: ufs: Fix hardware race conditions while aborting a command > scsi: ufs: Fix device and host reset methods > scsi: ufs: Improve UFS fatal error handling > > drivers/scsi/ufs/ufshcd.c | 684 > - > drivers/scsi/ufs/ufshcd.h | 20 +- > 2 files changed, 501 insertions(+), 203 deletions(-) > > -- > QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member > of Code Aurora Forum, hosted by The Linux Foundation. > > -- > To unsubscribe from this list: send the line "unsubscribe linux-scsi" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- QUALCOMM ISRAEL, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH V5 2/4] scsi: ufs: Fix hardware race conditions while aborting a command
Tested-by: Dolev Raviv > There is a possible race condition in the hardware when the abort > command is issued to terminate the ongoing SCSI command as described > below: > > - A bit in the door-bell register is set in the controller for a > new SCSI command. > - In some rare situations, before controller get a chance to issue > the command to the device, the software issued an abort command. > - If the device recieves abort command first then it returns success > because the command itself is not present. > - Now if the controller commits the command to device it will be > processed. > - Software thinks that command is aborted and proceed while still > the device is processing it. > - The software, controller and device may go out of sync because of > this race condition. > > To avoid this, query task presence in the device before sending abort > task command so that after the abort operation, the command is guaranteed > to be non-existent in both controller and the device. > > Signed-off-by: Sujit Reddy Thumma > --- > drivers/scsi/ufs/ufshcd.c | 70 > +++- > 1 files changed, 55 insertions(+), 15 deletions(-) > > diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c > index d7f3746..d4ee48d 100644 > --- a/drivers/scsi/ufs/ufshcd.c > +++ b/drivers/scsi/ufs/ufshcd.c > @@ -2485,6 +2485,12 @@ static int ufshcd_host_reset(struct scsi_cmnd *cmd) > * ufshcd_abort - abort a specific command > * @cmd: SCSI command pointer > * > + * Abort the pending command in device by sending UFS_ABORT_TASK task > management > + * command, and in host controller by clearing the door-bell register. > There can > + * be race between controller sending the command to the device while > abort is > + * issued. To avoid that, first issue UFS_QUERY_TASK to check if the > command is > + * really issued and then try to abort it. > + * > * Returns SUCCESS/FAILED > */ > static int ufshcd_abort(struct scsi_cmnd *cmd) > @@ -2493,7 +2499,8 @@ static int ufshcd_abort(struct scsi_cmnd *cmd) > struct ufs_hba *hba; > unsigned long flags; > unsigned int tag; > - int err; > + int err = 0; > + int poll_cnt; > u8 resp = 0xF; > struct ufshcd_lrb *lrbp; > > @@ -2501,33 +2508,59 @@ static int ufshcd_abort(struct scsi_cmnd *cmd) > hba = shost_priv(host); > tag = cmd->request->tag; > > - spin_lock_irqsave(host->host_lock, flags); > + /* If command is already aborted/completed, return SUCCESS */ > + if (!(test_bit(tag, &hba->outstanding_reqs))) > + goto out; > > - /* check if command is still pending */ > - if (!(test_bit(tag, &hba->outstanding_reqs))) { > - err = FAILED; > - spin_unlock_irqrestore(host->host_lock, flags); > + lrbp = &hba->lrb[tag]; > + for (poll_cnt = 100; poll_cnt; poll_cnt--) { > + err = ufshcd_issue_tm_cmd(hba, lrbp->lun, lrbp->task_tag, > + UFS_QUERY_TASK, &resp); > + if (!err && resp == UPIU_TASK_MANAGEMENT_FUNC_SUCCEEDED) { > + /* cmd pending in the device */ > + break; > + } else if (!err && resp == UPIU_TASK_MANAGEMENT_FUNC_COMPL) { > + u32 reg; > + > + /* > + * cmd not pending in the device, check if it is > + * in transition. > + */ > + reg = ufshcd_readl(hba, REG_UTP_TRANSFER_REQ_DOOR_BELL); > + if (reg & (1 << tag)) { > + /* sleep for max. 2ms to stabilize */ > + usleep_range(1000, 2000); > + continue; > + } > + /* command completed already */ > + goto out; > + } else { > + if (!err) > + err = resp; /* service response error */ > + goto out; > + } > + } > + > + if (!poll_cnt) { > + err = -EBUSY; > goto out; > } > - spin_unlock_irqrestore(host->host_lock, flags); > > - lrbp = &hba->lrb[tag]; > err = ufshcd_issue_tm_cmd(hba, lrbp->lun, lrbp->task_tag, > UFS_ABORT_TASK, &resp); > if (err || resp != UPIU_TASK_MANAGEMENT_FUNC_COMPL) { > - err = FAILED; > + if (!err) > + err = resp; /* service response error */ > goto out; > - } else { > - err = SUCCESS; > } > > + err = ufshcd_clear_cmd(hba, tag); > + if (err) > + goto out; > + > scsi_dma_unmap(cmd); > > spin_lock_irqsave(host->host_lock, flags); > - > - /* clear the respective UTRLCLR register bit */ > - ufshcd_utrl_clear(hba, tag); > - > __clear_bit(tag, &hba->outstanding_reqs); > hba->lrb[tag].cmd =
Re: [PATCH V5 1/4] scsi: ufs: Fix broken task management command implementation
Tested-by: Dolev Raviv > Currently, sending Task Management (TM) command to the card might > be broken in some scenarios as listed below: > > Problem: If there are more than 8 TM commands the implementation > returns error to the caller. > Fix: Wait for one of the slots to be emptied and send the command. > > Problem: Sometimes it is necessary for the caller to know the TM service > response code to determine the task status. > Fix: Propogate the service response to the caller. > > Problem: If the TM command times out no proper error recovery is > implemented. > Fix: Clear the command in the controller door-bell register, so that > further commands for the same slot don't fail. > > Problem: While preparing the TM command descriptor, the task tag used > should be unique across SCSI/NOP/QUERY/TM commands and not the >task tag of the command which the TM command is trying to manage. > Fix: Use a unique task tag instead of task tag of SCSI command. > > Problem: Since the TM command involves H/W communication, abruptly ending > the request on kill interrupt signal might cause h/w malfunction. > Fix: Wait for hardware completion interrupt with TASK_UNINTERRUPTIBLE > set. > > Signed-off-by: Sujit Reddy Thumma > --- > drivers/scsi/ufs/ufshcd.c | 173 > ++--- > drivers/scsi/ufs/ufshcd.h |8 ++- > 2 files changed, 122 insertions(+), 59 deletions(-) > > diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c > index b36ca9a..d7f3746 100644 > --- a/drivers/scsi/ufs/ufshcd.c > +++ b/drivers/scsi/ufs/ufshcd.c > @@ -53,6 +53,9 @@ > /* Query request timeout */ > #define QUERY_REQ_TIMEOUT 30 /* msec */ > > +/* Task management command timeout */ > +#define TM_CMD_TIMEOUT 100 /* msecs */ > + > /* Expose the flag value from utp_upiu_query.value */ > #define MASK_QUERY_UPIU_FLAG_LOC 0xFF > > @@ -183,13 +186,35 @@ ufshcd_get_tmr_ocs(struct utp_task_req_desc > *task_req_descp) > /** > * ufshcd_get_tm_free_slot - get a free slot for task management request > * @hba: per adapter instance > + * @free_slot: pointer to variable with available slot value > * > - * Returns maximum number of task management request slots in case of > - * task management queue full or returns the free slot number > + * Get a free tag and lock it until ufshcd_put_tm_slot() is called. > + * Returns 0 if free slot is not available, else return 1 with tag value > + * in @free_slot. > */ > -static inline int ufshcd_get_tm_free_slot(struct ufs_hba *hba) > +static bool ufshcd_get_tm_free_slot(struct ufs_hba *hba, int *free_slot) > { > - return find_first_zero_bit(&hba->outstanding_tasks, hba->nutmrs); > + int tag; > + bool ret = false; > + > + if (!free_slot) > + goto out; > + > + do { > + tag = find_first_zero_bit(&hba->tm_slots_in_use, hba->nutmrs); > + if (tag >= hba->nutmrs) > + goto out; > + } while (test_and_set_bit_lock(tag, &hba->tm_slots_in_use)); > + > + *free_slot = tag; > + ret = true; > +out: > + return ret; > +} > + > +static inline void ufshcd_put_tm_slot(struct ufs_hba *hba, int slot) > +{ > + clear_bit_unlock(slot, &hba->tm_slots_in_use); > } > > /** > @@ -1700,10 +1725,11 @@ static void ufshcd_slave_destroy(struct > scsi_device *sdev) > * ufshcd_task_req_compl - handle task management request completion > * @hba: per adapter instance > * @index: index of the completed request > + * @resp: task management service response > * > - * Returns SUCCESS/FAILED > + * Returns non-zero value on error, zero on success > */ > -static int ufshcd_task_req_compl(struct ufs_hba *hba, u32 index) > +static int ufshcd_task_req_compl(struct ufs_hba *hba, u32 index, u8 > *resp) > { > struct utp_task_req_desc *task_req_descp; > struct utp_upiu_task_rsp *task_rsp_upiup; > @@ -1724,19 +1750,15 @@ static int ufshcd_task_req_compl(struct ufs_hba > *hba, u32 index) > task_req_descp[index].task_rsp_upiu; > task_result = be32_to_cpu(task_rsp_upiup->header.dword_1); > task_result = ((task_result & MASK_TASK_RESPONSE) >> 8); > - > - if (task_result != UPIU_TASK_MANAGEMENT_FUNC_COMPL && > - task_result != UPIU_TASK_MANAGEMENT_FUNC_SUCCEEDED) > - task_result = FAILED; > - else > - task_result = SUCCESS; > + if (resp) > + *resp = (u8)task_result; > } else { > - task_result = FAILED; > - dev_err(hba->dev, > - "trc: Invalid ocs = %x\n", ocs_value); > + dev_err(hba->dev, "%s: failed, ocs = 0x%x\n", > + __func__, ocs_value); > } > spin_unlock_irqrestore(hba->host->host_lock, flags); > - return task_result; > + > + retur
Re: [PATCH V5 3/4] scsi: ufs: Fix device and host reset methods
Tested-by: Dolev Raviv > As of now SCSI initiated error handling is broken because, > the reset APIs don't try to bring back the device initialized and > ready for further transfers. > > In case of timeouts, the scsi error handler takes care of handling aborts > and resets. Improve the error handling in such scenario by resetting the > device and host and re-initializing them in proper manner. > > Signed-off-by: Sujit Reddy Thumma > --- > drivers/scsi/ufs/ufshcd.c | 240 > +++-- > drivers/scsi/ufs/ufshcd.h |2 + > 2 files changed, 189 insertions(+), 53 deletions(-) > > diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c > index d4ee48d..c577e6e 100644 > --- a/drivers/scsi/ufs/ufshcd.c > +++ b/drivers/scsi/ufs/ufshcd.c > @@ -69,9 +69,14 @@ enum { > > /* UFSHCD states */ > enum { > - UFSHCD_STATE_OPERATIONAL, > UFSHCD_STATE_RESET, > UFSHCD_STATE_ERROR, > + UFSHCD_STATE_OPERATIONAL, > +}; > + > +/* UFSHCD error handling flags */ > +enum { > + UFSHCD_EH_IN_PROGRESS = (1 << 0), > }; > > /* Interrupt configuration options */ > @@ -87,6 +92,16 @@ enum { > INT_AGGR_CONFIG, > }; > > +#define ufshcd_set_eh_in_progress(h) \ > + (h->eh_flags |= UFSHCD_EH_IN_PROGRESS) > +#define ufshcd_eh_in_progress(h) \ > + (h->eh_flags & UFSHCD_EH_IN_PROGRESS) > +#define ufshcd_clear_eh_in_progress(h) \ > + (h->eh_flags &= ~UFSHCD_EH_IN_PROGRESS) > + > +static void ufshcd_tmc_handler(struct ufs_hba *hba); > +static void ufshcd_async_scan(void *data, async_cookie_t cookie); > + > /* > * ufshcd_wait_for_register - wait for register value to change > * @hba - per-adapter interface > @@ -840,10 +855,25 @@ static int ufshcd_queuecommand(struct Scsi_Host > *host, struct scsi_cmnd *cmd) > > tag = cmd->request->tag; > > - if (hba->ufshcd_state != UFSHCD_STATE_OPERATIONAL) { > + spin_lock_irqsave(hba->host->host_lock, flags); > + switch (hba->ufshcd_state) { > + case UFSHCD_STATE_OPERATIONAL: > + break; > + case UFSHCD_STATE_RESET: > err = SCSI_MLQUEUE_HOST_BUSY; > - goto out; > + goto out_unlock; > + case UFSHCD_STATE_ERROR: > + set_host_byte(cmd, DID_ERROR); > + cmd->scsi_done(cmd); > + goto out_unlock; > + default: > + dev_WARN_ONCE(hba->dev, 1, "%s: invalid state %d\n", > + __func__, hba->ufshcd_state); > + set_host_byte(cmd, DID_BAD_TARGET); > + cmd->scsi_done(cmd); > + goto out_unlock; > } > + spin_unlock_irqrestore(hba->host->host_lock, flags); > > /* acquire the tag to make sure device cmds don't use it */ > if (test_and_set_bit_lock(tag, &hba->lrb_in_use)) { > @@ -880,6 +910,7 @@ static int ufshcd_queuecommand(struct Scsi_Host *host, > struct scsi_cmnd *cmd) > /* issue command to the controller */ > spin_lock_irqsave(hba->host->host_lock, flags); > ufshcd_send_command(hba, tag); > +out_unlock: > spin_unlock_irqrestore(hba->host->host_lock, flags); > out: > return err; > @@ -1495,8 +1526,6 @@ static int ufshcd_make_hba_operational(struct > ufs_hba *hba) > if (hba->ufshcd_state == UFSHCD_STATE_RESET) > scsi_unblock_requests(hba->host); > > - hba->ufshcd_state = UFSHCD_STATE_OPERATIONAL; > - > out: > return err; > } > @@ -2245,8 +2274,12 @@ static void ufshcd_err_handler(struct ufs_hba *hba) > } > return; > fatal_eh: > - hba->ufshcd_state = UFSHCD_STATE_ERROR; > - schedule_work(&hba->feh_workq); > + /* handle fatal errors only when link is functional */ > + if (hba->ufshcd_state == UFSHCD_STATE_OPERATIONAL) { > + /* block commands at driver layer until error is handled */ > + hba->ufshcd_state = UFSHCD_STATE_ERROR; > + schedule_work(&hba->feh_workq); > + } > } > > /** > @@ -2411,12 +2444,13 @@ static int ufshcd_issue_tm_cmd(struct ufs_hba > *hba, int lun_id, int task_id, > } > > /** > - * ufshcd_device_reset - reset device and abort all the pending commands > + * ufshcd_eh_device_reset_handler - device reset handler registered to > + *scsi layer. > * @cmd: SCSI command pointer > * > * Returns SUCCESS/FAILED > */ > -static int ufshcd_device_reset(struct scsi_cmnd *cmd) > +static int ufshcd_eh_device_reset_handler(struct scsi_cmnd *cmd) > { > struct Scsi_Host *host; > struct ufs_hba *hba; > @@ -2425,6 +2459,7 @@ static int ufshcd_device_reset(struct scsi_cmnd > *cmd) > int err; > u8 resp = 0xF; > struct ufshcd_lrb *lrbp; > + unsigned long flags; > > host = cmd->device->host; > hba = shost_priv(host); > @@ -2433,55 +2468,33 @@ static int ufshcd_device_reset(struct scsi_cmnd > *cmd) > lrbp = &hba->lrb[tag]; > err = ufshcd_issue_tm_cmd(hba, lrbp->lun, 0, UFS_LOGICAL_RESET, &resp
Re: [PATCH V5 4/4] scsi: ufs: Improve UFS fatal error handling
Tested-by: Dolev Raviv > Error handling in UFS driver is broken and resets the host controller > for fatal errors without re-initialization. Correct the fatal error > handling sequence according to UFS Host Controller Interface (HCI) > v1.1 specification. > > o Processed requests which are completed w/wo error are reported to > SCSI layer and any pending commands that are not started are aborted > in the controller and re-queued into scsi mid-layer queue. > > o Upon determining fatal error condition the host controller may hang > forever until a reset is applied. Block SCSI layer for sending new > requests and apply reset in a separate error handling work. > > o SCSI is informed about the expected Unit-Attention exception from the > device for the immediate command after a reset so that the SCSI layer > take necessary steps to establish communication with the device. > > Signed-off-by: Sujit Reddy Thumma > --- > drivers/scsi/ufs/ufshcd.c | 229 > - > drivers/scsi/ufs/ufshcd.h | 10 ++- > 2 files changed, 149 insertions(+), 90 deletions(-) > > diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c > index c577e6e..4dca9b4 100644 > --- a/drivers/scsi/ufs/ufshcd.c > +++ b/drivers/scsi/ufs/ufshcd.c > @@ -79,6 +79,14 @@ enum { > UFSHCD_EH_IN_PROGRESS = (1 << 0), > }; > > +/* UFSHCD UIC layer error flags */ > +enum { > + UFSHCD_UIC_DL_PA_INIT_ERROR = (1 << 0), /* Data link layer error */ > + UFSHCD_UIC_NL_ERROR = (1 << 1), /* Network layer error */ > + UFSHCD_UIC_TL_ERROR = (1 << 2), /* Transport Layer error */ > + UFSHCD_UIC_DME_ERROR = (1 << 3), /* DME error */ > +}; > + > /* Interrupt configuration options */ > enum { > UFSHCD_INT_DISABLE, > @@ -101,6 +109,8 @@ enum { > > static void ufshcd_tmc_handler(struct ufs_hba *hba); > static void ufshcd_async_scan(void *data, async_cookie_t cookie); > +static int ufshcd_reset_and_restore(struct ufs_hba *hba); > +static int ufshcd_clear_tm_cmd(struct ufs_hba *hba, int tag); > > /* > * ufshcd_wait_for_register - wait for register value to change > @@ -1523,9 +1533,6 @@ static int ufshcd_make_hba_operational(struct > ufs_hba *hba) > goto out; > } > > - if (hba->ufshcd_state == UFSHCD_STATE_RESET) > - scsi_unblock_requests(hba->host); > - > out: > return err; > } > @@ -1651,66 +1658,6 @@ static int ufshcd_verify_dev_init(struct ufs_hba > *hba) > } > > /** > - * ufshcd_do_reset - reset the host controller > - * @hba: per adapter instance > - * > - * Returns SUCCESS/FAILED > - */ > -static int ufshcd_do_reset(struct ufs_hba *hba) > -{ > - struct ufshcd_lrb *lrbp; > - unsigned long flags; > - int tag; > - > - /* block commands from midlayer */ > - scsi_block_requests(hba->host); > - > - spin_lock_irqsave(hba->host->host_lock, flags); > - hba->ufshcd_state = UFSHCD_STATE_RESET; > - > - /* send controller to reset state */ > - ufshcd_hba_stop(hba); > - spin_unlock_irqrestore(hba->host->host_lock, flags); > - > - /* abort outstanding commands */ > - for (tag = 0; tag < hba->nutrs; tag++) { > - if (test_bit(tag, &hba->outstanding_reqs)) { > - lrbp = &hba->lrb[tag]; > - if (lrbp->cmd) { > - scsi_dma_unmap(lrbp->cmd); > - lrbp->cmd->result = DID_RESET << 16; > - lrbp->cmd->scsi_done(lrbp->cmd); > - lrbp->cmd = NULL; > - clear_bit_unlock(tag, &hba->lrb_in_use); > - } > - } > - } > - > - /* complete device management command */ > - if (hba->dev_cmd.complete) > - complete(hba->dev_cmd.complete); > - > - /* clear outstanding request/task bit maps */ > - hba->outstanding_reqs = 0; > - hba->outstanding_tasks = 0; > - > - /* Host controller enable */ > - if (ufshcd_hba_enable(hba)) { > - dev_err(hba->dev, > - "Reset: Controller initialization failed\n"); > - return FAILED; > - } > - > - if (ufshcd_link_startup(hba)) { > - dev_err(hba->dev, > - "Reset: Link start-up failed\n"); > - return FAILED; > - } > - > - return SUCCESS; > -} > - > -/** > * ufshcd_slave_alloc - handle initial SCSI device configurations > * @sdev: pointer to SCSI device > * > @@ -1727,6 +1674,9 @@ static int ufshcd_slave_alloc(struct scsi_device > *sdev) > sdev->use_10_for_ms = 1; > scsi_set_tag_type(sdev, MSG_SIMPLE_TAG); > > + /* allow SCSI layer to restart the device in case of errors */ > + sdev->allow_restart = 1; > + > /* >* Inform SCSI Midlayer that the LUN queue depth is same as the >* controller queue depth. If a LUN queue depth is less than the > @@ -1930,6 +1880,9 @@ ufshcd_transfer_rsp_status(
Re: [PATCH] hpsa: fix warning with smp_processor_id() in preemptible
* John Kacur | 2013-07-26 16:42:30 [+0200]: >Signed-off-by: John Kacur >Acked-by: Stephen ping. I checked the branches for-next, scsi-fixes, fixes and misc at [0] and I didn't see it. I'm going to take this for 3.10-rt but please don't lose it on its way to Linus :) [0] git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi.git Sebastian -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] hpsa: fix warning with smp_processor_id() in preemptible
- Original Message - > * John Kacur | 2013-07-26 16:42:30 [+0200]: > > >Signed-off-by: John Kacur > >Acked-by: Stephen > > ping. > > I checked the branches for-next, scsi-fixes, fixes and misc at [0] and I > didn't see it. I'm going to take this for 3.10-rt but please don't lose > it on its way to Linus :) > > [0] git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi.git I hope it was clear to everyone that this patch was intended for upstream. It was discovered by running the real-time kernel, but it exposes a (minor) problem that should be fixed in the mainline kernel. Please apply it there Stephen, and push it upstream appropriately. Thank You. John Kacur -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] hpsa: fix warning with smp_processor_id() in preemptible
On Mon, Aug 12, 2013 at 09:33:32AM -0400, John Kacur wrote: > > > - Original Message - > > * John Kacur | 2013-07-26 16:42:30 [+0200]: > > > > >Signed-off-by: John Kacur > > >Acked-by: Stephen > > > > ping. > > > > I checked the branches for-next, scsi-fixes, fixes and misc at [0] and I > > didn't see it. I'm going to take this for 3.10-rt but please don't lose > > it on its way to Linus :) > > > > [0] git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi.git > > > I hope it was clear to everyone that this patch was intended for upstream. > It was discovered by running the real-time kernel, but it exposes a (minor) > problem > that should be fixed in the mainline kernel. Please apply it there Stephen, > and > push it upstream appropriately. I don't have such pushing abilities. Acking it as I've done is all I can do. Bug James. :) -- steve -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RESEND 0/1] AHCI: Optimize interrupt processing
On Fri, Aug 09, 2013 at 11:07:37AM -0600, Jens Axboe wrote: > You don't have to resubmit, I'll get it reviewed and applied today. Hi Jens, I limited the minimal queue depth to 4, which is apparently wrong in case of libata. I will post a new series. > -- > Jens Axboe > -- Regards, Alexander Gordeev agord...@redhat.com -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v5 0/4] [SCSI] sg: fix race condition in sg_open
On 08/06/2013 04:52 AM, Douglas Gilbert wrote: > On 13-08-04 10:19 PM, vaughan wrote: >> On 08/03/2013 01:25 PM, Douglas Gilbert wrote: >>> On 13-08-01 01:01 AM, Douglas Gilbert wrote: On 13-07-22 01:03 PM, Jörn Engel wrote: > On Mon, 22 July 2013 12:40:29 +0800, Vaughan Cao wrote: >> >> There is a race when open sg with O_EXCL flag. Also a race may >> happen between >> sg_open and sg_remove. >> >> Changes from v4: >>* [3/4] use ERR_PTR series instead of adding another parameter in >> sg_add_sfp >>* [4/4] fix conflict for cherry-pick from v3. >> >> Changes from v3: >>* release o_sem in sg_release(), not in sg_remove_sfp(). >>* not set exclude with sfd_lock held. >> >> Vaughan Cao (4): >> [SCSI] sg: use rwsem to solve race during exclusive open >> [SCSI] sg: no need sg_open_exclusive_lock >> [SCSI] sg: checking sdp->detached isn't protected when open >> [SCSI] sg: push file descriptor list locking down to per-device >> locking >> >>drivers/scsi/sg.c | 178 >> +- >>1 file changed, 83 insertions(+), 95 deletions(-) > > Patchset looks good to me, although I didn't test it on hardware yet. > Signed-off-by: Joern Engel > > James, care to pick this up? Acked-by: Douglas Gilbert Tested O_EXCL with multiple processes and threads; passed. sg driver prior to this patch had "leaky" O_EXCL logic according to the same test. Block device passed. James, could you clean this up: drivers/scsi/sg.c:242:6: warning: unused variable ‘res’ [-Wunused-variable] >>> >>> Further testing suggests this patch on the sg driver is >>> broken, so I'll rescind my ack. >>> >>> The case it is broken for is when a device is opened >>> without O_EXCL. Now if, while it is open, a second >>> thread/process tries to open the same device O_EXCL >>> then IMO the second open should fail with EBUSY. >>> >>> My testing shows that O_EXCL opens properly deflect >>> other O_EXCL opens. >> Hi Doug, >> >> My test don't have this issue. The routine is something as below: >> >> I start three opens without O_EXCL, wait 30s each, and open with >> O_EXCL|O_NONBLOCK, it failed with EBUSY. >> And I also call myopen with/without O_EXCL many times in background at >> the same time, and the test is passed. I don't know why it failed in >> your test. >> >> Usage: myopen [-e][-n][-d delay] -f file >>-e: exclude >>-n: nonblock >>-d: delay N seconds and then close. >> >> [root@vacaowol5 16835013]# ./myopen -f /dev/sg5 -d 30 & >> [1] 3417 >> [root@vacaowol5 16835013]# ./myopen -f /dev/sg5 -d 30 & >> [2] 3418 >> [root@vacaowol5 16835013]# ./myopen -f /dev/sg5 -d 30 & >> [3] 3419 >> [root@vacaowol5 16835013]# cat /proc/scsi/sg/debug >> max_active_device=6(origin 1) >> def_reserved_size=32768 >> >>> device=sg5 scsi5 chan=0 id=1 lun=0 em=0 sg_tablesize=55 excl=0 >> FD(1): timeout=6ms bufflen=32768 (res)sgat=1 low_dma=0 >> cmd_q=0 f_packid=0 k_orphan=0 closed=0 >> No requests active >> FD(2): timeout=6ms bufflen=32768 (res)sgat=1 low_dma=0 >> cmd_q=0 f_packid=0 k_orphan=0 closed=0 >> No requests active >> FD(3): timeout=6ms bufflen=32768 (res)sgat=1 low_dma=0 >> cmd_q=0 f_packid=0 k_orphan=0 closed=0 >> No requests active >> >> [root@vacaowol5 16835013]# ./myopen -e -n -f /dev/sg5 -d 30 & >> [4] 3422 >> [3422:3351] /dev/sg5:exclude: Device or resource busy >> >> [4]+ Exit 1 ./myopen -e -n -f /dev/sg5 -d 30 >> >> [root@vacaowol5 16835013]# cat /proc/scsi/sg/debug >> max_active_device=6(origin 1) >> def_reserved_size=32768 >> >>> device=sg5 scsi5 chan=0 id=1 lun=0 em=0 sg_tablesize=55 excl=0 >> FD(1): timeout=6ms bufflen=32768 (res)sgat=1 low_dma=0 >> cmd_q=0 f_packid=0 k_orphan=0 closed=0 >> No requests active >> FD(2): timeout=6ms bufflen=32768 (res)sgat=1 low_dma=0 >> cmd_q=0 f_packid=0 k_orphan=0 closed=0 >> No requests active >> FD(3): timeout=6ms bufflen=32768 (res)sgat=1 low_dma=0 >> cmd_q=0 f_packid=0 k_orphan=0 closed=0 >> No requests active >> [root@vacaowol5 16835013]# cat /proc/scsi/sg/debug >> [1] Done./myopen -f /dev/sg5 -d 30 >> [2]- Done./myopen -f /dev/sg5 -d 30 >> [3]+ Done./myopen -f /dev/sg5 -d 30 >> > > Hi, > After the initial failures about 36 hours ago, retesting > yesterday and today has not produced any unexpected > failures. And I have been trying hard on lk 3.10.4 and > lk 3.10.5 . > > My test program is a bit more intense than yours and can > be found in the sg3_utils beta in the News section of this > page: > http://sg.danny.cz/sg/ > > It is in the examples directory, two variants called > sg_tst_excl and sg_tst_excl2 . You will need a recent gcc >
Re: [PATCH v5 0/4] [SCSI] sg: fix race condition in sg_open
On 13-08-12 10:46 PM, vaughan wrote: On 08/06/2013 04:52 AM, Douglas Gilbert wrote: On 13-08-04 10:19 PM, vaughan wrote: On 08/03/2013 01:25 PM, Douglas Gilbert wrote: On 13-08-01 01:01 AM, Douglas Gilbert wrote: On 13-07-22 01:03 PM, Jörn Engel wrote: On Mon, 22 July 2013 12:40:29 +0800, Vaughan Cao wrote: There is a race when open sg with O_EXCL flag. Also a race may happen between sg_open and sg_remove. Changes from v4: * [3/4] use ERR_PTR series instead of adding another parameter in sg_add_sfp * [4/4] fix conflict for cherry-pick from v3. Changes from v3: * release o_sem in sg_release(), not in sg_remove_sfp(). * not set exclude with sfd_lock held. Vaughan Cao (4): [SCSI] sg: use rwsem to solve race during exclusive open [SCSI] sg: no need sg_open_exclusive_lock [SCSI] sg: checking sdp->detached isn't protected when open [SCSI] sg: push file descriptor list locking down to per-device locking drivers/scsi/sg.c | 178 +- 1 file changed, 83 insertions(+), 95 deletions(-) Patchset looks good to me, although I didn't test it on hardware yet. Signed-off-by: Joern Engel James, care to pick this up? Acked-by: Douglas Gilbert Tested O_EXCL with multiple processes and threads; passed. sg driver prior to this patch had "leaky" O_EXCL logic according to the same test. Block device passed. James, could you clean this up: drivers/scsi/sg.c:242:6: warning: unused variable ‘res’ [-Wunused-variable] Further testing suggests this patch on the sg driver is broken, so I'll rescind my ack. The case it is broken for is when a device is opened without O_EXCL. Now if, while it is open, a second thread/process tries to open the same device O_EXCL then IMO the second open should fail with EBUSY. My testing shows that O_EXCL opens properly deflect other O_EXCL opens. Hi Doug, My test don't have this issue. The routine is something as below: I start three opens without O_EXCL, wait 30s each, and open with O_EXCL|O_NONBLOCK, it failed with EBUSY. And I also call myopen with/without O_EXCL many times in background at the same time, and the test is passed. I don't know why it failed in your test. Usage: myopen [-e][-n][-d delay] -f file -e: exclude -n: nonblock -d: delay N seconds and then close. [root@vacaowol5 16835013]# ./myopen -f /dev/sg5 -d 30 & [1] 3417 [root@vacaowol5 16835013]# ./myopen -f /dev/sg5 -d 30 & [2] 3418 [root@vacaowol5 16835013]# ./myopen -f /dev/sg5 -d 30 & [3] 3419 [root@vacaowol5 16835013]# cat /proc/scsi/sg/debug max_active_device=6(origin 1) def_reserved_size=32768 >>> device=sg5 scsi5 chan=0 id=1 lun=0 em=0 sg_tablesize=55 excl=0 FD(1): timeout=6ms bufflen=32768 (res)sgat=1 low_dma=0 cmd_q=0 f_packid=0 k_orphan=0 closed=0 No requests active FD(2): timeout=6ms bufflen=32768 (res)sgat=1 low_dma=0 cmd_q=0 f_packid=0 k_orphan=0 closed=0 No requests active FD(3): timeout=6ms bufflen=32768 (res)sgat=1 low_dma=0 cmd_q=0 f_packid=0 k_orphan=0 closed=0 No requests active [root@vacaowol5 16835013]# ./myopen -e -n -f /dev/sg5 -d 30 & [4] 3422 [3422:3351] /dev/sg5:exclude: Device or resource busy [4]+ Exit 1 ./myopen -e -n -f /dev/sg5 -d 30 [root@vacaowol5 16835013]# cat /proc/scsi/sg/debug max_active_device=6(origin 1) def_reserved_size=32768 >>> device=sg5 scsi5 chan=0 id=1 lun=0 em=0 sg_tablesize=55 excl=0 FD(1): timeout=6ms bufflen=32768 (res)sgat=1 low_dma=0 cmd_q=0 f_packid=0 k_orphan=0 closed=0 No requests active FD(2): timeout=6ms bufflen=32768 (res)sgat=1 low_dma=0 cmd_q=0 f_packid=0 k_orphan=0 closed=0 No requests active FD(3): timeout=6ms bufflen=32768 (res)sgat=1 low_dma=0 cmd_q=0 f_packid=0 k_orphan=0 closed=0 No requests active [root@vacaowol5 16835013]# cat /proc/scsi/sg/debug [1] Done./myopen -f /dev/sg5 -d 30 [2]- Done./myopen -f /dev/sg5 -d 30 [3]+ Done./myopen -f /dev/sg5 -d 30 Hi, After the initial failures about 36 hours ago, retesting yesterday and today has not produced any unexpected failures. And I have been trying hard on lk 3.10.4 and lk 3.10.5 . My test program is a bit more intense than yours and can be found in the sg3_utils beta in the News section of this page: http://sg.danny.cz/sg/ It is in the examples directory, two variants called sg_tst_excl and sg_tst_excl2 . You will need a recent gcc compiler, IOW something that can compile c++11 . gcc 4.7.3 in Ubuntu 13.04 only just manages, fedora 19 should do better with gcc 4.8.1 . The threading is implemented using pthreads so it should be reliable. Typically I run multiple instances (processes) and each has multiple threads. One instance can run '-x' which will cause its first thread not to use O_EXCL **. All my tests currently use O_NONBLOCK
Re: [PATCH 4/7] scsi: ufs: add dme configuration primitives
On 7/30/2013 6:32 PM, Seungwon Jeon wrote: On Mon, July 29, 2013, Subhash Jadavani wrote: On 7/26/2013 7:17 PM, Seungwon Jeon wrote: Implements to support GET and SET operations of the DME. These operations are used to configure the behavior of the UNIPRO. Along with basic operation, {Peer/AttrSetType} can be mixed. Signed-off-by: Seungwon Jeon --- drivers/scsi/ufs/ufshcd.c | 88 + drivers/scsi/ufs/ufshcd.h | 51 ++ drivers/scsi/ufs/ufshci.h |6 +++ 3 files changed, 145 insertions(+), 0 deletions(-) diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c index 7152ec4..8277c40 100644 --- a/drivers/scsi/ufs/ufshcd.c +++ b/drivers/scsi/ufs/ufshcd.c @@ -188,6 +188,18 @@ static inline int ufshcd_get_uic_cmd_result(struct ufs_hba *hba) } /** + * ufshcd_get_dme_attr_val - Get the value of attribute returned by UIC command + * @hba: Pointer to adapter instance + * + * This function gets UIC command argument3 + * Returns 0 on success, non zero value on error + */ +static inline u32 ufshcd_get_dme_attr_val(struct ufs_hba *hba) +{ + return ufshcd_readl(hba, REG_UIC_COMMAND_ARG_3); +} + +/** * ufshcd_is_valid_req_rsp - checks if controller TR response is valid * @ucd_rsp_ptr: pointer to response UPIU * @@ -821,6 +833,80 @@ static int ufshcd_dme_link_startup(struct ufs_hba *hba) } /** + * ufshcd_dme_set_attr - UIC command for DME_SET, DME_PEER_SET + * @hba: per adapter instance + * @attr_sel: uic command argument1 + * @attr_set: attribute set type as uic command argument2 + * @mib_val: setting value as uic command argument3 + * @peer: indicate wherter peer or non-peer typo: s/wtherter/whether This would sound better: s/non-peer/local Ok. thanks. Do above for ufshcd_dme_get_attr() function as well. + * + * Returns 0 on success, non-zero value on failure + */ +int ufshcd_dme_set_attr(struct ufs_hba *hba, u32 attr_sel, + u8 attr_set, u32 mib_val, u8 peer) +{ + struct uic_command uic_cmd = {0}; + static const char *const action[] = { + "dme-set", + "dme-peer-set" + }; + const char *set = action[!!peer]; + int ret; + + uic_cmd.command = peer ? + UIC_CMD_DME_PEER_SET : UIC_CMD_DME_SET; + uic_cmd.argument1 = attr_sel; + uic_cmd.argument2 = UIC_ARG_ATTR_TYPE(attr_set); + uic_cmd.argument3 = mib_val; + + ret = ufshcd_send_uic_cmd(hba, &uic_cmd); + if (ret) + dev_err(hba->dev, "%s: attr-id 0x%x error code %d\n", + set, UIC_GET_ATTR_ID(attr_sel), ret); Its also good to print the "mib_val" which we were trying to set. It might be possible that DME_SET failed because we are trying to set out of range value to MIB attribute. I'll add it. thanks + + return ret; +} +EXPORT_SYMBOL_GPL(ufshcd_dme_set_attr); + +/** + * ufshcd_dme_get_attr - UIC command for DME_GET, DME_PEER_GET + * @hba: per adapter instance + * @attr_sel: uic command argument1 + * @mib_val: the value of the attribute as returned by the UIC command + * @peer: indicate wherter peer or non-peer + * + * Returns 0 on success, non-zero value on failure + */ +int ufshcd_dme_get_attr(struct ufs_hba *hba, u32 attr_sel, + u32 *mib_val, u8 peer) +{ + struct uic_command uic_cmd = {0}; + static const char *const action[] = { + "dme-get", + "dme-peer-get" + }; + const char *get = action[!!peer]; + int ret; + + uic_cmd.command = peer ? + UIC_CMD_DME_PEER_GET : UIC_CMD_DME_GET; + uic_cmd.argument1 = attr_sel; + + ret = ufshcd_send_uic_cmd(hba, &uic_cmd); + if (ret) { + dev_err(hba->dev, "%s: attr-id 0x%x error code %d\n", + get, UIC_GET_ATTR_ID(attr_sel), ret); + goto out; + } + + if (mib_val) + *mib_val = uic_cmd.argument3; +out: + return ret; +} +EXPORT_SYMBOL_GPL(ufshcd_dme_get_attr); + +/** * ufshcd_make_hba_operational - Make UFS controller operational * @hba: per adapter instance * @@ -1256,6 +1342,8 @@ static void ufshcd_uic_cmd_compl(struct ufs_hba *hba) if (hba->active_uic_cmd) { hba->active_uic_cmd->argument2 |= ufshcd_get_uic_cmd_result(hba); + hba->active_uic_cmd->argument3 = + ufshcd_get_dme_attr_val(hba); We can optimize here by reading the dme_attribute value only if active_uic_cmd->command is set to DME_GET or DME_PEER_GET. For all other DME commands, meaning of this is "reserved" for read. Right, you have a point. But when considering the access of 'hba->active_uic_cmd->command" every time and additional 'branch statement', it looks like no difference in terms of the optimization. Reading the dme_attribute value could be trivial thing. I would consider the pe