Re: [PATCH 5/5] virtio-scsi: introduce multiqueue support
On Tue, Sep 04, 2012 at 08:46:12AM +0200, Paolo Bonzini wrote:
> Il 04/09/2012 04:21, Nicholas A. Bellinger ha scritto:
> >> @@ -112,6 +118,9 @@ static void virtscsi_complete_cmd(struct virtio_scsi *vscsi, void *buf)
> >>  	struct virtio_scsi_cmd *cmd = buf;
> >>  	struct scsi_cmnd *sc = cmd->sc;
> >>  	struct virtio_scsi_cmd_resp *resp = &cmd->resp.cmd;
> >> +	struct virtio_scsi_target_state *tgt = vscsi->tgt[sc->device->id];
> >> +
> >> +	atomic_dec(&tgt->reqs);
> >>
> >
> > As tgt->tgt_lock is taken in virtscsi_queuecommand_multi() before the
> > atomic_inc_return(tgt->reqs) check, it seems like using atomic_dec() w/o
> > smp_mb__after_atomic_dec or tgt_lock access here is not using atomic.h
> > accessors properly, no..?
>
> No, only a single "thing" is being accessed, and there is no need to
> order the decrement with respect to preceding or subsequent accesses to
> other locations.
>
> In other words, tgt->reqs is already synchronized with itself, and that
> is enough.

I think your logic is correct and the barrier is not needed, but this
needs better documentation.

> (Besides, on x86 smp_mb__after_atomic_dec is a nop).
>
> >> +static int virtscsi_queuecommand_multi(struct Scsi_Host *sh,
> >> +				       struct scsi_cmnd *sc)
> >> +{
> >> +	struct virtio_scsi *vscsi = shost_priv(sh);
> >> +	struct virtio_scsi_target_state *tgt = vscsi->tgt[sc->device->id];
> >> +	unsigned long flags;
> >> +	u32 queue_num;
> >> +
> >> +	/* Using an atomic_t for tgt->reqs lets the virtqueue handler
> >> +	 * decrement it without taking the spinlock.
> >> +	 */

The above comment is not really helpful - the reader can safely be
assumed to know what atomic_t is.  Please delete it, and replace it with
the text from the commit log that explains the heuristic used to select
req_vq.

Also please add a comment near the 'reqs' definition, something like
"number of outstanding requests - used to detect idle target".

> >> +	spin_lock_irqsave(&tgt->tgt_lock, flags);

Looks like this lock can be removed - req_vq is only modified when the
target is idle and only used when it is not idle.

> >> +	if (atomic_inc_return(&tgt->reqs) == 1) {
> >> +		queue_num = smp_processor_id();
> >> +		while (unlikely(queue_num >= vscsi->num_queues))
> >> +			queue_num -= vscsi->num_queues;
> >> +		tgt->req_vq = &vscsi->req_vqs[queue_num];
> >> +	}
> >> +	spin_unlock_irqrestore(&tgt->tgt_lock, flags);
> >> +	return virtscsi_queuecommand(vscsi, tgt, sc);
> >> +}
> >> +
> >> +static int virtscsi_queuecommand_single(struct Scsi_Host *sh,
> >> +					struct scsi_cmnd *sc)
> >> +{
> >> +	struct virtio_scsi *vscsi = shost_priv(sh);
> >> +	struct virtio_scsi_target_state *tgt = vscsi->tgt[sc->device->id];
> >> +
> >> +	atomic_inc(&tgt->reqs);
> >> +	return virtscsi_queuecommand(vscsi, tgt, sc);
> >> +}
> >> +

Here, reqs is unused - why bother incrementing it?
A branch on completion would be cheaper IMHO.

> virtio-scsi multiqueue has a performance benefit up to 20%

To be fair, you could be running in single queue mode.  In that case the
extra atomics and indirection that this code brings will just add
overhead without benefits.  I don't know how significant that would be.

-- 
MST
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 5/5] virtio-scsi: introduce multiqueue support
Il 04/09/2012 10:46, Michael S. Tsirkin ha scritto:
>> +static int virtscsi_queuecommand_multi(struct Scsi_Host *sh,
>> +				       struct scsi_cmnd *sc)
>> +{
>> +	struct virtio_scsi *vscsi = shost_priv(sh);
>> +	struct virtio_scsi_target_state *tgt = vscsi->tgt[sc->device->id];
>> +	unsigned long flags;
>> +	u32 queue_num;
>> +
>> +	/* Using an atomic_t for tgt->reqs lets the virtqueue handler
>> +	 * decrement it without taking the spinlock.
>> +	 */
>
> Above comment is not really helpful - reader can be safely assumed to
> know what atomic_t is.

Sure, the comment explains that we use an atomic because _elsewhere_ the
tgt_lock is not held while modifying reqs.

> Please delete, and replace with the text from commit log
> that explains the heuristic used to select req_vq.

Ok.

> Also please add a comment near 'reqs' definition.
> Something like "number of outstanding requests - used to detect idle
> target".

Ok.

>> +	spin_lock_irqsave(&tgt->tgt_lock, flags);
>
> Looks like this lock can be removed - req_vq is only
> modified when target is idle and only used when it is
> not idle.

If you have two incoming requests at the same time, req_vq is also
modified when the target is not idle; that's the point of the lock.

Suppose tgt->reqs = 0 initially, and you have two processors/queues.
Initially tgt->req_vq is queue #1.  If you have this:

    queuecommand on CPU #0         queuecommand #2 on CPU #1
    --------------------------------------------------------
    atomic_inc_return(...) == 1
                                   atomic_inc_return(...) == 2
                                   virtscsi_queuecommand to queue #1
    tgt->req_vq = queue #0
    virtscsi_queuecommand to queue #0

then two requests are issued to different queues without a quiescent
point in the middle.

>> +	if (atomic_inc_return(&tgt->reqs) == 1) {
>> +		queue_num = smp_processor_id();
>> +		while (unlikely(queue_num >= vscsi->num_queues))
>> +			queue_num -= vscsi->num_queues;
>> +		tgt->req_vq = &vscsi->req_vqs[queue_num];
>> +	}
>> +	spin_unlock_irqrestore(&tgt->tgt_lock, flags);
>> +	return virtscsi_queuecommand(vscsi, tgt, sc);
>> +}
>> +
>> +static int virtscsi_queuecommand_single(struct Scsi_Host *sh,
>> +					struct scsi_cmnd *sc)
>> +{
>> +	struct virtio_scsi *vscsi = shost_priv(sh);
>> +	struct virtio_scsi_target_state *tgt = vscsi->tgt[sc->device->id];
>> +
>> +	atomic_inc(&tgt->reqs);
>> +	return virtscsi_queuecommand(vscsi, tgt, sc);
>> +}
>> +
>
> Here, reqs is unused - why bother incrementing it?
> A branch on completion would be cheaper IMHO.

Well, I could also let tgt->reqs go negative, but it would be a bit untidy.

Another alternative is to access the target's target_busy field with
ACCESS_ONCE, and drop reqs altogether.  Too tricky to do this kind of
micro-optimization so early, though.

>> virtio-scsi multiqueue has a performance benefit up to 20%
>
> To be fair, you could be running in single queue mode.
> In that case extra atomics and indirection that this code
> brings will just add overhead without benefits.
> I don't know how significant would that be.

Not measurable in my experiments.

Paolo
[Bug 42969] Softlockup during boot during init_sd
https://bugzilla.kernel.org/show_bug.cgi?id=42969

Alan changed:

           What    |Removed |Added
----------------------------------------------------------------
                 CC|        |a...@lxorguk.ukuu.org.uk
     Kernel Version|3.3-rc5 |3.4-rc6

--- Comment #9 from Alan 2012-09-04 11:04:22 ---
Bugzilla is mostly used for tracking reports.  It's a better idea to
discuss it on linux-scsi@vger.kernel.org if you want to be involved in
actively fixing it.

-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the assignee for the bug.
Re: [PATCH 5/5] virtio-scsi: introduce multiqueue support
On Tue, Sep 04, 2012 at 12:25:03PM +0200, Paolo Bonzini wrote:
> Il 04/09/2012 10:46, Michael S. Tsirkin ha scritto:
> >> +static int virtscsi_queuecommand_multi(struct Scsi_Host *sh,
> >> +				       struct scsi_cmnd *sc)
> >> +{
> >> +	struct virtio_scsi *vscsi = shost_priv(sh);
> >> +	struct virtio_scsi_target_state *tgt = vscsi->tgt[sc->device->id];
> >> +	unsigned long flags;
> >> +	u32 queue_num;
> >> +
> >> +	/* Using an atomic_t for tgt->reqs lets the virtqueue handler
> >> +	 * decrement it without taking the spinlock.
> >> +	 */
> >
> > Above comment is not really helpful - reader can be safely assumed to
> > know what atomic_t is.
>
> Sure, the comment explains that we use an atomic because _elsewhere_ the
> tgt_lock is not held while modifying reqs.
>
> > Please delete, and replace with the text from commit log
> > that explains the heuristic used to select req_vq.
>
> Ok.
>
> > Also please add a comment near 'reqs' definition.
> > Something like "number of outstanding requests - used to detect idle
> > target".
>
> Ok.
>
> >> +	spin_lock_irqsave(&tgt->tgt_lock, flags);
> >
> > Looks like this lock can be removed - req_vq is only
> > modified when target is idle and only used when it is
> > not idle.
>
> If you have two incoming requests at the same time, req_vq is also
> modified when the target is not idle; that's the point of the lock.
>
> Suppose tgt->reqs = 0 initially, and you have two processors/queues.
> Initially tgt->req_vq is queue #1.  If you have this:
>
>     queuecommand on CPU #0         queuecommand #2 on CPU #1
>     --------------------------------------------------------
>     atomic_inc_return(...) == 1
>                                    atomic_inc_return(...) == 2
>                                    virtscsi_queuecommand to queue #1
>     tgt->req_vq = queue #0
>     virtscsi_queuecommand to queue #0
>
> then two requests are issued to different queues without a quiescent
> point in the middle.

What happens then? Does this break correctness?

> >> +	if (atomic_inc_return(&tgt->reqs) == 1) {
> >> +		queue_num = smp_processor_id();
> >> +		while (unlikely(queue_num >= vscsi->num_queues))
> >> +			queue_num -= vscsi->num_queues;
> >> +		tgt->req_vq = &vscsi->req_vqs[queue_num];
> >> +	}
> >> +	spin_unlock_irqrestore(&tgt->tgt_lock, flags);
> >> +	return virtscsi_queuecommand(vscsi, tgt, sc);
> >> +}
> >> +
> >> +static int virtscsi_queuecommand_single(struct Scsi_Host *sh,
> >> +					struct scsi_cmnd *sc)
> >> +{
> >> +	struct virtio_scsi *vscsi = shost_priv(sh);
> >> +	struct virtio_scsi_target_state *tgt = vscsi->tgt[sc->device->id];
> >> +
> >> +	atomic_inc(&tgt->reqs);
> >> +	return virtscsi_queuecommand(vscsi, tgt, sc);
> >> +}
> >> +
> >
> > Here, reqs is unused - why bother incrementing it?
> > A branch on completion would be cheaper IMHO.
>
> Well, I could also let tgt->reqs go negative, but it would be a bit untidy.
>
> Another alternative is to access the target's target_busy field with
> ACCESS_ONCE, and drop reqs altogether.  Too tricky to do this kind of
> micro-optimization so early, though.

So keep it simple and just check a flag.

> >> virtio-scsi multiqueue has a performance benefit up to 20%
> >
> > To be fair, you could be running in single queue mode.
> > In that case extra atomics and indirection that this code
> > brings will just add overhead without benefits.
> > I don't know how significant would that be.
>
> Not measurable in my experiments.
>
> Paolo
Re: [PATCH 5/5] virtio-scsi: introduce multiqueue support
Il 04/09/2012 13:09, Michael S. Tsirkin ha scritto:
>> >     queuecommand on CPU #0         queuecommand #2 on CPU #1
>> >     --------------------------------------------------------
>> >     atomic_inc_return(...) == 1
>> >                                    atomic_inc_return(...) == 2
>> >                                    virtscsi_queuecommand to queue #1
>> >     tgt->req_vq = queue #0
>> >     virtscsi_queuecommand to queue #0
>> >
>> > then two requests are issued to different queues without a quiescent
>> > point in the middle.
>
> What happens then? Does this break correctness?

Yes, requests to the same target should be processed in FIFO order, or
you have things like a flush issued before the write it was supposed to
flush.  This is why I can only change the queue when there is no request
pending.

Paolo
Re: [PATCH 5/5] virtio-scsi: introduce multiqueue support
On Tue, Aug 28, 2012 at 01:54:17PM +0200, Paolo Bonzini wrote:
> This patch adds queue steering to virtio-scsi.  When a target is sent
> multiple requests, we always drive them to the same queue so that FIFO
> processing order is kept.  However, if a target was idle, we can choose
> a queue arbitrarily.  In this case the queue is chosen according to the
> current VCPU, so the driver expects the number of request queues to be
> equal to the number of VCPUs.  This makes it easy and fast to select
> the queue, and also lets the driver optimize the IRQ affinity for the
> virtqueues (each virtqueue's affinity is set to the CPU that "owns"
> the queue).
>
> Signed-off-by: Paolo Bonzini

I guess an alternative is a per-target vq.  Is the reason you avoid this
that you expect more targets than cpus?  If yes, this is something you
might want to mention in the log.
[PATCH 7/8] zfcp: No automatic port_rescan on events
From: Steffen Maier

In FC fabrics with large zones, the automatic port_rescan on incoming
ELS and any adapter recovery can cause quite some traffic at the very
same time, especially if lots of Linux images share an HBA, which is
common on s390.  This can cause trouble and failures.  Fix this by
making such port rescans dependent on a user configurable module
parameter.

The following unconditional automatic port rescans remain as is:
on setting an adapter online and on manual user-triggered writes to the
sysfs attribute port_rescan.

Signed-off-by: Steffen Maier
---
 drivers/s390/scsi/zfcp_ccw.c |  7 ++-
 drivers/s390/scsi/zfcp_erp.c |  2 +-
 drivers/s390/scsi/zfcp_ext.h |  2 ++
 drivers/s390/scsi/zfcp_fc.c  | 23 ++-
 drivers/s390/scsi/zfcp_fsf.c |  2 +-
 5 files changed, 32 insertions(+), 4 deletions(-)

--- a/drivers/s390/scsi/zfcp_ccw.c
+++ b/drivers/s390/scsi/zfcp_ccw.c
@@ -57,7 +57,7 @@ static int zfcp_ccw_activate(struct ccw_
 	zfcp_erp_adapter_reopen(adapter, ZFCP_STATUS_COMMON_ERP_FAILED, tag);
 	zfcp_erp_wait(adapter);
-	flush_work(&adapter->scan_work);
+	flush_work(&adapter->scan_work); /* ok to call even if nothing queued */
 
 	zfcp_ccw_adapter_put(adapter);
@@ -171,6 +171,11 @@ static int zfcp_ccw_set_online(struct cc
 	adapter->req_no = 0;
 
 	zfcp_ccw_activate(cdev, 0, "ccsonl1");
+	/* scan for remote ports
+	   either at the end of any successful adapter recovery
+	   or only after the adapter recovery for setting a device online */
+	zfcp_fc_inverse_conditional_port_scan(adapter);
+	flush_work(&adapter->scan_work); /* ok to call even if nothing queued */
 	zfcp_ccw_adapter_put(adapter);
 	return 0;
 }
--- a/drivers/s390/scsi/zfcp_erp.c
+++ b/drivers/s390/scsi/zfcp_erp.c
@@ -1230,7 +1230,7 @@ static void zfcp_erp_action_cleanup(stru
 	case ZFCP_ERP_ACTION_REOPEN_ADAPTER:
 		if (result == ZFCP_ERP_SUCCEEDED) {
 			register_service_level(&adapter->service_level);
-			queue_work(adapter->work_queue, &adapter->scan_work);
+			zfcp_fc_conditional_port_scan(adapter);
 			queue_work(adapter->work_queue, &adapter->ns_up_work);
 		} else
 			unregister_service_level(&adapter->service_level);
--- a/drivers/s390/scsi/zfcp_ext.h
+++ b/drivers/s390/scsi/zfcp_ext.h
@@ -99,6 +99,8 @@ extern void zfcp_fc_gs_destroy(struct zf
 extern int zfcp_fc_exec_bsg_job(struct fc_bsg_job *);
 extern int zfcp_fc_timeout_bsg_job(struct fc_bsg_job *);
 extern void zfcp_fc_sym_name_update(struct work_struct *);
+extern void zfcp_fc_conditional_port_scan(struct zfcp_adapter *);
+extern void zfcp_fc_inverse_conditional_port_scan(struct zfcp_adapter *);
 
 /* zfcp_fsf.c */
 extern struct kmem_cache *zfcp_fsf_qtcb_cache;
--- a/drivers/s390/scsi/zfcp_fc.c
+++ b/drivers/s390/scsi/zfcp_fc.c
@@ -26,6 +26,27 @@ static u32 zfcp_fc_rscn_range_mask[] = {
 	[ELS_ADDR_FMT_FAB]		= 0x00,
 };
 
+static bool no_auto_port_rescan;
+module_param_named(no_auto_port_rescan, no_auto_port_rescan, bool, 0600);
+MODULE_PARM_DESC(no_auto_port_rescan,
+		 "no automatic port_rescan (default off)");
+
+void zfcp_fc_conditional_port_scan(struct zfcp_adapter *adapter)
+{
+	if (no_auto_port_rescan)
+		return;
+
+	queue_work(adapter->work_queue, &adapter->scan_work);
+}
+
+void zfcp_fc_inverse_conditional_port_scan(struct zfcp_adapter *adapter)
+{
+	if (!no_auto_port_rescan)
+		return;
+
+	queue_work(adapter->work_queue, &adapter->scan_work);
+}
+
 /**
  * zfcp_fc_post_event - post event to userspace via fc_transport
  * @work: work struct with enqueued events
@@ -206,7 +227,7 @@ static void zfcp_fc_incoming_rscn(struct
 		zfcp_fc_enqueue_event(fsf_req->adapter, FCH_EVT_RSCN,
 				      *(u32 *)page);
 	}
-	queue_work(fsf_req->adapter->work_queue, &fsf_req->adapter->scan_work);
+	zfcp_fc_conditional_port_scan(fsf_req->adapter);
 }
 
 static void zfcp_fc_incoming_wwpn(struct zfcp_fsf_req *req, u64 wwpn)
--- a/drivers/s390/scsi/zfcp_fsf.c
+++ b/drivers/s390/scsi/zfcp_fsf.c
@@ -257,7 +257,7 @@ static void zfcp_fsf_status_read_handler
 		if (sr_buf->status_subtype & FSF_STATUS_READ_SUB_ACT_UPDATED)
 			zfcp_cfdc_adapter_access_changed(adapter);
 		if (sr_buf->status_subtype & FSF_STATUS_READ_SUB_INCOMING_ELS)
-			queue_work(adapter->work_queue, &adapter->scan_work);
+			zfcp_fc_conditional_port_scan(adapter);
 		break;
 	case FSF_STATUS_READ_CFDC_UPDATED:
 		zfcp_cfdc_adapter_access_changed(adapter);
[PATCH 2/8] zfcp: Make trace record tags unique
From: Steffen Maier

Duplicate fssrh_2 from a54ca0f62f953898b05549391ac2a8a4dad6482b
"[SCSI] zfcp: Redesign of the debug tracing for HBA records."
complicates distinction of generic status read response from local
link up.

Duplicate fsscth1 from 2c55b750a884b86dea8b4cc5f15e1484cc47a25c
"[SCSI] zfcp: Redesign of the debug tracing for SAN records."
complicates distinction of good common transport response from invalid
port handle.

Signed-off-by: Steffen Maier
Reviewed-by: Martin Peschke
Cc: #2.6.38+
---
 drivers/s390/scsi/zfcp_fsf.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/drivers/s390/scsi/zfcp_fsf.c
+++ b/drivers/s390/scsi/zfcp_fsf.c
@@ -219,7 +219,7 @@ static void zfcp_fsf_status_read_handler
 		return;
 	}
 
-	zfcp_dbf_hba_fsf_uss("fssrh_2", req);
+	zfcp_dbf_hba_fsf_uss("fssrh_4", req);
 
 	switch (sr_buf->status_type) {
 	case FSF_STATUS_READ_PORT_CLOSED:
@@ -915,7 +915,7 @@ static void zfcp_fsf_send_ct_handler(str
 	switch (header->fsf_status) {
 	case FSF_GOOD:
-		zfcp_dbf_san_res("fsscth1", req);
+		zfcp_dbf_san_res("fsscth2", req);
 		ct->status = 0;
 		break;
 	case FSF_SERVICE_CLASS_NOT_SUPPORTED:
[PATCH 4/8] zfcp: Do not wakeup while suspended
From: Steffen Maier

If the mapping of FCP device bus ID and corresponding subchannel is
modified while the Linux image is suspended, the resume of FCP devices
can fail.  During resume, zfcp gets callbacks from cio regarding the
modified subchannels but they can be arbitrarily mixed with the
restore/resume callback.  Since the cio callbacks would trigger adapter
recovery, zfcp could wakeup before the resume callback.  Therefore,
ignore the cio callbacks regarding subchannels while being suspended.
We can safely do so, since zfcp does not deal itself with subchannels.
For problem determination purposes, we still trace the ignored callback
events.

The following kernel messages could be seen on resume:

kernel: : parent should not be sleeping

As part of adapter reopen recovery, zfcp performs auto port scanning
which can erroneously try to register new remote ports with
scsi_transport_fc and the device core code complains about the parent
(adapter) still sleeping.

kernel: zfcp.3dff9c: :\
Setting up the QDIO connection to the FCP adapter failed
kernel: zfcp.574d43: :\
ERP cannot recover an error on the FCP device

In such cases, the adapter gave up recovery and remained blocked along
with its child objects: remote ports and LUNs/scsi devices.  Even the
adapter shutdown as part of giving up recovery failed because the ccw
device state remained disconnected.  Later, the corresponding remote
ports ran into dev_loss_tmo.  As a result, the LUNs were erroneously
not available again after resume.

Even a manually triggered adapter recovery (e.g. sysfs attribute
failed, or device offline/online via sysfs) could not recover the
adapter due to the remaining disconnected state of the corresponding
ccw device.

Signed-off-by: Steffen Maier
Cc: Tarak Reddy
Cc: #2.6.32+
---
 drivers/s390/scsi/zfcp_ccw.c | 73 +--
 drivers/s390/scsi/zfcp_dbf.c | 20 +++
 drivers/s390/scsi/zfcp_dbf.h |  1
 drivers/s390/scsi/zfcp_def.h |  1
 drivers/s390/scsi/zfcp_ext.h |  1
 5 files changed, 86 insertions(+), 10 deletions(-)

--- a/drivers/s390/scsi/zfcp_ccw.c
+++ b/drivers/s390/scsi/zfcp_ccw.c
@@ -39,17 +39,23 @@ void zfcp_ccw_adapter_put(struct zfcp_ad
 	spin_unlock_irqrestore(&zfcp_ccw_adapter_ref_lock, flags);
 }
 
-static int zfcp_ccw_activate(struct ccw_device *cdev)
-
+/**
+ * zfcp_ccw_activate - activate adapter and wait for it to finish
+ * @cdev: pointer to belonging ccw device
+ * @clear: Status flags to clear.
+ * @tag: s390dbf trace record tag
+ */
+static int zfcp_ccw_activate(struct ccw_device *cdev, int clear, char *tag)
 {
 	struct zfcp_adapter *adapter = zfcp_ccw_adapter_by_cdev(cdev);
 
 	if (!adapter)
 		return 0;
 
+	zfcp_erp_clear_adapter_status(adapter, clear);
 	zfcp_erp_set_adapter_status(adapter, ZFCP_STATUS_COMMON_RUNNING);
 	zfcp_erp_adapter_reopen(adapter, ZFCP_STATUS_COMMON_ERP_FAILED,
-				"ccresu2");
+				tag);
 	zfcp_erp_wait(adapter);
 	flush_work(&adapter->scan_work);
 
@@ -164,26 +170,29 @@ static int zfcp_ccw_set_online(struct cc
 	BUG_ON(!zfcp_reqlist_isempty(adapter->req_list));
 	adapter->req_no = 0;
 
-	zfcp_ccw_activate(cdev);
+	zfcp_ccw_activate(cdev, 0, "ccsonl1");
 	zfcp_ccw_adapter_put(adapter);
 	return 0;
 }
 
 /**
- * zfcp_ccw_set_offline - set_offline function of zfcp driver
+ * zfcp_ccw_offline_sync - shut down adapter and wait for it to finish
  * @cdev: pointer to belonging ccw device
+ * @set: Status flags to set.
+ * @tag: s390dbf trace record tag
  *
  * This function gets called by the common i/o layer and sets an adapter
  * into state offline.
  */
-static int zfcp_ccw_set_offline(struct ccw_device *cdev)
+static int zfcp_ccw_offline_sync(struct ccw_device *cdev, int set, char *tag)
 {
 	struct zfcp_adapter *adapter = zfcp_ccw_adapter_by_cdev(cdev);
 
 	if (!adapter)
 		return 0;
 
-	zfcp_erp_adapter_shutdown(adapter, 0, "ccsoff1");
+	zfcp_erp_set_adapter_status(adapter, set);
+	zfcp_erp_adapter_shutdown(adapter, 0, tag);
 	zfcp_erp_wait(adapter);
 	zfcp_ccw_adapter_put(adapter);
 
@@ -191,6 +200,18 @@ static int zfcp_ccw_set_offline(struct c
 }
 
 /**
+ * zfcp_ccw_set_offline - set_offline function of zfcp driver
+ * @cdev: pointer to belonging ccw device
+ *
+ * This function gets called by the common i/o layer and sets an adapter
+ * into state offline.
+ */
+static int zfcp_ccw_set_offline(struct ccw_device *cdev)
+{
+	return zfcp_ccw_offline_sync(cdev, 0, "ccsoff1");
+}
+
+/**
  * zfcp_ccw_notify - ccw notify function
  * @cdev: pointer to belonging ccw device
  * @event: indicates if adapter was detached or attached
@@ -207,6 +228,11 @@ static int zfcp_ccw_notify(struct ccw_de
 	switch (event) {
 	case CIO_GONE:
+		if (atomic_read(&adapter->status) &
+		    ZFCP_STATUS_
[PATCH 8/8] zfcp: only access zfcp_scsi_dev for valid scsi_device
From: Martin Peschke

__scsi_remove_device (e.g. due to dev_loss_tmo) calls
zfcp_scsi_slave_destroy which in turn sends a close LUN FSF request to
the adapter.  After 30 seconds without response,
zfcp_erp_timeout_handler kicks the ERP thread, failing the close LUN
ERP action.  zfcp_erp_wait in zfcp_erp_lun_shutdown_wait, and thus
zfcp_scsi_slave_destroy, returns and then the scsi_device is no longer
valid.  Sometime later the response to the close LUN FSF request may
finally come in.  However, commit
b62a8d9b45b971a67a0f8413338c230e3117dff5
"[SCSI] zfcp: Use SCSI device data zfcp_scsi_dev instead of zfcp_unit"
introduced a number of attempts to unconditionally access struct
zfcp_scsi_dev through struct scsi_device, causing a use-after-free.

This leads to an Oops due to a kernel page fault in one of:
zfcp_fsf_abort_fcp_command_handler, zfcp_fsf_open_lun_handler,
zfcp_fsf_close_lun_handler, zfcp_fsf_req_trace,
zfcp_fsf_fcp_handler_common.

Move dereferencing of the zfcp private data zfcp_scsi_dev (allocated
in scsi_device via scsi_transport_reserve_device) after the check for
a potentially aborted FSF request, and thus a no longer valid
scsi_device.  Only then assign sdev_to_zfcp(sdev) to the local auto
variable struct zfcp_scsi_dev *zfcp_sdev.

Signed-off-by: Martin Peschke
Signed-off-by: Steffen Maier
Cc: #2.6.37+
---
 drivers/s390/scsi/zfcp_fsf.c | 19 ++-
 1 file changed, 14 insertions(+), 5 deletions(-)

--- a/drivers/s390/scsi/zfcp_fsf.c
+++ b/drivers/s390/scsi/zfcp_fsf.c
@@ -801,12 +801,14 @@ out:
 static void zfcp_fsf_abort_fcp_command_handler(struct zfcp_fsf_req *req)
 {
 	struct scsi_device *sdev = req->data;
-	struct zfcp_scsi_dev *zfcp_sdev = sdev_to_zfcp(sdev);
+	struct zfcp_scsi_dev *zfcp_sdev;
 	union fsf_status_qual *fsq = &req->qtcb->header.fsf_status_qual;
 
 	if (req->status & ZFCP_STATUS_FSFREQ_ERROR)
 		return;
 
+	zfcp_sdev = sdev_to_zfcp(sdev);
+
 	switch (req->qtcb->header.fsf_status) {
 	case FSF_PORT_HANDLE_NOT_VALID:
 		if (fsq->word[0] == fsq->word[1]) {
@@ -1769,13 +1771,15 @@ static void zfcp_fsf_open_lun_handler(st
 {
 	struct zfcp_adapter *adapter = req->adapter;
 	struct scsi_device *sdev = req->data;
-	struct zfcp_scsi_dev *zfcp_sdev = sdev_to_zfcp(sdev);
+	struct zfcp_scsi_dev *zfcp_sdev;
 	struct fsf_qtcb_header *header = &req->qtcb->header;
 	struct fsf_qtcb_bottom_support *bottom = &req->qtcb->bottom.support;
 
 	if (req->status & ZFCP_STATUS_FSFREQ_ERROR)
 		return;
 
+	zfcp_sdev = sdev_to_zfcp(sdev);
+
 	atomic_clear_mask(ZFCP_STATUS_COMMON_ACCESS_DENIED |
 			  ZFCP_STATUS_COMMON_ACCESS_BOXED |
 			  ZFCP_STATUS_LUN_SHARED |
@@ -1886,11 +1890,13 @@ out:
 static void zfcp_fsf_close_lun_handler(struct zfcp_fsf_req *req)
 {
 	struct scsi_device *sdev = req->data;
-	struct zfcp_scsi_dev *zfcp_sdev = sdev_to_zfcp(sdev);
+	struct zfcp_scsi_dev *zfcp_sdev;
 
 	if (req->status & ZFCP_STATUS_FSFREQ_ERROR)
 		return;
 
+	zfcp_sdev = sdev_to_zfcp(sdev);
+
 	switch (req->qtcb->header.fsf_status) {
 	case FSF_PORT_HANDLE_NOT_VALID:
 		zfcp_erp_adapter_reopen(zfcp_sdev->port->adapter, 0, "fscuh_1");
@@ -1980,7 +1986,7 @@ static void zfcp_fsf_req_trace(struct zf
 {
 	struct fsf_qual_latency_info *lat_in;
 	struct latency_cont *lat = NULL;
-	struct zfcp_scsi_dev *zfcp_sdev = sdev_to_zfcp(scsi->device);
+	struct zfcp_scsi_dev *zfcp_sdev;
 	struct zfcp_blk_drv_data blktrc;
 	int ticks = req->adapter->timer_ticks;
 
@@ -1995,6 +2001,7 @@ static void zfcp_fsf_req_trace(struct zf
 
 	if (req->adapter->adapter_features & FSF_FEATURE_MEASUREMENT_DATA &&
 	    !(req->status & ZFCP_STATUS_FSFREQ_ERROR)) {
+		zfcp_sdev = sdev_to_zfcp(scsi->device);
 		blktrc.flags |= ZFCP_BLK_LAT_VALID;
 		blktrc.channel_lat = lat_in->channel_lat * ticks;
 		blktrc.fabric_lat = lat_in->fabric_lat * ticks;
@@ -2032,12 +2039,14 @@ static void zfcp_fsf_fcp_handler_common(
 {
 	struct scsi_cmnd *scmnd = req->data;
 	struct scsi_device *sdev = scmnd->device;
-	struct zfcp_scsi_dev *zfcp_sdev = sdev_to_zfcp(sdev);
+	struct zfcp_scsi_dev *zfcp_sdev;
 	struct fsf_qtcb_header *header = &req->qtcb->header;
 
 	if (unlikely(req->status & ZFCP_STATUS_FSFREQ_ERROR))
 		return;
 
+	zfcp_sdev = sdev_to_zfcp(sdev);
+
 	switch (header->fsf_status) {
 	case FSF_HANDLE_MISMATCH:
 	case FSF_PORT_HANDLE_NOT_VALID:
[PATCH 3/8] zfcp: Bounds checking for deferred error trace
From: Steffen Maier

The pl vector has scount elements, i.e. pl[scount-1] is the last valid
element.  For maximum sized requests, payload->counter == scount after
the last loop iteration.  Therefore, do bounds checking first (with
boolean shortcut) to not access the invalid element pl[scount].

Do not trust the maximum sbale->scount value from the HBA but ensure we
won't access the pl vector out of our allocated bounds.  While at it,
clean up scoping and prevent an unnecessary memset.

Minor fix for 86a9668a8d29ea711613e1cb37efa68e7c4db564
"[SCSI] zfcp: support for hardware data router"

Signed-off-by: Steffen Maier
Reviewed-by: Martin Peschke
Cc: #3.2+
---
 drivers/s390/scsi/zfcp_dbf.c  |  2 +-
 drivers/s390/scsi/zfcp_qdio.c | 16 ++--
 2 files changed, 11 insertions(+), 7 deletions(-)

--- a/drivers/s390/scsi/zfcp_dbf.c
+++ b/drivers/s390/scsi/zfcp_dbf.c
@@ -191,7 +191,7 @@ void zfcp_dbf_hba_def_err(struct zfcp_ad
 	length = min((u16)sizeof(struct qdio_buffer),
 		     (u16)ZFCP_DBF_PAY_MAX_REC);
 
-	while ((char *)pl[payload->counter] && payload->counter < scount) {
+	while (payload->counter < scount && (char *)pl[payload->counter]) {
 		memcpy(payload->data, (char *)pl[payload->counter], length);
 		debug_event(dbf->pay, 1, payload, zfcp_dbf_plen(length));
 		payload->counter++;
--- a/drivers/s390/scsi/zfcp_qdio.c
+++ b/drivers/s390/scsi/zfcp_qdio.c
@@ -102,18 +102,22 @@ static void zfcp_qdio_int_resp(struct cc
 {
 	struct zfcp_qdio *qdio = (struct zfcp_qdio *) parm;
 	struct zfcp_adapter *adapter = qdio->adapter;
-	struct qdio_buffer_element *sbale;
 	int sbal_no, sbal_idx;
-	void *pl[ZFCP_QDIO_MAX_SBALS_PER_REQ + 1];
-	u64 req_id;
-	u8 scount;
 
 	if (unlikely(qdio_err)) {
-		memset(pl, 0, ZFCP_QDIO_MAX_SBALS_PER_REQ * sizeof(void *));
 		if (zfcp_adapter_multi_buffer_active(adapter)) {
+			void *pl[ZFCP_QDIO_MAX_SBALS_PER_REQ + 1];
+			struct qdio_buffer_element *sbale;
+			u64 req_id;
+			u8 scount;
+
+			memset(pl, 0,
+			       ZFCP_QDIO_MAX_SBALS_PER_REQ * sizeof(void *));
 			sbale = qdio->res_q[idx]->element;
 			req_id = (u64) sbale->addr;
-			scount = sbale->scount + 1; /* incl. signaling SBAL */
+			scount = min(sbale->scount + 1,
+				     ZFCP_QDIO_MAX_SBALS_PER_REQ + 1);
+				     /* incl. signaling SBAL */
 
 			for (sbal_no = 0; sbal_no < scount; sbal_no++) {
 				sbal_idx = (idx + sbal_no) %
[PATCH 0/8] zfcp: patches for 3.6-rc
James,

here is a series of zfcp fixes for the current 3.6-rc.
The patches apply on top of the current fixes branch of your scsi.git.

Steffen

Linux on System z Development
IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Martin Jetter
Geschaeftsfuehrung: Dirk Wittkopp
Sitz der Gesellschaft: Boeblingen
Registergericht: Amtsgericht Stuttgart, HRB 243294
[PATCH 5/8] zfcp: remove invalid reference to list iterator variable
From: Julia Lawall

If list_for_each_entry, etc complete a traversal of the list, the
iterator variable ends up pointing to an address at an offset from the
list head, and not a meaningful structure.  Thus this value should not
be used after the end of the iterator.

Replace port->adapter->scsi_host by adapter->scsi_host.

This problem was found using Coccinelle (http://coccinelle.lip6.fr/).

Oversight in upstream commit of v2.6.37
a1ca48319a9aa1c5b57ce142f538e76050bb8972
"[SCSI] zfcp: Move ACL/CFDC code to zfcp_cfdc.c"
which merged the content of zfcp_erp_port_access_changed().

Signed-off-by: Julia Lawall
Signed-off-by: Steffen Maier
Reviewed-by: Martin Peschke
Cc: #2.6.37+
---
 drivers/s390/scsi/zfcp_cfdc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/s390/scsi/zfcp_cfdc.c
+++ b/drivers/s390/scsi/zfcp_cfdc.c
@@ -293,7 +293,7 @@ void zfcp_cfdc_adapter_access_changed(st
 	}
 	read_unlock_irqrestore(&adapter->port_list_lock, flags);
 
-	shost_for_each_device(sdev, port->adapter->scsi_host) {
+	shost_for_each_device(sdev, adapter->scsi_host) {
 		zfcp_sdev = sdev_to_zfcp(sdev);
 		status = atomic_read(&zfcp_sdev->status);
 		if ((status & ZFCP_STATUS_COMMON_ACCESS_DENIED) ||
[PATCH 6/8] zfcp: restore refcount check on port_remove
From: Steffen Maier Upstream commit f3450c7b917201bb49d67032e9f60d5125675d6a "[SCSI] zfcp: Replace local reference counting with common kref" accidentally dropped a reference count check before tearing down zfcp_ports that are potentially in use by zfcp_units. Even remote ports in use can be removed causing unreachable garbage objects zfcp_ports with zfcp_units. Thus units won't come back even after a manual port_rescan. The kref of zfcp_port->dev.kobj is already used by the driver core. We cannot re-use it to track the number of zfcp_units. Re-introduce our own counter for units per port and check on port_remove. Signed-off-by: Steffen Maier Reviewed-by: Heiko Carstens Cc: #2.6.33+ --- drivers/s390/scsi/zfcp_aux.c |1 + drivers/s390/scsi/zfcp_def.h |1 + drivers/s390/scsi/zfcp_ext.h |1 + drivers/s390/scsi/zfcp_sysfs.c | 18 -- drivers/s390/scsi/zfcp_unit.c | 36 ++-- 5 files changed, 45 insertions(+), 12 deletions(-) --- a/drivers/s390/scsi/zfcp_aux.c +++ b/drivers/s390/scsi/zfcp_aux.c @@ -519,6 +519,7 @@ struct zfcp_port *zfcp_port_enqueue(stru rwlock_init(&port->unit_list_lock); INIT_LIST_HEAD(&port->unit_list); + atomic_set(&port->units, 0); INIT_WORK(&port->gid_pn_work, zfcp_fc_port_did_lookup); INIT_WORK(&port->test_link_work, zfcp_fc_link_test_work); --- a/drivers/s390/scsi/zfcp_def.h +++ b/drivers/s390/scsi/zfcp_def.h @@ -205,6 +205,7 @@ struct zfcp_port { struct zfcp_adapter*adapter; /* adapter used to access port */ struct list_headunit_list; /* head of logical unit list */ rwlock_tunit_list_lock; /* unit list lock */ + atomic_tunits; /* zfcp_unit count */ atomic_t status; /* status of this remote port */ u64wwnn; /* WWNN if known */ u64wwpn; /* WWPN */ --- a/drivers/s390/scsi/zfcp_ext.h +++ b/drivers/s390/scsi/zfcp_ext.h @@ -159,6 +159,7 @@ extern void zfcp_scsi_dif_sense_error(st extern struct attribute_group zfcp_sysfs_unit_attrs; extern struct attribute_group zfcp_sysfs_adapter_attrs; extern struct attribute_group zfcp_sysfs_port_attrs; +extern struct 
mutex zfcp_sysfs_port_units_mutex; extern struct device_attribute *zfcp_sysfs_sdev_attrs[]; extern struct device_attribute *zfcp_sysfs_shost_attrs[]; --- a/drivers/s390/scsi/zfcp_sysfs.c +++ b/drivers/s390/scsi/zfcp_sysfs.c @@ -227,6 +227,8 @@ static ssize_t zfcp_sysfs_port_rescan_st static ZFCP_DEV_ATTR(adapter, port_rescan, S_IWUSR, NULL, zfcp_sysfs_port_rescan_store); +DEFINE_MUTEX(zfcp_sysfs_port_units_mutex); + static ssize_t zfcp_sysfs_port_remove_store(struct device *dev, struct device_attribute *attr, const char *buf, size_t count) @@ -249,6 +251,16 @@ static ssize_t zfcp_sysfs_port_remove_st else retval = 0; + mutex_lock(&zfcp_sysfs_port_units_mutex); + if (atomic_read(&port->units) > 0) { + retval = -EBUSY; + mutex_unlock(&zfcp_sysfs_port_units_mutex); + goto out; + } + /* port is about to be removed, so no more unit_add */ + atomic_set(&port->units, -1); + mutex_unlock(&zfcp_sysfs_port_units_mutex); + write_lock_irq(&adapter->port_list_lock); list_del(&port->list); write_unlock_irq(&adapter->port_list_lock); @@ -289,12 +301,14 @@ static ssize_t zfcp_sysfs_unit_add_store { struct zfcp_port *port = container_of(dev, struct zfcp_port, dev); u64 fcp_lun; + int retval; if (strict_strtoull(buf, 0, (unsigned long long *) &fcp_lun)) return -EINVAL; - if (zfcp_unit_add(port, fcp_lun)) - return -EINVAL; + retval = zfcp_unit_add(port, fcp_lun); + if (retval) + return retval; return count; } --- a/drivers/s390/scsi/zfcp_unit.c +++ b/drivers/s390/scsi/zfcp_unit.c @@ -104,7 +104,7 @@ static void zfcp_unit_release(struct dev { struct zfcp_unit *unit = container_of(dev, struct zfcp_unit, dev); - put_device(&unit->port->dev); + atomic_dec(&unit->port->units); kfree(unit); } @@ -119,16 +119,27 @@ static void zfcp_unit_release(struct dev int zfcp_unit_add(struct zfcp_port *port, u64 fcp_lun) { struct zfcp_unit *unit; + int retval = 0; + + mutex_lock(&zfcp_sysfs_port_units_mutex); + if (atomic_read(&port->units) == -1) { + /* port is already gone */ + retval = -ENODEV; + 
goto out; + } unit = zfcp_unit_find(port, fcp_lun); if (unit) { put_device(&unit->dev); - return -EEXIST; + retval = -EEXIST; + goto out;
[PATCH 1/8] zfcp: Adapt to new FC_PORTSPEED semantics
From: Steffen Maier Commit a9277e7783651d4e0a849f7988340b1c1cf748a4 "[SCSI] scsi_transport_fc: Getting FC Port Speed in sync with FC-GS" changed the semantics of FC_PORTSPEED defines to FDMI port attributes of FC-HBA/SM-HBA which is different from the previous bit reversed Report Port Speed Capabilities (RPSC) ELS of FC-GS/FC-LS. Zfcp showed "10 Gbit" instead of "4 Gbit" for supported_speeds. It now uses explicit bit conversion as the other LLDs already do, in order to be independent of the kernel bit semantics. See also http://marc.info/?l=linux-scsi&m=134452926830730&w=2 Signed-off-by: Steffen Maier Reviewed-by: Martin Peschke Cc: #3.4+ --- drivers/s390/scsi/zfcp_fsf.c | 34 -- 1 file changed, 32 insertions(+), 2 deletions(-) --- a/drivers/s390/scsi/zfcp_fsf.c +++ b/drivers/s390/scsi/zfcp_fsf.c @@ -437,6 +437,34 @@ void zfcp_fsf_req_dismiss_all(struct zfc } } +#define ZFCP_FSF_PORTSPEED_1GBIT (1 << 0) +#define ZFCP_FSF_PORTSPEED_2GBIT (1 << 1) +#define ZFCP_FSF_PORTSPEED_4GBIT (1 << 2) +#define ZFCP_FSF_PORTSPEED_10GBIT (1 << 3) +#define ZFCP_FSF_PORTSPEED_8GBIT (1 << 4) +#define ZFCP_FSF_PORTSPEED_16GBIT (1 << 5) +#define ZFCP_FSF_PORTSPEED_NOT_NEGOTIATED (1 << 15) + +static u32 zfcp_fsf_convert_portspeed(u32 fsf_speed) +{ + u32 fdmi_speed = 0; + if (fsf_speed & ZFCP_FSF_PORTSPEED_1GBIT) + fdmi_speed |= FC_PORTSPEED_1GBIT; + if (fsf_speed & ZFCP_FSF_PORTSPEED_2GBIT) + fdmi_speed |= FC_PORTSPEED_2GBIT; + if (fsf_speed & ZFCP_FSF_PORTSPEED_4GBIT) + fdmi_speed |= FC_PORTSPEED_4GBIT; + if (fsf_speed & ZFCP_FSF_PORTSPEED_10GBIT) + fdmi_speed |= FC_PORTSPEED_10GBIT; + if (fsf_speed & ZFCP_FSF_PORTSPEED_8GBIT) + fdmi_speed |= FC_PORTSPEED_8GBIT; + if (fsf_speed & ZFCP_FSF_PORTSPEED_16GBIT) + fdmi_speed |= FC_PORTSPEED_16GBIT; + if (fsf_speed & ZFCP_FSF_PORTSPEED_NOT_NEGOTIATED) + fdmi_speed |= FC_PORTSPEED_NOT_NEGOTIATED; + return fdmi_speed; +} + static int zfcp_fsf_exchange_config_evaluate(struct zfcp_fsf_req *req) { struct fsf_qtcb_bottom_config *bottom = 
&req->qtcb->bottom.config; @@ -456,7 +484,8 @@ static int zfcp_fsf_exchange_config_eval fc_host_port_name(shost) = nsp->fl_wwpn; fc_host_node_name(shost) = nsp->fl_wwnn; fc_host_port_id(shost) = ntoh24(bottom->s_id); - fc_host_speed(shost) = bottom->fc_link_speed; + fc_host_speed(shost) = + zfcp_fsf_convert_portspeed(bottom->fc_link_speed); fc_host_supported_classes(shost) = FC_COS_CLASS2 | FC_COS_CLASS3; adapter->hydra_version = bottom->adapter_type; @@ -580,7 +609,8 @@ static void zfcp_fsf_exchange_port_evalu } else fc_host_permanent_port_name(shost) = fc_host_port_name(shost); fc_host_maxframe_size(shost) = bottom->maximum_frame_size; - fc_host_supported_speeds(shost) = bottom->supported_speed; + fc_host_supported_speeds(shost) = + zfcp_fsf_convert_portspeed(bottom->supported_speed); memcpy(fc_host_supported_fc4s(shost), bottom->supported_fc4_types, FC_FC4_LIST_SIZE); memcpy(fc_host_active_fc4s(shost), bottom->active_fc4_types, -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 5/5] virtio-scsi: introduce multiqueue support
On Tue, Sep 04, 2012 at 01:18:31PM +0200, Paolo Bonzini wrote: > Il 04/09/2012 13:09, Michael S. Tsirkin ha scritto: > >> > queuecommand on CPU #0 queuecommand #2 on CPU #1 > >> > -- > >> > atomic_inc_return(...) == 1 > >> >atomic_inc_return(...) == 2 > >> >virtscsi_queuecommand to queue #1 > >> > tgt->req_vq = queue #0 > >> > virtscsi_queuecommand to queue #0 > >> > > >> > then two requests are issued to different queues without a quiescent > >> > point in the middle. > > What happens then? Does this break correctness? > > Yes, requests to the same target should be processed in FIFO order, or > you have things like a flush issued before the write it was supposed to > flush. This is why I can only change the queue when there is no request > pending. > > Paolo I see. I guess you can rewrite this as: atomic_inc if (atomic_read() == 1) which is a bit cheaper, and make the fact that you do not need increment and return to be atomic, explicit. Another simple idea: store last processor id in target, if it is unchanged no need to play with req_vq and take spinlock. Also - some kind of comment explaining why a similar race can not happen with this lock in place would be nice: I see why this specific race can not trigger, but since the lock is dropped later before you submit the command, I have a hard time convincing myself what exactly guarantees that the vq is never switched before or even while the command is submitted. -- MST
Re: [PATCH 5/5] virtio-scsi: introduce multiqueue support
Il 04/09/2012 15:35, Michael S. Tsirkin ha scritto: > I see. I guess you can rewrite this as: > atomic_inc > if (atomic_read() == 1) > which is a bit cheaper, and make the fact > that you do not need increment and return to be atomic, > explicit. It seems more complicated to me for hardly any reason. (Besides, is it cheaper? It has one less memory barrier on some architectures I frankly do not care much about---not on x86---but it also has two memory accesses instead of one on all architectures). > Another simple idea: store last processor id in target, > if it is unchanged no need to play with req_vq > and take spinlock. Not so sure, consider the previous example with last_processor_id equal to 1. queuecommand on CPU #0 queuecommand #2 on CPU #1 -- atomic_inc_return(...) == 1 atomic_inc_return(...) == 2 virtscsi_queuecommand to queue #1 last_processor_id == 0? no spin_lock tgt->req_vq = queue #0 spin_unlock virtscsi_queuecommand to queue #0 This is not a network driver, there are still a lot of locks around. This micro-optimization doesn't pay enough for the pain. > Also - some kind of comment explaining why a similar race can not happen > with this lock in place would be nice: I see why this specific race can > not trigger but since lock is dropped later before you submit command, I > have hard time convincing myself what exactly gurantees that vq is never > switched before or even while command is submitted. Because tgt->reqs will never become zero (which is a necessary condition for tgt->req_vq to change), as long as one request is executing virtscsi_queuecommand. Paolo -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 5/5] virtio-scsi: introduce multiqueue support
Il 04/09/2012 14:48, Michael S. Tsirkin ha scritto: >> > This patch adds queue steering to virtio-scsi. When a target is sent >> > multiple requests, we always drive them to the same queue so that FIFO >> > processing order is kept. However, if a target was idle, we can choose >> > a queue arbitrarily. In this case the queue is chosen according to the >> > current VCPU, so the driver expects the number of request queues to be >> > equal to the number of VCPUs. This makes it easy and fast to select >> > the queue, and also lets the driver optimize the IRQ affinity for the >> > virtqueues (each virtqueue's affinity is set to the CPU that "owns" >> > the queue). >> > >> > Signed-off-by: Paolo Bonzini > I guess an alternative is a per-target vq. > Is the reason you avoid this that you expect more targets > than cpus? If yes this is something you might want to > mention in the log. One reason is that, even though in practice I expect roughly the same number of targets and VCPUs, hotplug means the number of targets is difficult to predict and is usually fixed to 256. The other reason is that per-target vq didn't give any performance advantage. The bonus comes from cache locality and less process migrations, more than from the independent virtqueues. Paolo -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [resend PATCH] scsi_remove_target: fix softlockup regression on hot remove
On Wed, Aug 29, 2012 at 10:59 AM, Dan Williams wrote: > On Wed, 2012-08-29 at 06:50 +, Bart Van Assche wrote: >> On 08/29/12 05:12, Dan Williams wrote: >> > John reports: >> > BUG: soft lockup - CPU#2 stuck for 23s! [kworker/u:8:2202] >> > [..] >> > Call Trace: >> > [] scsi_remove_target+0xda/0x1f0 >> > [] sas_rphy_remove+0x55/0x60 >> > [] sas_rphy_delete+0x11/0x20 >> > [] sas_port_delete+0x25/0x160 >> > [] mptsas_del_end_device+0x183/0x270 >> > >> > ...introduced by commit 3b661a9 "[SCSI] fix hot unplug vs async scan race". >> >> Including that call stack in the patch description may create the >> misleading impression that this only occurs with the mptsas driver. This >> lockup also happens with at least the iSCSI initiator. See also >> http://lkml.org/lkml/2012/8/24/340. > > I don't think it does that. The title is pretty generic, but you're > right the impact is potentially all scsi_remove_target() users. > >> By the way, in order to get a patch in the stable tree the proper "Cc:" >> tag should be added in the patch description but the >> sta...@vger.kernel.org e-mail address should be left out from the >> Cc-list of the e-mail with the patch. > > No, we talked about that at kernel summit. It's ok for the stable@ > alias to get a few extra mails. The patch won't be applied until it > hits mainline and in the meantime it gives a heads up to the -stable > folks, or anyone that wants to follow up on stable patches making their > way to mainline. > It appears that this did not get into 3.6 rc4 (unless I am reading the changelog wrong). Do I have to file an official bug report to get this noticed? John
Re: [PATCH 5/5] virtio-scsi: introduce multiqueue support
On Tue, Sep 04, 2012 at 03:45:57PM +0200, Paolo Bonzini wrote: > > Also - some kind of comment explaining why a similar race can not happen > > with this lock in place would be nice: I see why this specific race can > > not trigger but since lock is dropped later before you submit command, I > > have a hard time convincing myself what exactly guarantees that vq is never > > switched before or even while command is submitted. > > Because tgt->reqs will never become zero (which is a necessary condition > for tgt->req_vq to change), as long as one request is executing > virtscsi_queuecommand. > > Paolo Yes, but this logic would apparently imply the lock is not necessary, and it actually is. I am not saying anything is wrong, just that it looks scary. -- MST
Re: [PATCH 5/5] virtio-scsi: introduce multiqueue support
On Tue, Sep 04, 2012 at 03:49:42PM +0200, Paolo Bonzini wrote: > Il 04/09/2012 14:48, Michael S. Tsirkin ha scritto: > >> > This patch adds queue steering to virtio-scsi. When a target is sent > >> > multiple requests, we always drive them to the same queue so that FIFO > >> > processing order is kept. However, if a target was idle, we can choose > >> > a queue arbitrarily. In this case the queue is chosen according to the > >> > current VCPU, so the driver expects the number of request queues to be > >> > equal to the number of VCPUs. This makes it easy and fast to select > >> > the queue, and also lets the driver optimize the IRQ affinity for the > >> > virtqueues (each virtqueue's affinity is set to the CPU that "owns" > >> > the queue). > >> > > >> > Signed-off-by: Paolo Bonzini > > I guess an alternative is a per-target vq. > > Is the reason you avoid this that you expect more targets > > than cpus? If yes this is something you might want to > > mention in the log. > > One reason is that, even though in practice I expect roughly the same > number of targets and VCPUs, hotplug means the number of targets is > difficult to predict and is usually fixed to 256. > > The other reason is that per-target vq didn't give any performance > advantage. The bonus comes from cache locality and less process > migrations, more than from the independent virtqueues. > > Paolo Okay, and why is per-target worse for cache locality? -- MST -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v6 0/7] ZPODD patches
v6: When the user changes the may_power_off flag through the sysfs entry and the device is already runtime suspended, resume it so that it can respect this flag the next time it is runtime suspended, as suggested by Alan Stern. Call scsi_autopm_get/put_device once in sr_check_events as suggested by Alan Stern. v5: Add may_power_off flag to scsi device. Alan Stern suggested that I should not mix runtime suspend with runtime power off, but the current zpodd implementation made it not easy to separate. So I re-wrote the zpodd implementation; the end result is that a normal ODD can also enter the runtime suspended state, but its power won't be removed. v4: Rebase on top of Linus' tree; due to this, the problem of a missing flag in v3 is gone. Add a new function scsi_autopm_put_device_autosuspend to first mark last busy for the device and then put autosuspend it, as suggested by Oliver Neukum. Typo fix as pointed out by Sergei Shtylyov. Check the can_power_off flag before any runtime pm operations in sr. v3: Rebase on top of scsi-misc tree; Add the sr related patches previously in Jeff's libata tree; Re-organize the sr patches. A problem for now: for patch "scsi: sr: support zero power ODD(ZPODD)" I can't set a flag in libata-acpi.c since a related function is missing in the scsi-misc tree. Will fix this when 3.6-rc1 is released. v2: Bug fix for v1; Use scsi_autopm_* in sr driver instead of pm_runtime_*; v1: Here are some patches to make ZPODD easier to use for end users and a fix for using ZPODD with system suspend.
Aaron Lu (7): scsi: sr: support runtime pm for ODD block: genhd: export disk_(un)block_events scsi: sr: block events checking when suspended for zpodd libata: acpi: set can_power_off for both ODD and HD scsi: pm: add may_power_off flag scsi: sr: use may_power_off libata: acpi: respect may_power_off flag block/genhd.c | 2 ++ drivers/ata/libata-acpi.c | 35 -- drivers/scsi/scsi_sysfs.c | 37 ++- drivers/scsi/sr.c | 89 -- drivers/scsi/sr.h | 1 + include/scsi/scsi_device.h | 2 ++ 6 files changed, 152 insertions(+), 14 deletions(-) -- 1.7.11.3 -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v6 1/7] scsi: sr: support runtime pm for ODD
From: Aaron Lu The ODD will be placed into suspend state when: 1 For tray type ODD, no media inside and door closed; 2 For slot type ODD, no media inside; And together with ACPI, when we suspend the ODD's parent(the port it attached to), we will omit the power altogether to reduce power consumption(done in libata-acpi.c). The ODD can be resumed either by user or by software. For user to resume the suspended ODD: 1 For tray type ODD, press the eject button; 2 For slot type ODD, insert a disc; Once such events happened, an ACPI notification will be sent and in our handler, we will resume the ODD(again in libata-acpi.c). This kind of resume requires platform and device support and is reflected by the can_power_off flag. For software to resume the suspended ODD, we did this in ODD's open/release and check_event function. Signed-off-by: Aaron Lu --- drivers/ata/libata-acpi.c | 6 ++-- drivers/scsi/sr.c | 76 -- include/scsi/scsi_device.h | 1 + 3 files changed, 79 insertions(+), 4 deletions(-) diff --git a/drivers/ata/libata-acpi.c b/drivers/ata/libata-acpi.c index 902b5a4..64ba43d 100644 --- a/drivers/ata/libata-acpi.c +++ b/drivers/ata/libata-acpi.c @@ -985,8 +985,10 @@ static void ata_acpi_wake_dev(acpi_handle handle, u32 event, void *context) struct ata_device *ata_dev = context; if (event == ACPI_NOTIFY_DEVICE_WAKE && ata_dev && - pm_runtime_suspended(&ata_dev->sdev->sdev_gendev)) - scsi_autopm_get_device(ata_dev->sdev); + pm_runtime_suspended(&ata_dev->sdev->sdev_gendev)) { + ata_dev->sdev->wakeup_by_user = 1; + pm_runtime_resume(&ata_dev->sdev->sdev_gendev); + } } static void ata_acpi_add_pm_notifier(struct ata_device *dev) diff --git a/drivers/scsi/sr.c b/drivers/scsi/sr.c index 5fc97d2..637f4ca 100644 --- a/drivers/scsi/sr.c +++ b/drivers/scsi/sr.c @@ -45,6 +45,7 @@ #include #include #include +#include #include #include @@ -79,6 +80,8 @@ static DEFINE_MUTEX(sr_mutex); static int sr_probe(struct device *); static int sr_remove(struct device *); static int 
sr_done(struct scsi_cmnd *); +static int sr_suspend(struct device *, pm_message_t msg); +static int sr_resume(struct device *); static struct scsi_driver sr_template = { .owner = THIS_MODULE, @@ -86,6 +89,8 @@ static struct scsi_driver sr_template = { .name = "sr", .probe = sr_probe, .remove = sr_remove, + .suspend= sr_suspend, + .resume = sr_resume, }, .done = sr_done, }; @@ -146,8 +151,12 @@ static inline struct scsi_cd *scsi_cd_get(struct gendisk *disk) kref_get(&cd->kref); if (scsi_device_get(cd->device)) goto out_put; + if (scsi_autopm_get_device(cd->device)) + goto out_pm; goto out; + out_pm: + scsi_device_put(cd->device); out_put: kref_put(&cd->kref, sr_kref_release); cd = NULL; @@ -163,9 +172,61 @@ static void scsi_cd_put(struct scsi_cd *cd) mutex_lock(&sr_ref_mutex); kref_put(&cd->kref, sr_kref_release); scsi_device_put(sdev); + scsi_autopm_put_device(sdev); mutex_unlock(&sr_ref_mutex); } +static int sr_suspend(struct device *dev, pm_message_t msg) +{ + int suspend; + struct scsi_sense_hdr sshdr; + struct scsi_cd *cd = dev_get_drvdata(dev); + + /* no action for system pm */ + if (!PMSG_IS_AUTO(msg)) + return 0; + + /* +* ODD can be runtime suspended when: +* tray type: no media inside and tray closed +* slot type: no media inside +*/ + scsi_test_unit_ready(cd->device, SR_TIMEOUT, MAX_RETRIES, &sshdr); + + if (cd->cdi.mask & CDC_CLOSE_TRAY) + /* no media for caddy/slot type ODD */ + suspend = scsi_sense_valid(&sshdr) && sshdr.asc == 0x3a; + else + /* no media and door closed for tray type ODD */ + suspend = scsi_sense_valid(&sshdr) && sshdr.asc == 0x3a && + sshdr.ascq == 0x01; + + if (!suspend) + return -EBUSY; + + return 0; +} + +static int sr_resume(struct device *dev) +{ + struct scsi_cd *cd; + struct scsi_sense_hdr sshdr; + + cd = dev_get_drvdata(dev); + + /* get the disk ready */ + scsi_test_unit_ready(cd->device, SR_TIMEOUT, MAX_RETRIES, &sshdr); + + /* if user wakes up the ODD, eject the tray */ + if (cd->device->wakeup_by_user) { + 
cd->device->wakeup_by_user = 0; + if (!(cd->cdi.mask & CDC_CLOSE_TRAY)) + sr_tray_move(&cd->cdi, 1); + } + + return 0; +} + static unsigned int sr_get_events(struct scsi_device *sdev) { u8 buf[8]; @@
Re: [PATCH 5/5] virtio-scsi: introduce multiqueue support
Il 04/09/2012 16:19, Michael S. Tsirkin ha scritto: > > > Also - some kind of comment explaining why a similar race can not happen > > > with this lock in place would be nice: I see why this specific race can > > > not trigger but since lock is dropped later before you submit command, I > > > have hard time convincing myself what exactly gurantees that vq is never > > > switched before or even while command is submitted. > > > > Because tgt->reqs will never become zero (which is a necessary condition > > for tgt->req_vq to change), as long as one request is executing > > virtscsi_queuecommand. > > Yes but this logic would apparently imply the lock is not necessary, and > it actually is. I am not saying anything is wrong just that it > looks scary. Ok, I get the misunderstanding. For the logic to hold, you need a serialization point after which tgt->req_vq is not changed. The lock provides one such serialization point: after you unlock tgt->tgt_lock, nothing else will change tgt->req_vq until your request completes. Without the lock, there could always be a thread that is in the "then" branch but has been scheduled out, and when rescheduled it will change tgt->req_vq. Perhaps the confusion comes from the atomic_inc_return, and that was what my "why is this atomic" wanted to clear. **tgt->reqs is only atomic to avoid taking a spinlock in the ISR**. If you read the code with the lock, but with tgt->reqs as a regular non-atomic int, it should be much easier to reason on the code. I can split the patch if needed. Paolo -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v6 2/7] block: genhd: export disk_(un)block_events
From: Aaron Lu When ODD is runtime powered off, there is no meaning to check events for it, so disk_(un)block_events will be called in its suspend/resume callback. Signed-off-by: Aaron Lu --- block/genhd.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/block/genhd.c b/block/genhd.c index cac7366..f630150 100644 --- a/block/genhd.c +++ b/block/genhd.c @@ -1468,6 +1468,7 @@ void disk_block_events(struct gendisk *disk) mutex_unlock(&ev->block_mutex); } +EXPORT_SYMBOL(disk_block_events); static void __disk_unblock_events(struct gendisk *disk, bool check_now) { @@ -1512,6 +1513,7 @@ void disk_unblock_events(struct gendisk *disk) if (disk->ev) __disk_unblock_events(disk, false); } +EXPORT_SYMBOL(disk_unblock_events); /** * disk_flush_events - schedule immediate event checking and flushing -- 1.7.11.3 -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v6 3/7] scsi: sr: block events checking when suspended for zpodd
From: Aaron Lu When ODD is runtime suspended and if it can be powered off(reflected by the can_power_off flag), we will block events checking for it so that it can stay in powered off state until resumed by: 1 user pressing the eject button or inserting a disc; 2 software opening the block device. But not by events checking that normally occurs every 2 seconds. Old versions of udisk will poll the ODD periodically, so it is advised to inhibit the polling: $ udisks --inhibit-polling /dev/sr0 Signed-off-by: Aaron Lu --- drivers/scsi/sr.c | 10 ++ drivers/scsi/sr.h | 1 + 2 files changed, 11 insertions(+) diff --git a/drivers/scsi/sr.c b/drivers/scsi/sr.c index 637f4ca..a5bb687 100644 --- a/drivers/scsi/sr.c +++ b/drivers/scsi/sr.c @@ -204,6 +204,11 @@ static int sr_suspend(struct device *dev, pm_message_t msg) if (!suspend) return -EBUSY; + if (cd->device->can_power_off) { + cd->disk_events_blocked = 1; + disk_block_events(cd->disk); + } + return 0; } @@ -224,6 +229,11 @@ static int sr_resume(struct device *dev) sr_tray_move(&cd->cdi, 1); } + if (cd->disk_events_blocked) { + cd->disk_events_blocked = 0; + disk_unblock_events(cd->disk); + } + return 0; } diff --git a/drivers/scsi/sr.h b/drivers/scsi/sr.h index 37c8f6b..9d9a76f 100644 --- a/drivers/scsi/sr.h +++ b/drivers/scsi/sr.h @@ -41,6 +41,7 @@ typedef struct scsi_cd { unsigned readcd_known:1;/* drive supports READ_CD (0xbe) */ unsigned readcd_cdda:1; /* reading audio data using READ_CD */ unsigned media_present:1; /* media is present */ + unsigned disk_events_blocked:1; /* disk events is blocked */ /* GET_EVENT spurious event handling, blk layer guarantees exclusion */ int tur_mismatch; /* nr of get_event TUR mismatches */ -- 1.7.11.3 -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v6 4/7] libata: acpi: set can_power_off for both ODD and HD
From: Aaron Lu Hard disk may also be runtime powered off, so set can_power_off flag for it too if condition satisfies. Signed-off-by: Aaron Lu --- drivers/ata/libata-acpi.c | 25 + 1 file changed, 17 insertions(+), 8 deletions(-) diff --git a/drivers/ata/libata-acpi.c b/drivers/ata/libata-acpi.c index 64ba43d..6c8f89c 100644 --- a/drivers/ata/libata-acpi.c +++ b/drivers/ata/libata-acpi.c @@ -1005,7 +1005,7 @@ static void ata_acpi_add_pm_notifier(struct ata_device *dev) if (ACPI_FAILURE(status)) return; - if (dev->sdev->can_power_off) { + if (dev->class == ATA_DEV_ATAPI && dev->sdev->can_power_off) { acpi_install_notify_handler(handle, ACPI_SYSTEM_NOTIFY, ata_acpi_wake_dev, dev); device_set_run_wake(&dev->sdev->sdev_gendev, true); @@ -1026,7 +1026,7 @@ static void ata_acpi_remove_pm_notifier(struct ata_device *dev) if (ACPI_FAILURE(status)) return; - if (dev->sdev->can_power_off) { + if (dev->class == ATA_DEV_ATAPI && dev->sdev->can_power_off) { device_set_run_wake(&dev->sdev->sdev_gendev, false); acpi_remove_notify_handler(handle, ACPI_SYSTEM_NOTIFY, ata_acpi_wake_dev); @@ -1130,14 +1130,23 @@ static int ata_acpi_bind_device(struct ata_port *ap, struct scsi_device *sdev, /* * If firmware has _PS3 or _PR3 for this device, -* and this ata ODD device support device attention, -* it means this device can be powered off +* it means this device can be powered off runtime */ states = acpi_dev->power.states; - if ((states[ACPI_STATE_D3_HOT].flags.valid || - states[ACPI_STATE_D3_COLD].flags.explicit_set) && - ata_dev->flags & ATA_DFLAG_DA) - sdev->can_power_off = 1; + if (states[ACPI_STATE_D3_HOT].flags.valid || + states[ACPI_STATE_D3_COLD].flags.explicit_set) { + /* +* For ODD, it needs to support device attention or +* it can't be powered up back by user +*/ + if (ata_dev->class == ATA_DEV_ATAPI && + ata_dev->flags & ATA_DFLAG_DA) + sdev->can_power_off = 1; + + /* No requirement for hard disk */ + if (ata_dev->class == ATA_DEV_ATA) + sdev->can_power_off = 1; + } return 0; 
} -- 1.7.11.3 -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v6 5/7] scsi: pm: add may_power_off flag
From: Aaron Lu Add a new may_power_off flag to the scsi device; it gives the user a chance to control whether the device's power may be removed once it is runtime suspended. If the device can be powered off (reflected by the can_power_off flag, determined by the individual driver), create a sysfs entry named may_power_off to let the user control the flag. When the user changes this flag through the sysfs entry and the device is already runtime suspended, runtime resume it so that it can respect this flag the next time it is runtime suspended. I'm planning to use this flag for sr and sd. Signed-off-by: Aaron Lu --- drivers/scsi/scsi_sysfs.c | 37 - include/scsi/scsi_device.h | 1 + 2 files changed, 37 insertions(+), 1 deletion(-) diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c index 093d4f6..8c8efd3 100644 --- a/drivers/scsi/scsi_sysfs.c +++ b/drivers/scsi/scsi_sysfs.c @@ -509,6 +509,7 @@ sdev_store_##field (struct device *dev, struct device_attribute *attr, \ return ret; \ } \ static DEVICE_ATTR(field, S_IRUGO | S_IWUSR, sdev_show_##field, sdev_store_##field); +#endif /* * scsi_sdev_check_buf_bit: return 0 if buf is "0", return 1 if buf is "1", @@ -526,7 +527,7 @@ static int scsi_sdev_check_buf_bit(const char *buf) } else return -EINVAL; } -#endif + /* * Create the actual show/store functions and data structures.
*/ @@ -860,6 +861,37 @@ static struct device_attribute sdev_attr_queue_type_rw = __ATTR(queue_type, S_IRUGO | S_IWUSR, show_queue_type_field, sdev_store_queue_type_rw); +static ssize_t +sdev_show_may_power_off(struct device *dev, struct device_attribute *attr, + char *buf) +{ + struct scsi_device *sdev; + sdev = to_scsi_device(dev); + return snprintf (buf, 20, "%d\n", sdev->may_power_off); +} + +static ssize_t +sdev_store_may_power_off(struct device *dev, struct device_attribute *attr, +const char *buf, size_t count) +{ + int ret; + struct scsi_device *sdev; + + ret = scsi_sdev_check_buf_bit(buf); + if (ret >= 0) { + sdev = to_scsi_device(dev); + if (sdev->may_power_off != ret) { + sdev->may_power_off = ret; + if (pm_runtime_suspended(dev)) + pm_runtime_resume(dev); + } + ret = count; + } + return ret; +} +static DEVICE_ATTR(may_power_off, S_IRUGO | S_IWUSR, + sdev_show_may_power_off, sdev_store_may_power_off); + /** * scsi_sysfs_add_sdev - add scsi device to sysfs * @sdev: scsi_device to add @@ -950,6 +982,9 @@ int scsi_sysfs_add_sdev(struct scsi_device *sdev) } } + if (sdev->can_power_off) + device_create_file(&sdev->sdev_gendev, &dev_attr_may_power_off); + return error; } diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h index 72d946f..41cec7f 100644 --- a/include/scsi/scsi_device.h +++ b/include/scsi/scsi_device.h @@ -157,6 +157,7 @@ struct scsi_device { unsigned can_power_off:1; /* Device supports runtime power off */ unsigned wce_default_on:1; /* Cache is ON by default */ unsigned wakeup_by_user:1; /* User wakes up the device */ + unsigned may_power_off:1; /* Power off is allowed by user */ DECLARE_BITMAP(supported_events, SDEV_EVT_MAXBITS); /* supported events */ struct list_head event_list;/* asserted events */ -- 1.7.11.3
[PATCH v6 6/7] scsi: sr: use may_power_off
From: Aaron Lu With the introduction of may_power_off, when deciding whether we can block disk events checking, may_power_off should be used. Signed-off-by: Aaron Lu --- drivers/scsi/sr.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/scsi/sr.c b/drivers/scsi/sr.c index a5bb687..291f38c 100644 --- a/drivers/scsi/sr.c +++ b/drivers/scsi/sr.c @@ -204,7 +204,7 @@ static int sr_suspend(struct device *dev, pm_message_t msg) if (!suspend) return -EBUSY; - if (cd->device->can_power_off) { + if (cd->device->may_power_off) { cd->disk_events_blocked = 1; disk_block_events(cd->disk); } @@ -794,6 +794,9 @@ static int sr_probe(struct device *dev) sdev_printk(KERN_DEBUG, sdev, "Attached scsi CD-ROM %s\n", cd->cdi.name); + if (cd->device->can_power_off) + cd->device->may_power_off = 1; + /* enable runtime pm */ scsi_autopm_put_device(cd->device); -- 1.7.11.3
[PATCH v6 7/7] libata: acpi: respect may_power_off flag
From: Aaron Lu If the user does not want the device to be powered off when runtime suspended (by setting the may_power_off flag to 0), we will not choose the D3 cold ACPI D-state for it. Signed-off-by: Aaron Lu --- drivers/ata/libata-acpi.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/ata/libata-acpi.c b/drivers/ata/libata-acpi.c index 6c8f89c..774180d 100644 --- a/drivers/ata/libata-acpi.c +++ b/drivers/ata/libata-acpi.c @@ -869,7 +869,9 @@ void ata_acpi_set_state(struct ata_port *ap, pm_message_t state) if (state.event != PM_EVENT_ON) { acpi_state = acpi_pm_device_sleep_state( - &dev->sdev->sdev_gendev, NULL, ACPI_STATE_D3); + &dev->sdev->sdev_gendev, NULL, + dev->sdev->may_power_off ? + ACPI_STATE_D3_COLD : ACPI_STATE_D3_HOT); if (acpi_state > 0) acpi_bus_set_power(handle, acpi_state); /* TBD: need to check if it's runtime pm request */ -- 1.7.11.3
Re: [PATCH 5/5] virtio-scsi: introduce multiqueue support
Il 04/09/2012 16:21, Michael S. Tsirkin ha scritto: > > One reason is that, even though in practice I expect roughly the same > > number of targets and VCPUs, hotplug means the number of targets is > > difficult to predict and is usually fixed to 256. > > > > The other reason is that per-target vq didn't give any performance > > advantage. The bonus comes from cache locality and less process > > migrations, more than from the independent virtqueues. > > Okay, and why is per-target worse for cache locality? Because per-target doesn't have IRQ affinity for a particular CPU. Assuming that the thread that is sending requests to the device is I/O-bound, it is likely to be sleeping at the time the ISR is executed, and thus executing the ISR on the same processor that sent the requests is cheap. But if you have many such I/O-bound processes, the kernel will execute the ISR on a random processor, rather than the one that is sending requests to the device. Paolo
Re: [PATCH 5/5] virtio-scsi: introduce multiqueue support
On Tue, Sep 04, 2012 at 04:30:35PM +0200, Paolo Bonzini wrote: > Il 04/09/2012 16:21, Michael S. Tsirkin ha scritto: > > > One reason is that, even though in practice I expect roughly the same > > > number of targets and VCPUs, hotplug means the number of targets is > > > difficult to predict and is usually fixed to 256. > > > > > > The other reason is that per-target vq didn't give any performance > > > advantage. The bonus comes from cache locality and less process > > > migrations, more than from the independent virtqueues. > > > > Okay, and why is per-target worse for cache locality? > > Because per-target doesn't have IRQ affinity for a particular CPU. > > Assuming that the thread that is sending requests to the device is > I/O-bound, it is likely to be sleeping at the time the ISR is executed, > and thus executing the ISR on the same processor that sent the requests > is cheap. > > But if you have many such I/O-bound processes, the kernel will execute > the ISR on a random processor, rather than the one that is sending > requests to the device. > > Paolo I see, another case where our irq balancing makes bad decisions. You could do it differently - pin irq to the cpu of the last task that executed, tweak irq affinity when that changes. Still if you want to support 256 targets vector per target is not going to work. Would be nice to add this motivation to commit log I think. -- MST
Re: [PATCH 5/5] virtio-scsi: introduce multiqueue support
On Tue, Aug 28, 2012 at 01:54:17PM +0200, Paolo Bonzini wrote: > @@ -575,15 +630,19 @@ static struct scsi_host_template virtscsi_host_template > = { > &__val, sizeof(__val)); \ > }) > > + Pls don't add empty lines. > static void virtscsi_init_vq(struct virtio_scsi_vq *virtscsi_vq, > - struct virtqueue *vq) > + struct virtqueue *vq, bool affinity) > { > spin_lock_init(&virtscsi_vq->vq_lock); > virtscsi_vq->vq = vq; > + if (affinity) > + virtqueue_set_affinity(vq, virtqueue_get_queue_index(vq) - > +VIRTIO_SCSI_VQ_BASE); > } > This means in practice if you have less virtqueues than CPUs, things are not going to work well, will they? Any idea what to do? -- MST
Re: [PATCH 5/5] virtio-scsi: introduce multiqueue support
Il 04/09/2012 16:47, Michael S. Tsirkin ha scritto: >> > static void virtscsi_init_vq(struct virtio_scsi_vq *virtscsi_vq, >> > - struct virtqueue *vq) >> > + struct virtqueue *vq, bool affinity) >> > { >> >spin_lock_init(&virtscsi_vq->vq_lock); >> >virtscsi_vq->vq = vq; >> > + if (affinity) >> > + virtqueue_set_affinity(vq, virtqueue_get_queue_index(vq) - >> > + VIRTIO_SCSI_VQ_BASE); >> > } >> > > This means in practice if you have less virtqueues than CPUs, > things are not going to work well, will they? Not particularly. It could be better or worse than single queue depending on the workload. > Any idea what to do? Two possibilities: 1) Add a stride argument to virtqueue_set_affinity, and make it equal to the number of queues. 2) Make multiqueue the default in QEMU, and make the default number of queues equal to the number of VCPUs. I was going for (2). Paolo
Re: [PATCH 5/5] virtio-scsi: introduce multiqueue support
On Tue, Sep 04, 2012 at 04:55:56PM +0200, Paolo Bonzini wrote: > Il 04/09/2012 16:47, Michael S. Tsirkin ha scritto: > >> > static void virtscsi_init_vq(struct virtio_scsi_vq *virtscsi_vq, > >> > - struct virtqueue *vq) > >> > + struct virtqueue *vq, bool affinity) > >> > { > >> > spin_lock_init(&virtscsi_vq->vq_lock); > >> > virtscsi_vq->vq = vq; > >> > +if (affinity) > >> > +virtqueue_set_affinity(vq, > >> > virtqueue_get_queue_index(vq) - > >> > + VIRTIO_SCSI_VQ_BASE); > >> > } > >> > > > This means in practice if you have less virtqueues than CPUs, > > things are not going to work well, will they? > > Not particularly. It could be better or worse than single queue > depending on the workload. Well interrupts will go to CPU different from the one that sends commands so ... > > Any idea what to do? > > Two possibilities: > > 1) Add a stride argument to virtqueue_set_affinity, and make it equal to > the number of queues. > > 2) Make multiqueue the default in QEMU, and make the default number of > queues equal to the number of VCPUs. > > I was going for (2). > > Paolo 3. use per target queue if less targets than cpus? -- MST
Re: [PATCH v6 1/7] scsi: sr: support runtime pm for ODD
On Tuesday 04 September 2012 22:24:34 Aaron Lu wrote: > From: Aaron Lu > > The ODD will be placed into suspend state when: > 1 For tray type ODD, no media inside and door closed; > 2 For slot type ODD, no media inside; > And together with ACPI, when we suspend the ODD's parent(the port it > attached to), we will omit the power altogether to reduce power > consumption(done in libata-acpi.c). Well, this is quite a layering violation. You encode that specific requirement in the generic sr_suspend() > The ODD can be resumed either by user or by software. > > For user to resume the suspended ODD: > 1 For tray type ODD, press the eject button; > 2 For slot type ODD, insert a disc; > Once such events happened, an ACPI notification will be sent and in our > handler, we will resume the ODD(again in libata-acpi.c). > This kind of resume requires platform and device support and is > reflected by the can_power_off flag. > > For software to resume the suspended ODD, we did this in ODD's > open/release and check_event function. > > Signed-off-by: Aaron Lu > --- > drivers/ata/libata-acpi.c | 6 ++-- > drivers/scsi/sr.c | 76 > -- > include/scsi/scsi_device.h | 1 + > 3 files changed, 79 insertions(+), 4 deletions(-) > > diff --git a/drivers/ata/libata-acpi.c b/drivers/ata/libata-acpi.c > index 902b5a4..64ba43d 100644 > --- a/drivers/ata/libata-acpi.c > +++ b/drivers/ata/libata-acpi.c > @@ -985,8 +985,10 @@ static void ata_acpi_wake_dev(acpi_handle handle, u32 > event, void *context) > struct ata_device *ata_dev = context; > > if (event == ACPI_NOTIFY_DEVICE_WAKE && ata_dev && > - pm_runtime_suspended(&ata_dev->sdev->sdev_gendev)) > - scsi_autopm_get_device(ata_dev->sdev); > + pm_runtime_suspended(&ata_dev->sdev->sdev_gendev)) { > + ata_dev->sdev->wakeup_by_user = 1; That flag is badly named. Something like "insert_event_during_suspend" would be better. [..] 
> +static int sr_suspend(struct device *dev, pm_message_t msg) > +{ > + int suspend; > + struct scsi_sense_hdr sshdr; > + struct scsi_cd *cd = dev_get_drvdata(dev); > + > + /* no action for system pm */ > + if (!PMSG_IS_AUTO(msg)) > + return 0; > + > + /* > + * ODD can be runtime suspended when: > + * tray type: no media inside and tray closed > + * slot type: no media inside > + */ > + scsi_test_unit_ready(cd->device, SR_TIMEOUT, MAX_RETRIES, &sshdr); > + > + if (cd->cdi.mask & CDC_CLOSE_TRAY) > + /* no media for caddy/slot type ODD */ > + suspend = scsi_sense_valid(&sshdr) && sshdr.asc == 0x3a; > + else > + /* no media and door closed for tray type ODD */ > + suspend = scsi_sense_valid(&sshdr) && sshdr.asc == 0x3a && > + sshdr.ascq == 0x01; That requirement is valid for a specific type of disk. You cannot put it into generic sd_suspend(). And even so I don't see why you wouldn't want to suspend for example a drive with an inserted but unopened disk. > + > + if (!suspend) > + return -EBUSY; > + > + return 0; > +} Regards Oliver
Re: [PATCH v6 5/7] scsi: pm: add may_power_off flag
On Tuesday 04 September 2012 22:24:38 Aaron Lu wrote: > From: Aaron Lu > > Add a new flag may_power_off for scsi device, it gives the user a chance > to control when the device is runtime suspended, can we remove its power > if possible. > > And if the device can be powered off(reflected by can_power_off flag, > determined by individual driver), create a sysfs entry named > may_power_off to let user control the flag. Do we really need yet another interface to user space? Regards Oliver
[PATCH 1/2] scsi_dh_rdac : minor return fix for rdac
Minor return fix. Signed-off-by: Babu Moger --- --- linux-3.6-rc1/drivers/scsi/device_handler/scsi_dh_rdac.c.orig 2012-09-04 12:13:40.0 -0500 +++ linux-3.6-rc1/drivers/scsi/device_handler/scsi_dh_rdac.c2012-09-04 12:15:26.0 -0500 @@ -863,7 +863,7 @@ static int rdac_bus_attach(struct scsi_d if (!scsi_dh_data) { sdev_printk(KERN_ERR, sdev, "%s: Attach failed\n", RDAC_NAME); - return 0; + return -ENOMEM; } scsi_dh_data->scsi_dh = &rdac_dh;
[PATCH 1/2] scsi_dh_rdac : Add a new netapp vendor/product string
This patch adds a new vendor/product string for the NetApp E-Series products. It also consolidates strings with similar names together. Signed-off-by: Babu Moger --- --- linux-3.6-rc1/drivers/scsi/device_handler/scsi_dh_rdac.c.orig 2012-09-04 11:55:54.0 -0500 +++ linux-3.6-rc1/drivers/scsi/device_handler/scsi_dh_rdac.c2012-09-04 11:56:30.0 -0500 @@ -795,24 +795,25 @@ static const struct scsi_dh_devlist rdac {"SGI", "TP9700"}, {"SGI", "IS"}, {"STK", "OPENstorage D280"}, + {"STK", "FLEXLINE 380"}, + {"SUN", "CSM100_R_FC"}, {"SUN", "CSM200_R"}, {"SUN", "LCSM100_I"}, {"SUN", "LCSM100_S"}, {"SUN", "LCSM100_E"}, {"SUN", "LCSM100_F"}, + {"SUN", "STK6580_6780"}, + {"SUN", "SUN_6180"}, + {"SUN", "ArrayStorage"}, {"DELL", "MD3000"}, {"DELL", "MD3000i"}, {"DELL", "MD32xx"}, {"DELL", "MD32xxi"}, {"DELL", "MD36xxi"}, {"DELL", "MD36xxf"}, + {"NETAPP", "INF-01-00"}, {"LSI", "INF-01-00"}, {"ENGENIO", "INF-01-00"}, - {"STK", "FLEXLINE 380"}, - {"SUN", "CSM100_R_FC"}, - {"SUN", "STK6580_6780"}, - {"SUN", "SUN_6180"}, - {"SUN", "ArrayStorage"}, {NULL, NULL}, };
[PATCH 2/2] scsi_dh_rdac : Consolidate rdac strings together
This patch consolidates the strings together. The purpose is to remove minor product-string extensions, so that future products with similar strings will not require changes here. Signed-off-by: Babu Moger --- --- linux-3.6-rc1/drivers/scsi/device_handler/scsi_dh_rdac.c.orig 2012-09-04 12:01:15.0 -0500 +++ linux-3.6-rc1/drivers/scsi/device_handler/scsi_dh_rdac.c2012-09-04 12:01:28.0 -0500 @@ -790,27 +790,16 @@ static const struct scsi_dh_devlist rdac {"IBM", "1815"}, {"IBM", "1818"}, {"IBM", "3526"}, - {"SGI", "TP9400"}, - {"SGI", "TP9500"}, - {"SGI", "TP9700"}, + {"SGI", "TP9"}, {"SGI", "IS"}, {"STK", "OPENstorage D280"}, {"STK", "FLEXLINE 380"}, - {"SUN", "CSM100_R_FC"}, - {"SUN", "CSM200_R"}, - {"SUN", "LCSM100_I"}, - {"SUN", "LCSM100_S"}, - {"SUN", "LCSM100_E"}, - {"SUN", "LCSM100_F"}, + {"SUN", "CSM"}, + {"SUN", "LCSM100"}, {"SUN", "STK6580_6780"}, {"SUN", "SUN_6180"}, {"SUN", "ArrayStorage"}, - {"DELL", "MD3000"}, - {"DELL", "MD3000i"}, - {"DELL", "MD32xx"}, - {"DELL", "MD32xxi"}, - {"DELL", "MD36xxi"}, - {"DELL", "MD36xxf"}, + {"DELL", "MD3"}, {"NETAPP", "INF-01-00"}, {"LSI", "INF-01-00"}, {"ENGENIO", "INF-01-00"},
[PATCH] scsi_dh_rdac : minor return fix for rdac
Resending with correct subject line. This is patch 1/1. Minor return fix. Signed-off-by: Babu Moger --- --- linux-3.6-rc1/drivers/scsi/device_handler/scsi_dh_rdac.c.orig 2012-09-04 12:13:40.0 -0500 +++ linux-3.6-rc1/drivers/scsi/device_handler/scsi_dh_rdac.c2012-09-04 12:15:26.0 -0500 @@ -863,7 +863,7 @@ static int rdac_bus_attach(struct scsi_d if (!scsi_dh_data) { sdev_printk(KERN_ERR, sdev, "%s: Attach failed\n", RDAC_NAME); - return 0; + return -ENOMEM; } scsi_dh_data->scsi_dh = &rdac_dh;
Re: [PATCH v6 1/7] scsi: sr: support runtime pm for ODD
On Tue, 4 Sep 2012, Oliver Neukum wrote: > On Tuesday 04 September 2012 22:24:34 Aaron Lu wrote: > > From: Aaron Lu > > > > The ODD will be placed into suspend state when: > > 1 For tray type ODD, no media inside and door closed; > > 2 For slot type ODD, no media inside; > > And together with ACPI, when we suspend the ODD's parent(the port it > > attached to), we will omit the power altogether to reduce power > > consumption(done in libata-acpi.c). > > Well, this is quite a layering violation. You encode that specific requirement > in the generic sr_suspend() What requirement are you talking about? The "no media and door closed" thing? How is that a layering violation? Are you suggesting this should go into the CD layer instead? > > @@ -985,8 +985,10 @@ static void ata_acpi_wake_dev(acpi_handle handle, u32 > > event, void *context) > > struct ata_device *ata_dev = context; > > > > if (event == ACPI_NOTIFY_DEVICE_WAKE && ata_dev && > > - pm_runtime_suspended(&ata_dev->sdev->sdev_gendev)) > > - scsi_autopm_get_device(ata_dev->sdev); > > + pm_runtime_suspended(&ata_dev->sdev->sdev_gendev)) { > > + ata_dev->sdev->wakeup_by_user = 1; > > That flag is badly named. Something like "insert_event_during_suspend" > would be better. What happens on non-ACPI systems when a new disc is inserted into a suspended ODD? How does the drive let the computer know that an insert event has occurred? > > + if (cd->cdi.mask & CDC_CLOSE_TRAY) > > + /* no media for caddy/slot type ODD */ > > + suspend = scsi_sense_valid(&sshdr) && sshdr.asc == 0x3a; > > + else > > + /* no media and door closed for tray type ODD */ > > + suspend = scsi_sense_valid(&sshdr) && sshdr.asc == 0x3a && > > + sshdr.ascq == 0x01; > > That requirement is valid for a specific type of disk. You cannot put it > into generic sd_suspend(). You mean sr_suspend(). What's not generic about it? > And even so I don't see why you wouldn't > want to suspend for example a drive with an inserted but unopened disk. 
I assume that Aaron wanted to handle the easiest case first. Adding suspend/resume handling to the open/close routines can be done later. Alan Stern
Re: [PATCH v6 1/7] scsi: sr: support runtime pm for ODD
On Tuesday 04 September 2012 13:59:38 Alan Stern wrote: > On Tue, 4 Sep 2012, Oliver Neukum wrote: > > > On Tuesday 04 September 2012 22:24:34 Aaron Lu wrote: > > > From: Aaron Lu > > > > > > The ODD will be placed into suspend state when: > > > 1 For tray type ODD, no media inside and door closed; > > > 2 For slot type ODD, no media inside; > > > And together with ACPI, when we suspend the ODD's parent(the port it > > > attached to), we will omit the power altogether to reduce power > > > consumption(done in libata-acpi.c). > > > > Well, this is quite a layering violation. You encode that specific > > requirement > > in the generic sr_suspend() > > What requirement are you talking about? The "no media and door closed" > thing? How is that a layering violation? Are you suggesting this There is no reason this requirement should apply to any other drive than the device this is aimed at. It comes from the ability of this specific combination to detect medium changes. > should go into the CD layer instead? No. It needs to be specific to a certain hardware. It needs to be a callback. > > > @@ -985,8 +985,10 @@ static void ata_acpi_wake_dev(acpi_handle handle, > > > u32 event, void *context) > > > struct ata_device *ata_dev = context; > > > > > > if (event == ACPI_NOTIFY_DEVICE_WAKE && ata_dev && > > > - pm_runtime_suspended(&ata_dev->sdev->sdev_gendev)) > > > - scsi_autopm_get_device(ata_dev->sdev); > > > + pm_runtime_suspended(&ata_dev->sdev->sdev_gendev)) { > > > + ata_dev->sdev->wakeup_by_user = 1; > > > > That flag is badly named. Something like "insert_event_during_suspend" > > would be better. > > What happens on non-ACPI systems when a new disc is inserted into a > suspended ODD? How does the drive let the computer know that an insert > event has occurred? Good question. Again either the kernel polls the drive or there a mechanism specific to the hardware. 
> > > + if (cd->cdi.mask & CDC_CLOSE_TRAY) > > > + /* no media for caddy/slot type ODD */ > > > + suspend = scsi_sense_valid(&sshdr) && sshdr.asc == 0x3a; > > > + else > > > + /* no media and door closed for tray type ODD */ > > > + suspend = scsi_sense_valid(&sshdr) && sshdr.asc == 0x3a && > > > + sshdr.ascq == 0x01; > > > > That requirement is valid for a specific type of disk. You cannot put it > > into generic sd_suspend(). > > You mean sr_suspend(). What's not generic about it? Yes. We may encounter drives which cannot register a medium change while suspended, but can be safely suspended while their door is locked. > > And even so I don't see why you wouldn't > > want to suspend for example a drive with an inserted but unopened disk. > > I assume that Aaron wanted to handle the easiest case first. Adding > suspend/resume handling to the open/close routines can be done later. Sure, but this patch blocks that in contrast to just not implementing it. Regards Oliver
continuous 'Test WP failed, assume Write Enabled' errors
Hi, That's something that has been bothering me for a while but I hadn't got to writing a question about it ... I can see a stream of the following errors on my desktop [233755.873770] sd 6:0:0:3: [sde] Test WP failed, assume Write Enabled [233755.886450] sd 6:0:0:3: [sde] Asking for cache data failed [233755.886455] sd 6:0:0:3: [sde] Assuming drive cache: write through [233807.297205] sd 6:0:0:0: [sdb] Test WP failed, assume Write Enabled [233807.309884] sd 6:0:0:0: [sdb] Asking for cache data failed [233807.309890] sd 6:0:0:0: [sdb] Assuming drive cache: write through [233859.059886] sd 6:0:0:3: [sde] Test WP failed, assume Write Enabled [233859.072563] sd 6:0:0:3: [sde] Asking for cache data failed [233859.072569] sd 6:0:0:3: [sde] Assuming drive cache: write through [233910.823178] sd 6:0:0:4: [sdf] Test WP failed, assume Write Enabled (continues infinitely) and also possibly related error when coming out of suspend state [233714.179445] ata1: link is slow to respond, please be patient (ready=0) [233718.792644] ata1: COMRESET failed (errno=-16) [233719.821109] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300) [233719.961354] ata1.00: configured for UDMA/133 [233719.980924] sd 0:0:0:0: [sda] Starting disk [233719.988158] PM: resume of devices complete after 11241.539 msecs [233719.988206] PM: resume devices took 11.240 seconds [233719.988207] [ cut here ] [233719.988211] WARNING: at kernel/power/suspend_test.c:53 suspend_test_finish+0x86/0x90() [233719.988212] Hardware name: Aspire M3920 [233719.988213] Component: resume devices, time: 11240 I'm currently running Ubuntu 12.04 using an upstream kernel 3.5. I'm not familiar with the scsi subsystem at all, but I can help anyone familiar with such problems with debugging - and hopefully solving - this problem. 
thanks Marcin
Re: [PATCH 5/5] virtio-scsi: introduce multiqueue support
On Tue, 2012-09-04 at 08:46 +0200, Paolo Bonzini wrote: > Il 04/09/2012 04:21, Nicholas A. Bellinger ha scritto: > >> @@ -112,6 +118,9 @@ static void virtscsi_complete_cmd(struct virtio_scsi > >> *vscsi, void *buf) > >>struct virtio_scsi_cmd *cmd = buf; > >>struct scsi_cmnd *sc = cmd->sc; > >>struct virtio_scsi_cmd_resp *resp = &cmd->resp.cmd; > >> + struct virtio_scsi_target_state *tgt = vscsi->tgt[sc->device->id]; > >> + > >> + atomic_dec(&tgt->reqs); > >> > > > > As tgt->tgt_lock is taken in virtscsi_queuecommand_multi() before the > > atomic_inc_return(tgt->reqs) check, it seems like using atomic_dec() w/o > > smp_mb__after_atomic_dec or tgt_lock access here is not using atomic.h > > accessors properly, no..? > > No, only a single "thing" is being accessed, and there is no need to > order the decrement with respect to preceding or subsequent accesses to > other locations. > > In other words, tgt->reqs is already synchronized with itself, and that > is enough. > > (Besides, on x86 smp_mb__after_atomic_dec is a nop). > So the implementation detail wrt requests to the same target being processed in FIFO ordering + only being able to change the queue when no requests are pending helps understand this code more. Thanks for the explanation on that bit.. However, it's still my understanding that the use of atomic_dec() in the completion path means that smp_mb__after_atomic_dec() is a requirement to be proper portable atomic.h code, no..? Otherwise tgt->reqs should be using something other than an atomic_t, right..? > >> +static int virtscsi_queuecommand_multi(struct Scsi_Host *sh, > >> + struct scsi_cmnd *sc) > >> +{ > >> + struct virtio_scsi *vscsi = shost_priv(sh); > >> + struct virtio_scsi_target_state *tgt = vscsi->tgt[sc->device->id]; > >> + unsigned long flags; > >> + u32 queue_num; > >> + > >> + /* Using an atomic_t for tgt->reqs lets the virtqueue handler > >> + * decrement it without taking the spinlock.
> >> + */ > >> + spin_lock_irqsave(&tgt->tgt_lock, flags); > >> + if (atomic_inc_return(&tgt->reqs) == 1) { > >> + queue_num = smp_processor_id(); > >> + while (unlikely(queue_num >= vscsi->num_queues)) > >> + queue_num -= vscsi->num_queues; > >> + tgt->req_vq = &vscsi->req_vqs[queue_num]; > >> + } > >> + spin_unlock_irqrestore(&tgt->tgt_lock, flags); > >> + return virtscsi_queuecommand(vscsi, tgt, sc); > >> +} > >> + > > > > The extra memory barriers to get this right for the current approach are > > just going to slow things down even more for virtio-scsi-mq.. > > virtio-scsi multiqueue has a performance benefit up to 20% (for a single > LUN) or 40% (on overall bandwidth across multiple LUNs). I doubt that a > single memory barrier can have that much impact. :) > I've no doubt that this series increases the large block high bandwidth for virtio-scsi, but historically that has always been the easier workload to scale. ;) > The way to go to improve performance even more is to add new virtio APIs > for finer control of the usage of the ring. These should let us avoid > copying the sg list and almost get rid of the tgt_lock; even though the > locking is quite efficient in virtio-scsi (see how tgt_lock and vq_lock > are "pipelined" so as to overlap the preparation of two requests), it > should give a nice improvement and especially avoid a kmalloc with small > requests. I may have some time for it next month. > > > Jen's approach is what we will ultimately need to re-architect in SCSI > > core if we're ever going to move beyond the issues of legacy host_lock, > > so I'm wondering if maybe this is the direction that virtio-scsi-mq > > needs to go in as well..? > > We can see after the block layer multiqueue work goes in... I also need > to look more closely at Jens's changes. 
> Yes, I think Jens's new approach is providing some pretty significant gains for raw block drivers with extremely high packet (small block random I/O) workloads, esp with hw block drivers that support genuine mq with hw num_queues > 1. He also has virtio-blk converted to run in num_queues=1 mode. > Have you measured the host_lock to be a bottleneck in high-iops > benchmarks, even for a modern driver that does not hold it in > queuecommand? (Certainly it will become more important as the > virtio-scsi queuecommand becomes thinner and thinner). This is exactly why it would make such a good vehicle to re-architect SCSI core. I'm thinking it can be the first sw LLD we attempt to get running on a (currently) future scsi-mq prototype. > If so, we can > start looking at limiting host_lock usage in the fast path. > That would be a good incremental step for SCSI core, but I'm not sure that we'll be able to scale compared to blk-mq without a new approach for sw/hw LLDs along the lines of what Jens is doing. > BTW, supporting this in tcm-vhost should be quite trivial, as all the request queues a
Re: [resend PATCH] scsi_remove_target: fix softlockup regression on hot remove
On Tue, Sep 4, 2012 at 7:02 AM, John Drescher wrote: > On Wed, Aug 29, 2012 at 10:59 AM, Dan Williams wrote: >> On Wed, 2012-08-29 at 06:50 +, Bart Van Assche wrote: >>> On 08/29/12 05:12, Dan Williams wrote: >>> > John reports: >>> > BUG: soft lockup - CPU#2 stuck for 23s! [kworker/u:8:2202] >>> > [..] >>> > Call Trace: >>> > [] scsi_remove_target+0xda/0x1f0 >>> > [] sas_rphy_remove+0x55/0x60 >>> > [] sas_rphy_delete+0x11/0x20 >>> > [] sas_port_delete+0x25/0x160 >>> > [] mptsas_del_end_device+0x183/0x270 >>> > >>> > ...introduced by commit 3b661a9 "[SCSI] fix hot unplug vs async scan >>> > race". >>> >>> Including that call stack in the patch description may create the >>> misleading impression that this only occurs with the mptsas driver. This >>> lockup also happens with at least the iSCSI initiator. See also >>> http://lkml.org/lkml/2012/8/24/340. >> >> I don't think it does that. The title is pretty generic, but you're >> right the impact is potentially all scsi_remove_target() users. >> >>> By the way, in order to get a patch in the stable tree the proper "Cc:" >>> tag should be added in the patch description but the >>> sta...@vger.kernel.org e-mail address should be left out from the >>> Cc-list of the e-mail with the patch. >> >> No, we talked about that at kernel summit. It's ok for the stable@ >> alias to get a few extra mails. The patch won't be applied until it >> hits mainline and in the meantime it gives a heads up to the -stable >> folks, or anyone that wants to follow up on stable patches making their >> way to mainline. >> > > It appears that this did not get into 3.6 rc4 (unless I am reading the > changlog wrong). Do I have to file an official bug report to get this > noticed? I know James was travelling last week, and I think he wanted some more time to look over the removal of the restart logic... but I think we are ok. 
Prior to commit 3b661a9 we had:

	get_device(dev);
	device_for_each_child(dev, NULL, __remove_child);
	put_device(dev);

...where that device_for_each_child() would not restart, but also would
not consider stargets that were not yet proper children of dev. With
the fix we find all stargets and make sure to be immune to the
regression case where scsi_remove_target() is not the final source of
scsi_target_reap().

--
Dan
[PATCH] [SCSI] bfa: use list_move_tail instead of list_del/list_add_tail
From: Wei Yongjun

Using list_move_tail() instead of list_del() + list_add_tail(). spatch
with a semantic match was used to find this problem.
(http://coccinelle.lip6.fr/)

Signed-off-by: Wei Yongjun
---
 drivers/scsi/bfa/bfa_ioc.c   |  3 +--
 drivers/scsi/bfa/bfa_fcpim.c | 37 +
 2 files changed, 14 insertions(+), 26 deletions(-)

diff --git a/drivers/scsi/bfa/bfa_ioc.c b/drivers/scsi/bfa/bfa_ioc.c
index 8cdb79c..339e731 100644
--- a/drivers/scsi/bfa/bfa_ioc.c
+++ b/drivers/scsi/bfa/bfa_ioc.c
@@ -2950,8 +2950,7 @@ bfa_timer_beat(struct bfa_timer_mod_s *mod)
 		elem = (struct bfa_timer_s *) qe;
 		if (elem->timeout <= BFA_TIMER_FREQ) {
 			elem->timeout = 0;
-			list_del(&elem->qe);
-			list_add_tail(&elem->qe, &timedout_q);
+			list_move_tail(&elem->qe, &timedout_q);
 		} else {
 			elem->timeout -= BFA_TIMER_FREQ;
 		}
diff --git a/drivers/scsi/bfa/bfa_fcpim.c b/drivers/scsi/bfa/bfa_fcpim.c
index f0f80e2..02a1552 100644
--- a/drivers/scsi/bfa/bfa_fcpim.c
+++ b/drivers/scsi/bfa/bfa_fcpim.c
@@ -91,8 +91,7 @@ enum bfa_itnim_event {
  * BFA IOIM related definitions
  */
 #define bfa_ioim_move_to_comp_q(__ioim) do {				\
-	list_del(&(__ioim)->qe);					\
-	list_add_tail(&(__ioim)->qe, &(__ioim)->fcpim->ioim_comp_q);	\
+	list_move_tail(&(__ioim)->qe, &(__ioim)->fcpim->ioim_comp_q);	\
 } while (0)

@@ -1030,8 +1029,7 @@ bfa_itnim_cleanup(struct bfa_itnim_s *itnim)
 		 * Move IO to a cleanup queue from active queue so that a later
 		 * TM will not pickup this IO.
 		 */
-		list_del(&ioim->qe);
-		list_add_tail(&ioim->qe, &itnim->io_cleanup_q);
+		list_move_tail(&ioim->qe, &itnim->io_cleanup_q);

 		bfa_wc_up(&itnim->wc);
 		bfa_ioim_cleanup(ioim);
@@ -1505,15 +1503,13 @@ bfa_ioim_sm_uninit(struct bfa_ioim_s *ioim, enum bfa_ioim_event event)
 		if (!bfa_itnim_is_online(ioim->itnim)) {
 			if (!bfa_itnim_hold_io(ioim->itnim)) {
 				bfa_sm_set_state(ioim, bfa_ioim_sm_hcb);
-				list_del(&ioim->qe);
-				list_add_tail(&ioim->qe,
-					&ioim->fcpim->ioim_comp_q);
+				list_move_tail(&ioim->qe,
+					&ioim->fcpim->ioim_comp_q);
 				bfa_cb_queue(ioim->bfa, &ioim->hcb_qe,
 					__bfa_cb_ioim_pathtov, ioim);
 			} else {
-				list_del(&ioim->qe);
-				list_add_tail(&ioim->qe,
-					&ioim->itnim->pending_q);
+				list_move_tail(&ioim->qe,
+					&ioim->itnim->pending_q);
 			}
 			break;
 		}
@@ -2040,8 +2036,7 @@ bfa_ioim_sm_hcb_free(struct bfa_ioim_s *ioim, enum bfa_ioim_event event)
 	switch (event) {
 	case BFA_IOIM_SM_HCB:
 		bfa_sm_set_state(ioim, bfa_ioim_sm_resfree);
-		list_del(&ioim->qe);
-		list_add_tail(&ioim->qe, &ioim->fcpim->ioim_resfree_q);
+		list_move_tail(&ioim->qe, &ioim->fcpim->ioim_resfree_q);
 		break;

 	case BFA_IOIM_SM_FREE:
@@ -2673,14 +2668,12 @@ bfa_ioim_notify_cleanup(struct bfa_ioim_s *ioim)
 	 * Move IO from itnim queue to fcpim global queue since itnim will be
 	 * freed.
 	 */
-	list_del(&ioim->qe);
-	list_add_tail(&ioim->qe, &ioim->fcpim->ioim_comp_q);
+	list_move_tail(&ioim->qe, &ioim->fcpim->ioim_comp_q);

 	if (!ioim->iosp->tskim) {
 		if (ioim->fcpim->delay_comp && ioim->itnim->iotov_active) {
 			bfa_cb_dequeue(&ioim->hcb_qe);
-			list_del(&ioim->qe);
-			list_add_tail(&ioim->qe, &ioim->itnim->delay_comp_q);
+			list_move_tail(&ioim->qe, &ioim->itnim->delay_comp_q);
 		}
 		bfa_itnim_iodone(ioim->itnim);
 	} else
@@ -2724,8 +2717,7 @@ bfa_ioim_delayed_comp(struct bfa_ioim_s *ioim, bfa_boolean_t iotov)
 	 * Move IO to fcpim global queue since itnim will be
 	 * freed.
 	 */
-	list_del(&ioim->qe);
-	list_add_tail(&ioim->qe, &ioim->fcpim->ioim_comp_q);
+	list_move_tail(&ioim->qe, &ioim->fcpim->ioim_comp_q);
 }

@@ -3319,8 +3311,7 @@ bfa_tskim_gather_ios(struct bfa_tskim_s *tskim)
 		cmnd = (struct scsi_cmnd *) ioim->dio;
 		int_to_scsilun(cmnd->device->lun, &scsilun);
 		if (bfa_ts
[PATCH] target: use list_move_tail instead of list_del/list_add_tail
From: Wei Yongjun

Using list_move_tail() instead of list_del() + list_add_tail(). spatch
with a semantic match was used to find this problem.
(http://coccinelle.lip6.fr/)

Signed-off-by: Wei Yongjun
---
 drivers/target/sbp/sbp_target.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/target/sbp/sbp_target.c b/drivers/target/sbp/sbp_target.c
index 39ddba5..e57971f 100644
--- a/drivers/target/sbp/sbp_target.c
+++ b/drivers/target/sbp/sbp_target.c
@@ -660,8 +660,7 @@ static void session_reconnect_expired(struct sbp_session *sess)
 	spin_lock_bh(&sess->lock);
 	list_for_each_entry_safe(login, temp, &sess->login_list, link) {
 		login->sess = NULL;
-		list_del(&login->link);
-		list_add_tail(&login->link, &login_list);
+		list_move_tail(&login->link, &login_list);
 	}
 	spin_unlock_bh(&sess->lock);
[PATCH] [SCSI] lpfc 8.3.32: use list_move_tail instead of list_del/list_add_tail
From: Wei Yongjun

Using list_move_tail() instead of list_del() + list_add_tail(). spatch
with a semantic match was used to find this problem.
(http://coccinelle.lip6.fr/)

Signed-off-by: Wei Yongjun
---
 drivers/scsi/lpfc/lpfc_sli.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc_sli.c b/drivers/scsi/lpfc/lpfc_sli.c
index 9cbd20b..0ada66b 100644
--- a/drivers/scsi/lpfc/lpfc_sli.c
+++ b/drivers/scsi/lpfc/lpfc_sli.c
@@ -15833,8 +15833,7 @@ lpfc_cleanup_pending_mbox(struct lpfc_vport *vport)
 		    (mb->u.mb.mbxCommand != MBX_REG_VPI))
 			continue;

-		list_del(&mb->list);
-		list_add_tail(&mb->list, &mbox_cmd_list);
+		list_move_tail(&mb->list, &mbox_cmd_list);
 	}
 	/* Clean up active mailbox command with the vport */
 	mb = phba->sli.mbox_active;