Re: dm-multipath test scripts
On 02/20/16 15:12, Mike Snitzer wrote:
> On Fri, Feb 19 2016 at 2:42pm -0500, Mike Snitzer wrote:
>> Have you been running with blk-mq?
>> Either by setting CONFIG_DM_MQ_DEFAULT or:
>> echo Y > /sys/module/dm_mod/parameters/use_blk_mq
>>
>> I'm seeing test_02_sdev_delete fail with blk-mq enabled.
>
> I only see failure if I stack dm-mq ontop of old non-mq scsi devices with:
>
> echo N > /sys/module/scsi_mod/parameters/use_blk_mq
> echo Y > /sys/module/dm_mod/parameters/use_blk_mq

Ah, I didn't test that combination. I can see the failure, too.

> But this makes me think the novelty of having dm-mq support stacking on
> non-blk-mq devices was misplaced. It is a senseless config. I'll
> probably remove support for such stacking soon (next week).

Looking at the failure, I suspect it could be a common issue of dm-mq
regardless of underlying device type.

When requeueing, following calls happen in dm-mq:
  dm_requeue_original_request() {
    ..
    blk_mq_requeue_request(rq);
    blk_mq_kick_requeue_list(rq->q);

then from block workqueue:
  blk_mq_requeue_work() {
    ..
    blk_mq_start_hw_queue(q);

and blk_mq_start_hw_queue() re-starts the queue even if DM has
stopped it for suspending. As a result, dm-mq ends up repeating
submit-error-requeue forever and suspend never completes. Or,
suspend somehow proceeds to clear DMF_NOFLUSH_SUSPENDING and
I/O error may directly be returned to submitter.

Attached patch fixes the problem for DM. But given the code comment,
there should be call sites which depend on 'start-if-stopped' behavior
of blk_mq_requeue_work and we may need other solution.

--
Jun'ichi Nomura, NEC Corporation

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 56c0a72..bbfe936 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -481,11 +481,7 @@ static void blk_mq_requeue_work(struct work_struct *work)
 		blk_mq_insert_request(rq, false, false, false);
 	}
 
-	/*
-	 * Use the start variant of queue running here, so that running
-	 * the requeue work will kick stopped queues.
-	 */
-	blk_mq_start_hw_queues(q);
+	blk_mq_run_hw_queues(q, false);
 }
 
 void blk_mq_add_to_requeue_list(struct request *rq, bool at_head)
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
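The difference between the "start" and "run" variants discussed in this message can be sketched in userspace C. This is only a toy model under stated assumptions: `struct toy_hwq`, `toy_run_hw_queue()` and `toy_start_hw_queue()` are hypothetical names, not the kernel's blk-mq API, and the model ignores locking and per-CPU state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of one hardware queue; not the kernel's struct blk_mq_hw_ctx. */
struct toy_hwq {
	bool stopped;    /* set by the queue owner, e.g. DM while suspending */
	int dispatched;  /* number of times the queue was actually run */
};

/* "run" variant: respects the stopped flag and does nothing if it is set. */
static void toy_run_hw_queue(struct toy_hwq *q)
{
	assert(q != NULL);
	if (q->stopped)
		return;
	q->dispatched++;
}

/* "start" variant: clears the stopped flag first, then runs the queue. */
static void toy_start_hw_queue(struct toy_hwq *q)
{
	assert(q != NULL);
	q->stopped = false;
	toy_run_hw_queue(q);
}
```

In this model, requeue work that calls the "start" variant undoes a stop issued by the upper layer, which is the behaviour the message describes; the "run" variant leaves a stopped queue alone.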
Re: [PATCH 5/6] hisi_sas: add hisi_sas_slave_configure()
I would like to make another point about why I am making this change in case it is not clear. The queue full events are from TRANS_TX_CREDIT_TIMEOUT_ERR and TRANS_TX_CLOSE_NORMAL_ERR errors in the slot: I want the slot retried when this occurs, so I set status as SAS_QUEUE_FULL just so we will report DID_SOFT_ERROR to SCSI midlayer so we get a retry. I could use SAS_OPEN_REJECT alternatively as the error which would have the same effect. The queue full are not from all slots being consumed in the HBA.

Ah, right. So you might be getting those errors even with some free slots on the HBA. As such they are roughly equivalent to a QUEUE_FULL SCSI status, right? So after reading SPL I guess you are right here; using tags wouldn't help for this situation.

Yes, I think we have 90% of slots free in the host when this occurs for one particular test - Our v2 hw has 2K slots, which is >> cmd_per_lun. The errors are equivalent to queue full for the device.

Thanks,
John

Cheers,
Hannes
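The retry path described above (slot error, then task status, then host byte) can be sketched as follows. This is a minimal illustration under assumptions: the `toy_*` enums and functions are hypothetical stand-ins for the hisi_sas and libsas code, and the handling of "other" errors is invented for contrast. Only the DID_* host byte values follow the kernel's SCSI headers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical slot error codes; the real driver decodes hardware fields. */
enum toy_slot_err {
	TOY_SLOT_OK,
	TOY_TX_CREDIT_TIMEOUT_ERR,
	TOY_TX_CLOSE_NORMAL_ERR,
	TOY_SLOT_OTHER_ERR
};

/* Hypothetical task status, standing in for libsas status codes. */
enum toy_task_stat { TOY_STAT_GOOD, TOY_STAT_QUEUE_FULL, TOY_STAT_FAILED };

/* Host byte values as defined in the kernel's SCSI headers. */
#define DID_OK		0x00
#define DID_ERROR	0x07
#define DID_SOFT_ERROR	0x0b

/* Both transient link errors are reported as "queue full" so they get retried. */
static enum toy_task_stat toy_slot_to_stat(enum toy_slot_err err)
{
	assert(err >= TOY_SLOT_OK && err <= TOY_SLOT_OTHER_ERR);
	switch (err) {
	case TOY_SLOT_OK:
		return TOY_STAT_GOOD;
	case TOY_TX_CREDIT_TIMEOUT_ERR:
	case TOY_TX_CLOSE_NORMAL_ERR:
		return TOY_STAT_QUEUE_FULL;
	default:
		return TOY_STAT_FAILED;	/* assumption: everything else is fatal */
	}
}

/* Queue full maps to DID_SOFT_ERROR, which the midlayer treats as retryable. */
static int toy_stat_to_host_byte(enum toy_task_stat stat)
{
	switch (stat) {
	case TOY_STAT_GOOD:
		return DID_OK;
	case TOY_STAT_QUEUE_FULL:
		return DID_SOFT_ERROR;
	default:
		return DID_ERROR;
	}
}

static bool toy_will_retry(enum toy_slot_err err)
{
	return toy_stat_to_host_byte(toy_slot_to_stat(err)) == DID_SOFT_ERROR;
}
```

The point of the sketch is that the retry decision depends only on the reported status, not on how many HBA slots are free.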
[Bug 111441] iscsi fails to attach to targets
https://bugzilla.kernel.org/show_bug.cgi?id=111441

--- Comment #18 from Serguei Bezverkhi ---
Hello Hannes,

Thank you for your reply. I am on the 4.4.2 kernel; is there any chance of committing it to 4.4 as well? If not, could you send me a diff for the 4.4 kernel?

Best regards
Serguei

Serguei Bezverkhi, TECHNICAL LEADER.SERVICES
Global SP Services
sbezv...@cisco.com
Phone: +1 416 306 7312
Mobile: +1 514 234 7374
CCIE (R&S,SP,Sec) - #9527
Cisco.com

Think before you print. This email may contain confidential and privileged material for the sole use of the intended recipient. Any review, use, distribution or disclosure by others is strictly prohibited. If you are not the intended recipient (or authorized to receive for the recipient), please contact the sender by reply email and delete all copies of this message. Please click here for Company Registration Information.

-----Original Message-----
From: Hannes Reinecke [mailto:h...@suse.de]
Sent: Monday, February 22, 2016 2:08 AM
To: Serguei Bezverkhi (sbezverk) ; Mike Christie
Cc: bugzilla-dae...@bugzilla.kernel.org; linux-scsi@vger.kernel.org; Christoph Hellwig
Subject: Re: [Bug 111441] New: iscsi fails to attach to targets

On 02/22/2016 01:45 AM, Serguei Bezverkhi (sbezverk) wrote:
> Hi Mike,
>
> I just wanted to follow up with you to see if the patch got committed to an
> upstream kernel; if yes, please let me know which version it went into.
>
> Thank you
>
> Serguei
>
> -----Original Message-----
> From: Mike Christie [mailto:micha...@cs.wisc.edu]
> Sent: Friday, January 29, 2016 6:33 PM
> To: Serguei Bezverkhi (sbezverk)
> Cc: bugzilla-dae...@bugzilla.kernel.org; linux-scsi@vger.kernel.org;
> Christoph Hellwig ; Hannes Reinecke
> Subject: Re: [Bug 111441] New: iscsi fails to attach to targets
>
> On 01/29/2016 04:21 PM, Serguei Bezverkhi (sbezverk) wrote:
>> HI Mike,
>>
>> I tried your patch and it has eliminated the first traceback but I still do
>> not see my remote targets.
>>
>
> That is sort of expected. Your target is not setup for ALUA properly. It says
> it supports ALUA, but when scsi_dh_alua asks about the ports it is reporting
> there are none. Ccing the people that made the patch that added the issue and
> own the code.
>
> Hey Christoph and Hannes,
>
> The dh/alua changes that added this:
>
>         error = scsi_dh_add_device(sdev);
>         if (error) {
>                 sdev_printk(KERN_INFO, sdev,
>                             "failed to add device handler: %d\n", error);
>                 return error;
>         }
>
> to scsi_sysfs_add_sdev are adding a regression.
>
> 1. If that fails, then we forget to do device_del before doing the return. My
> patch in this thread added that back, so we do not see the sysfs oopses
> anymore. But.
>
> 2. It looks like in older kernels, we would allow misconfigured targets like
> this one to still setup devices. Do we want that old behavior back?
> Should we just ignore the return value from scsi_dh_add_device above?
> Note that in this case, it is LIO so it can be easily fixed on the target
> side by just setting it up properly. I do not think other targets would hit
> this type of issue.
>

This has been fixed up with my patchset to update the ALUA handler, most notably the commit 'scsi: ignore errors from scsi_dh_add_device()' which was included in 4.5.

Cheers,

Hannes
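The behaviour named by the commit title quoted above ("ignore errors from scsi_dh_add_device()") can be illustrated with a small userspace model. This is a sketch under assumptions, not the upstream change: `struct toy_sdev`, `toy_dh_add_device()` and `toy_sysfs_add_sdev()` are hypothetical, and the "no ports reported" condition is reduced to an integer argument.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Toy device: 'visible' stands in for a successful device_add(). */
struct toy_sdev {
	int visible;
	int dh_attached;
};

/* Hypothetical handler attach: fails when the target reports no ports. */
static int toy_dh_add_device(struct toy_sdev *sdev, int target_ports)
{
	assert(sdev != NULL);
	if (target_ports == 0)
		return -EINVAL;
	sdev->dh_attached = 1;
	return 0;
}

/* Handler failure is logged but no longer fails the whole device add. */
static int toy_sysfs_add_sdev(struct toy_sdev *sdev, int target_ports)
{
	int error;

	assert(sdev != NULL);
	sdev->visible = 1;

	error = toy_dh_add_device(sdev, target_ports);
	if (error)
		fprintf(stderr, "failed to add device handler: %d (ignored)\n",
			error);
	return 0;
}
```

With this shape, a misconfigured target still yields a usable device, only without a device handler attached, which matches the "old behavior" asked about in point 2.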
RE: [PATCH 0/2][RESEND] scsi_transport_fc: LUN masking
Hi Hannes,

How do you know that a request for an async scan is complete (I'm assuming that you get add or change udev events)? Assuming that someone has manually started a scan on something (e.g. some newly presented devices after boot) and all scans are going to be async, how do you know when it is complete rather than still waiting in a work queue?

An example may be a sysfs file that contains unscanned, pending, scanning or scanned, so you know when the scan is complete at the appropriate level in sysfs (the hba and the rports) and when you can continue if you're polling the status (e.g. checking as part of system admin work with newly presented rports so you can then do something with them).

Thanks
Shane

> -----Original Message-----
> From: linux-scsi-ow...@vger.kernel.org [mailto:linux-scsi-ow...@vger.kernel.org] On Behalf Of Hannes Reinecke
> Sent: Monday, February 22, 2016 6:51 PM
> To: Martin K . Petersen
> Cc: Christoph Hellwig; James Bottomley; Johannes Thumshirn; linux-s...@vger.kernel.org; Hannes Reinecke
> Subject: [PATCH 0/2][RESEND] scsi_transport_fc: LUN masking
>
> Hi all,
>
> having been subjected to the pain of trying to bootstrap a really large
> machine with systemd I decided to implement LUN masking in
> scsi_transport_fc.
> The principle is simple: disallow the automated LUN scanning when
> discovering a rport, and create udev rules which selectively enable individual
> LUNs by echoing the relevant values in the 'scan'
> attribute of the SCSI host.
> With that I'm able to boot an arbitrary large machine without running into any
> udev or systemd imposed timeout.
> To _disable_ LUN masking and restoring the original behaviour I've noticed
> that the 'scan' sysfs attribute is actually synchronous, ie the calling
> process
> will be blocked until the entire LUN scan is completed.
> So I've added another module parameter 'async_user_scan' to move the
> scanning onto the existing scan workqueue, and unblock the calling process.
>
> As usual, comments and reviews are welcome.
>
> Hannes Reinecke (2):
>   scsi_transport_fc: implement 'disable_target_scan' module parameter
>   scsi_transport_fc: Implement 'async_user_scan' module parameter
>
>  drivers/scsi/scsi_transport_fc.c | 47 +---
>  1 file changed, 44 insertions(+), 3 deletions(-)
>
> --
> 2.6.2
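The status attribute suggested in the reply could be consumed by a poller along these lines. This is purely a sketch of the proposal: no such sysfs file exists in the patch set, and `enum toy_scan_state`, `toy_parse_scan_state()` and `toy_scan_complete()` are hypothetical names. The four state strings are the ones named in the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* States proposed in the message, plus a catch-all for unexpected text. */
enum toy_scan_state {
	TOY_SCAN_UNSCANNED,
	TOY_SCAN_PENDING,
	TOY_SCAN_SCANNING,
	TOY_SCAN_SCANNED,
	TOY_SCAN_UNKNOWN
};

/* Parse the text read from the (hypothetical) attribute; a trailing
 * newline, as sysfs attributes usually carry, is ignored. */
static enum toy_scan_state toy_parse_scan_state(const char *s)
{
	static const char *const names[] = {
		"unscanned", "pending", "scanning", "scanned"
	};
	size_t len, i;

	assert(s != NULL);
	len = strcspn(s, "\n");
	for (i = 0; i < sizeof(names) / sizeof(names[0]); i++)
		if (strlen(names[i]) == len && strncmp(s, names[i], len) == 0)
			return (enum toy_scan_state)i;
	return TOY_SCAN_UNKNOWN;
}

/* A poller would stop once the attribute reads "scanned". */
static bool toy_scan_complete(const char *s)
{
	return toy_parse_scan_state(s) == TOY_SCAN_SCANNED;
}
```

A poller would read the attribute for the hba or rport of interest and loop until `toy_scan_complete()` returns true, treating unknown text as "not complete".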
[v2 PATCH 1/3] scsi:stex.c Support to Pegasus series.
From: Charles

Pegasus is a high performance hardware RAID solution designed to unleash
the raw power of Thunderbolt technology.

1. Add code to distinguish SuperTrak and Pegasus series by sub device ID.
   It should support backward compatibility.

2. Change the driver version.

Signed-off-by: Charles Chiou
Reviewed-by: Johannes Thumshirn
---
 drivers/scsi/stex.c | 32 ++--
 1 file changed, 26 insertions(+), 6 deletions(-)

diff --git a/drivers/scsi/stex.c b/drivers/scsi/stex.c
index 2de28d7..495d632 100644
--- a/drivers/scsi/stex.c
+++ b/drivers/scsi/stex.c
@@ -1,7 +1,7 @@
 /*
  * SuperTrak EX Series Storage Controller driver for Linux
  *
- *	Copyright (C) 2005-2009 Promise Technology Inc.
+ *	Copyright (C) 2005-2015 Promise Technology Inc.
  *
  *	This program is free software; you can redistribute it and/or
  *	modify it under the terms of the GNU General Public License
@@ -38,11 +38,11 @@
 #include
 
 #define DRV_NAME "stex"
-#define ST_DRIVER_VERSION "4.6.0000.4"
-#define ST_VER_MAJOR 4
-#define ST_VER_MINOR 6
-#define ST_OEM 0
-#define ST_BUILD_VER 4
+#define ST_DRIVER_VERSION "5.00.0000.01"
+#define ST_VER_MAJOR 5
+#define ST_VER_MINOR 00
+#define ST_OEM 0000
+#define ST_BUILD_VER 01
 
 enum {
 	/* MU register offset */
@@ -328,6 +328,7 @@ struct st_hba {
 	u16 rq_count;
 	u16 rq_size;
 	u16 sts_count;
+	u8 supports_pm;
 };
 
 struct st_card_info {
@@ -1560,6 +1561,25 @@ static int stex_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	hba->cardtype = (unsigned int) id->driver_data;
 	ci = &stex_card_info[hba->cardtype];
+	switch (id->subdevice) {
+	case 0x4221:
+	case 0x4222:
+	case 0x4223:
+	case 0x4224:
+	case 0x4225:
+	case 0x4226:
+	case 0x4227:
+	case 0x4261:
+	case 0x4262:
+	case 0x4263:
+	case 0x4264:
+	case 0x4265:
+		break;
+	default:
+		if (hba->cardtype == st_yel)
+			hba->supports_pm = 1;
+	}
+
 	sts_offset = scratch_offset = (ci->rq_count+1) * ci->rq_size;
 	if (hba->cardtype == st_yel)
 		sts_offset += (ci->sts_count+1) * sizeof(u32);
--
1.9.1
[v2 PATCH 2/3] scsi:stex.c Add hotplug support
From: Charles

1. Add hotplug support. Pegasus supports surprise removal. To this end, I use
   the return_abnormal_state() function to return DID_NO_CONNECT for all
   commands that were sent to the driver.

2. Remove stex_hba_stop in stex_remove because we cannot send commands to the
   device after hotplug.

3. Add new device statuses: MU_STATE_STOP and MU_STATE_NOCONNECT.
   MU_STATE_STOP is currently not referenced.
   MU_STATE_NOCONNECT represents that the device has been unplugged from the host.

4. Use return_abnormal_state() to replace part of the code in stex_do_reset.

Signed-off-by: Charles Chiou
Reviewed-by: Johannes Thumshirn
---
 drivers/scsi/stex.c | 53 ++---
 1 file changed, 34 insertions(+), 19 deletions(-)

diff --git a/drivers/scsi/stex.c b/drivers/scsi/stex.c
index 495d632..1994603 100644
--- a/drivers/scsi/stex.c
+++ b/drivers/scsi/stex.c
@@ -84,6 +84,8 @@ enum {
 	MU_STATE_STARTED = 2,
 	MU_STATE_RESETTING = 3,
 	MU_STATE_FAILED = 4,
+	MU_STATE_STOP = 5,
+	MU_STATE_NOCONNECT = 6,
 
 	MU_MAX_DELAY = 120,
 	MU_HANDSHAKE_SIGNATURE = 0x55aaaa55,
@@ -537,6 +539,27 @@ stex_ss_send_cmd(struct st_hba *hba, struct req_msg *req, u16 tag)
 	readl(hba->mmio_base + YH2I_REQ); /* flush */
 }
 
+static void return_abnormal_state(struct st_hba *hba, int status)
+{
+	struct st_ccb *ccb;
+	unsigned long flags;
+	u16 tag;
+
+	spin_lock_irqsave(hba->host->host_lock, flags);
+	for (tag = 0; tag < hba->host->can_queue; tag++) {
+		ccb = &hba->ccb[tag];
+		if (ccb->req == NULL)
+			continue;
+		ccb->req = NULL;
+		if (ccb->cmd) {
+			scsi_dma_unmap(ccb->cmd);
+			ccb->cmd->result = status << 16;
+			ccb->cmd->scsi_done(ccb->cmd);
+			ccb->cmd = NULL;
+		}
+	}
+	spin_unlock_irqrestore(hba->host->host_lock, flags);
+}
 static int
 stex_slave_config(struct scsi_device *sdev)
 {
@@ -560,8 +583,12 @@ stex_queuecommand_lck(struct scsi_cmnd *cmd, void (*done)(struct scsi_cmnd *))
 	id = cmd->device->id;
 	lun = cmd->device->lun;
 	hba = (struct st_hba *) &host->hostdata[0];
-
-	if (unlikely(hba->mu_status == MU_STATE_RESETTING))
+	if (hba->mu_status == MU_STATE_NOCONNECT) {
+		cmd->result = DID_NO_CONNECT;
+		done(cmd);
+		return 0;
+	}
+	if (unlikely(hba->mu_status != MU_STATE_STARTED))
 		return SCSI_MLQUEUE_HOST_BUSY;
 
 	switch (cmd->cmnd[0]) {
@@ -1260,10 +1287,8 @@ static void stex_ss_reset(struct st_hba *hba)
 
 static int stex_do_reset(struct st_hba *hba)
 {
-	struct st_ccb *ccb;
 	unsigned long flags;
 	unsigned int mu_status = MU_STATE_RESETTING;
-	u16 tag;
 
 	spin_lock_irqsave(hba->host->host_lock, flags);
 	if (hba->mu_status == MU_STATE_STARTING) {
@@ -1297,20 +1322,8 @@ static int stex_do_reset(struct st_hba *hba)
 	else if (hba->cardtype == st_yel)
 		stex_ss_reset(hba);
 
-	spin_lock_irqsave(hba->host->host_lock, flags);
-	for (tag = 0; tag < hba->host->can_queue; tag++) {
-		ccb = &hba->ccb[tag];
-		if (ccb->req == NULL)
-			continue;
-		ccb->req = NULL;
-		if (ccb->cmd) {
-			scsi_dma_unmap(ccb->cmd);
-			ccb->cmd->result = DID_RESET << 16;
-			ccb->cmd->scsi_done(ccb->cmd);
-			ccb->cmd = NULL;
-		}
-	}
-	spin_unlock_irqrestore(hba->host->host_lock, flags);
+
+	return_abnormal_state(hba, DID_RESET);
 
 	if (stex_handshake(hba) == 0)
 		return 0;
@@ -1771,9 +1784,11 @@ static void stex_remove(struct pci_dev *pdev)
 {
 	struct st_hba *hba = pci_get_drvdata(pdev);
 
+	hba->mu_status = MU_STATE_NOCONNECT;
+	return_abnormal_state(hba, DID_NO_CONNECT);
 	scsi_remove_host(hba->host);
 
-	stex_hba_stop(hba);
+	scsi_block_requests(hba->host);
 
 	stex_hba_free(hba);
 
--
1.9.1
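The sweep performed by return_abnormal_state() and the `status << 16` host byte encoding it relies on can be modelled in a few lines of userspace C. This is a simplified sketch: `struct toy_ccb`, `toy_host_byte()` and `toy_return_abnormal_state()` are hypothetical, and locking, DMA unmapping and the completion callback are left out. The DID_* values follow the kernel's SCSI headers.

```c
#include <assert.h>
#include <stddef.h>

/* Host byte values as defined in the kernel's SCSI headers. */
#define DID_NO_CONNECT	0x01
#define DID_RESET	0x08

/* Toy command control block: 'busy' stands in for ccb->req != NULL. */
struct toy_ccb {
	int busy;
	int result;
};

/* The host byte lives in bits 16..23 of the result word. */
static int toy_host_byte(int result)
{
	return (result >> 16) & 0xff;
}

/* Complete every outstanding command with the given host status and
 * return how many were completed. */
static int toy_return_abnormal_state(struct toy_ccb *ccb, size_t n, int status)
{
	size_t i;
	int completed = 0;

	assert(ccb != NULL);
	for (i = 0; i < n; i++) {
		if (!ccb[i].busy)
			continue;
		ccb[i].busy = 0;
		ccb[i].result = status << 16;
		completed++;
	}
	return completed;
}
```

Under this encoding an unshifted DID_NO_CONNECT lands in the low (status) byte rather than the host byte, so the `cmd->result = DID_NO_CONNECT;` assignment in the queuecommand hunk may be worth double-checking against the `status << 16` form used in the helper.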
[v2 PATCH 3/3] scsi:stex.c Add S3/S4 support
From: Charles

Add S3/S4 support: add .suspend and .resume functions to pci_driver.
In the .suspend handler, the driver sends the S3/S4 signal to the device.

Signed-off-by: Charles Chiou
Reviewed-by: Johannes Thumshirn
---
 drivers/scsi/stex.c | 68 ++---
 1 file changed, 65 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/stex.c b/drivers/scsi/stex.c
index 1994603..5b23175 100644
--- a/drivers/scsi/stex.c
+++ b/drivers/scsi/stex.c
@@ -167,6 +167,14 @@ enum {
 	ST_ADDITIONAL_MEM = 0x200000,
 	ST_ADDITIONAL_MEM_MIN = 0x80000,
+	PMIC_SHUTDOWN = 0x0D,
+	PMIC_REUMSE = 0x10,
+	ST_IGNORED = -1,
+	ST_NOTHANDLED = 7,
+	ST_S3 = 3,
+	ST_S4 = 4,
+	ST_S5 = 5,
+	ST_S6 = 6,
 };
 
 struct st_sgitem {
@@ -1718,7 +1726,7 @@ out_disable:
 	return err;
 }
 
-static void stex_hba_stop(struct st_hba *hba)
+static void stex_hba_stop(struct st_hba *hba, int st_sleep_mic)
 {
 	struct req_msg *req;
 	struct st_msg_header *msg_h;
@@ -1727,6 +1735,15 @@ static void stex_hba_stop(struct st_hba *hba)
 	u16 tag = 0;
 
 	spin_lock_irqsave(hba->host->host_lock, flags);
+
+	if (hba->cardtype == st_yel && hba->supports_pm == 1)
+	{
+		if(st_sleep_mic == ST_NOTHANDLED)
+		{
+			spin_unlock_irqrestore(hba->host->host_lock, flags);
+			return;
+		}
+	}
 	req = hba->alloc_rq(hba);
 	if (hba->cardtype == st_yel) {
 		msg_h = (struct st_msg_header *)req - 1;
@@ -1734,11 +1751,18 @@ static void stex_hba_stop(struct st_hba *hba)
 	} else
 		memset(req, 0, hba->rq_size);
 
-	if (hba->cardtype == st_yosemite || hba->cardtype == st_yel) {
+	if ((hba->cardtype == st_yosemite || hba->cardtype == st_yel)
+		&& st_sleep_mic == ST_IGNORED) {
 		req->cdb[0] = MGT_CMD;
 		req->cdb[1] = MGT_CMD_SIGNATURE;
 		req->cdb[2] = CTLR_CONFIG_CMD;
 		req->cdb[3] = CTLR_SHUTDOWN;
+	} else if (hba->cardtype == st_yel && st_sleep_mic != ST_IGNORED) {
+		req->cdb[0] = MGT_CMD;
+		req->cdb[1] = MGT_CMD_SIGNATURE;
+		req->cdb[2] = CTLR_CONFIG_CMD;
+		req->cdb[3] = PMIC_SHUTDOWN;
+		req->cdb[4] = st_sleep_mic;
 	} else {
 		req->cdb[0] = CONTROLLER_CMD;
 		req->cdb[1] = CTLR_POWER_STATE_CHANGE;
@@ -1758,10 +1782,12 @@ static void stex_hba_stop(struct st_hba *hba)
 	while (hba->ccb[tag].req_type & PASSTHRU_REQ_TYPE) {
 		if (time_after(jiffies, before + ST_INTERNAL_TIMEOUT * HZ)) {
 			hba->ccb[tag].req_type = 0;
+			hba->mu_status = MU_STATE_STOP;
 			return;
 		}
 		msleep(1);
 	}
+	hba->mu_status = MU_STATE_STOP;
 }
 
 static void stex_hba_free(struct st_hba *hba)
@@ -1801,9 +1827,43 @@ static void stex_shutdown(struct pci_dev *pdev)
 {
 	struct st_hba *hba = pci_get_drvdata(pdev);
 
-	stex_hba_stop(hba);
+	if (hba->supports_pm == 0)
+		stex_hba_stop(hba, ST_IGNORED);
+	else
+		stex_hba_stop(hba, ST_S5);
+}
+
+static int stex_choice_sleep_mic(pm_message_t state)
+{
+	switch (state.event) {
+	case PM_EVENT_SUSPEND:
+		return ST_S3;
+	case PM_EVENT_HIBERNATE:
+		return ST_S4;
+	default:
+		return ST_NOTHANDLED;
+	}
 }
 
+static int stex_suspend(struct pci_dev *pdev, pm_message_t state)
+{
+	struct st_hba *hba = pci_get_drvdata(pdev);
+
+	if (hba->cardtype == st_yel && hba->supports_pm == 1)
+		stex_hba_stop(hba, stex_choice_sleep_mic(state));
+	else
+		stex_hba_stop(hba, ST_IGNORED);
+	return 0;
+}
+
+static int stex_resume(struct pci_dev *pdev)
+{
+	struct st_hba *hba = pci_get_drvdata(pdev);
+
+	hba->mu_status = MU_STATE_STARTING;
+	stex_handshake(hba);
+	return 0;
+}
 MODULE_DEVICE_TABLE(pci, stex_pci_tbl);
 
 static struct pci_driver stex_pci_driver = {
@@ -1812,6 +1872,8 @@ static struct pci_driver stex_pci_driver = {
 	.probe = stex_probe,
 	.remove = stex_remove,
 	.shutdown = stex_shutdown,
+	.suspend = stex_suspend,
+	.resume = stex_resume,
 };
 
 static int __init stex_init(void)
--
1.9.1
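The decision logic this patch adds (which sleep state the suspend handler passes down, and when the stop routine returns early without sending a command) can be sketched in isolation. The ST_* values are taken from the patch; everything prefixed `toy_` is hypothetical, and `enum toy_pm_event` merely stands in for the kernel's PM_EVENT_* constants.

```c
#include <assert.h>

/* Stand-ins for the kernel's PM_EVENT_* values (hypothetical numbering). */
enum toy_pm_event {
	TOY_PM_EVENT_SUSPEND,
	TOY_PM_EVENT_HIBERNATE,
	TOY_PM_EVENT_FREEZE
};

/* Sleep-state codes as introduced by the patch. */
enum {
	ST_IGNORED = -1,
	ST_S3 = 3,
	ST_S4 = 4,
	ST_S5 = 5,
	ST_NOTHANDLED = 7
};

/* Mirrors stex_choice_sleep_mic(): suspend is S3, hibernate is S4. */
static int toy_choose_sleep_state(enum toy_pm_event ev)
{
	switch (ev) {
	case TOY_PM_EVENT_SUSPEND:
		return ST_S3;
	case TOY_PM_EVENT_HIBERNATE:
		return ST_S4;
	default:
		return ST_NOTHANDLED;
	}
}

/* Mirrors stex_suspend(): only PM-capable st_yel cards get a real state. */
static int toy_suspend_arg(int is_st_yel, int supports_pm,
			   enum toy_pm_event ev)
{
	if (is_st_yel && supports_pm)
		return toy_choose_sleep_state(ev);
	return ST_IGNORED;
}

/* Mirrors the early return in stex_hba_stop(): returns 0 when no
 * command would be sent to the controller. */
static int toy_stop_sends_command(int is_st_yel, int supports_pm,
				  int sleep_state)
{
	assert(sleep_state == ST_IGNORED ||
	       (sleep_state >= ST_S3 && sleep_state <= ST_NOTHANDLED));
	return !(is_st_yel && supports_pm && sleep_state == ST_NOTHANDLED);
}
```

In this reading, an unhandled PM event on a PM-capable card results in no command at all, while cards without PM support keep the previous shutdown behaviour through ST_IGNORED.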
why is blk-mq requeue forcibly kicking stopped queues? [was: Re: dm-multipath test scripts]
On Mon, Feb 22 2016 at 4:51am -0500, Junichi Nomura wrote:

> On 02/20/16 15:12, Mike Snitzer wrote:
> > On Fri, Feb 19 2016 at 2:42pm -0500, Mike Snitzer wrote:
> >> Have you been running with blk-mq?
> >> Either by setting CONFIG_DM_MQ_DEFAULT or:
> >> echo Y > /sys/module/dm_mod/parameters/use_blk_mq
> >>
> >> I'm seeing test_02_sdev_delete fail with blk-mq enabled.
> >
> > I only see failure if I stack dm-mq ontop of old non-mq scsi devices with:
> >
> > echo N > /sys/module/scsi_mod/parameters/use_blk_mq
> > echo Y > /sys/module/dm_mod/parameters/use_blk_mq
>
> Ah, I didn't test that combination. I can see the failure, too.
>
> > But this makes me think the novelty of having dm-mq support stacking on
> > non-blk-mq devices was misplaced. It is a senseless config. I'll
> > probably remove support for such stacking soon (next week).
>
> Looking at the failure, I suspect it could be a common issue of dm-mq
> regardless of underlying device type.

In practice I'm not seeing any issues with dm-mq on scsi-mq.

> When requeueing, following calls happen in dm-mq:
>   dm_requeue_original_request() {
>     ..
>     blk_mq_requeue_request(rq);
>     blk_mq_kick_requeue_list(rq->q);
>
> then from block workqueue:
>   blk_mq_requeue_work() {
>     ..
>     blk_mq_start_hw_queue(q);
>
> and blk_mq_start_hw_queue() re-starts the queue even if DM has
> stopped it for suspending. As a result, dm-mq ends up repeating
> submit-error-requeue forever and suspend never completes. Or,
> suspend somehow proceeds to clear DMF_NOFLUSH_SUSPENDING and
> I/O error may directly be returned to submitter.

I should note that I applied this patch for 4.6:
https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-4.6&id=7db905b3d4294e5db4c2938fb7d0e5ba4bd798d6

(but it was purely a fallout of code-review, and looking at the nvme's use of blk_mq_requeue_request, I didn't consider it to be a critical fix or anything)

> Attached patch fixes the problem for DM. But given the code comment,
> there should be call sites which depend on 'start-if-stopped' behavior
> of blk_mq_requeue_work and we may need other solution.

Nice catch, it certainly does seem like the blk-mq requeue code is undo-ing steps DM took to protect dm-mpath during suspend.

It likely doesn't bite dm-mq on scsi-mq because in general blk-mq takes the rq->q->queue_lock much less frequently. But when stacking blk-mq on .request_fn queues it causes the live-lock you detailed above.

I'm not sure what the right fix is, but it would seem we need something. I cannot speak to why blk_mq_start_hw_queues() was used to begin with (or why it is important for blk-mq to forcibly kick stopped queues on requeue).

Jens?

I see commit 8b95741569ea ("blk-mq: use blk_mq_start_hw_queues() when running requeue work") but I'm still missing why the upper-layer driver of the blk-mq queue (dm-mq in this case) isn't free to keep the queue stopped. This is pretty important for DM's suspend functionality.

> --
> Jun'ichi Nomura, NEC Corporation
>
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 56c0a72..bbfe936 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -481,11 +481,7 @@ static void blk_mq_requeue_work(struct work_struct *work)
>  		blk_mq_insert_request(rq, false, false, false);
>  	}
>
> -	/*
> -	 * Use the start variant of queue running here, so that running
> -	 * the requeue work will kick stopped queues.
> -	 */
> -	blk_mq_start_hw_queues(q);
> +	blk_mq_run_hw_queues(q, false);
>  }
>
>  void blk_mq_add_to_requeue_list(struct request *rq, bool at_head)
>
> --
> dm-devel mailing list
> dm-de...@redhat.com
> https://www.redhat.com/mailman/listinfo/dm-devel
Re: [PATCHv8 20/23] scsi: Add 'access_state' attribute
On 02/21/16 22:59, Hannes Reinecke wrote:
> The main reason why I need the 'access_state' attribute is to
> decouple the multipath daemon; at the moment the multipath daemon
> has to issue REPORT TARGET PORT GROUPS frequently to figure out the
> status, which is causing quite some load on the target.
> When using the 'access_state' attribute we would avoid doing I/O for
> that and have a consistent view, both on the kernel and the multipath
> daemon side.
>
> But it's actually a good thing to have the 'access_state' patch in a
> different series; I've got some more patches converting the
> remaining device_handler to also supply the 'access_state' values.

Hello Hannes,

The above sounds very interesting to me. Will multipathd recognize at run-time whether or not the kernel supports the sysfs ALUA state attribute? Will ALUA state changes be reported through udev or will multipathd poll the sysfs ALUA state attributes? And if the netlink buffer that is used in multipathd to receive udev events overflows (ENOBUFS), will multipathd resynchronize its state? As far as I can see in source file libmultipath/uevent.c today multipathd ignores netlink buffer overflows.

Thanks,

Bart.
Re: [PATCH] ncr5380: Don't re-enter NCR5380_select() when aborting a command
Please ignore this patch. It isn't sufficient to fix the problem. I'll send another patch that does fix it.

On Tue, 26 Jan 2016, Finn Thain wrote:

> Fixes: 707d62b37fbb ("ncr5380: Fix EH during arbitration and selection")
> Signed-off-by: Finn Thain
>
> ---
>  drivers/scsi/NCR5380.c       |    2 +-
>  drivers/scsi/atari_NCR5380.c |    2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
>
> Index: linux/drivers/scsi/NCR5380.c
> ===================================================================
> --- linux.orig/drivers/scsi/NCR5380.c	2016-01-26 13:31:10.000000000 +1100
> +++ linux/drivers/scsi/NCR5380.c	2016-01-26 13:31:10.000000000 +1100
> @@ -2337,7 +2337,7 @@ static int NCR5380_abort(struct scsi_cmn
>  		dsprintk(NDEBUG_ABORT, instance,
>  		         "abort: removed %p from disconnected list\n", cmd);
>  		cmd->result = DID_ERROR << 16;
> -		if (!hostdata->connected)
> +		if (!hostdata->connected && !hostdata->selecting)
>  			NCR5380_select(instance, cmd);
>  		if (hostdata->connected != cmd) {
>  			complete_cmd(instance, cmd);
> Index: linux/drivers/scsi/atari_NCR5380.c
> ===================================================================
> --- linux.orig/drivers/scsi/atari_NCR5380.c	2016-01-26 13:31:10.000000000 +1100
> +++ linux/drivers/scsi/atari_NCR5380.c	2016-01-26 13:31:10.000000000 +1100
> @@ -2532,7 +2532,7 @@ static int NCR5380_abort(struct scsi_cmn
>  		dsprintk(NDEBUG_ABORT, instance,
>  		         "abort: removed %p from disconnected list\n", cmd);
>  		cmd->result = DID_ERROR << 16;
> -		if (!hostdata->connected)
> +		if (!hostdata->connected && !hostdata->selecting)
>  			NCR5380_select(instance, cmd);
>  		if (hostdata->connected != cmd) {
>  			complete_cmd(instance, cmd);
[PATCH 6/6] ncr5380: Call scsi_eh_prep_cmnd() and scsi_eh_restore_cmnd() as and when appropriate
This bug causes the wrong command to have its sense pointer overwritten,
which sometimes leads to a NULL pointer deref. Fix this by checking which
command is being requeued before restoring the scsi_eh_save data.

It turns out that some targets will disconnect a REQUEST SENSE command.
The autosense algorithm doesn't anticipate this. Hence multiple commands
can end up undergoing autosense simultaneously, and they will all try to
use the same scsi_eh_save struct, which won't work. Defer autosense when
the scsi_eh_save storage is in use by another command.

Fixes: f27db8eb98a1 ("ncr5380: Fix autosense bugs")
Reported-and-tested-by: Michael Schmitz
Signed-off-by: Finn Thain

---
 drivers/scsi/NCR5380.c       |    4 ++--
 drivers/scsi/atari_NCR5380.c |    4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

Index: linux/drivers/scsi/NCR5380.c
===
--- linux.orig/drivers/scsi/NCR5380.c	2016-02-23 10:07:01.0 +1100
+++ linux/drivers/scsi/NCR5380.c	2016-02-23 10:07:02.0 +1100
@@ -760,7 +760,7 @@ static struct scsi_cmnd *dequeue_next_cm
 	struct NCR5380_cmd *ncmd;
 	struct scsi_cmnd *cmd;

-	if (list_empty(&hostdata->autosense)) {
+	if (hostdata->sensing || list_empty(&hostdata->autosense)) {
 		list_for_each_entry(ncmd, &hostdata->unissued, list) {
 			cmd = NCR5380_to_scmd(ncmd);
 			dsprintk(NDEBUG_QUEUES, instance, "dequeue: cmd=%p target=%d busy=0x%02x lun=%llu\n",
@@ -793,7 +793,7 @@ static void requeue_cmd(struct Scsi_Host
 	struct NCR5380_hostdata *hostdata = shost_priv(instance);
 	struct NCR5380_cmd *ncmd = scsi_cmd_priv(cmd);

-	if (hostdata->sensing) {
+	if (hostdata->sensing == cmd) {
 		scsi_eh_restore_cmnd(cmd, &hostdata->ses);
 		list_add(&ncmd->list, &hostdata->autosense);
 		hostdata->sensing = NULL;
Index: linux/drivers/scsi/atari_NCR5380.c
===
--- linux.orig/drivers/scsi/atari_NCR5380.c	2016-02-23 10:07:01.0 +1100
+++ linux/drivers/scsi/atari_NCR5380.c	2016-02-23 10:07:02.0 +1100
@@ -862,7 +862,7 @@ static struct scsi_cmnd *dequeue_next_cm
 	struct NCR5380_cmd *ncmd;
 	struct scsi_cmnd *cmd;

-	if (list_empty(&hostdata->autosense)) {
+	if (hostdata->sensing || list_empty(&hostdata->autosense)) {
 		list_for_each_entry(ncmd, &hostdata->unissued, list) {
 			cmd = NCR5380_to_scmd(ncmd);
 			dsprintk(NDEBUG_QUEUES, instance, "dequeue: cmd=%p target=%d busy=0x%02x lun=%llu\n",
@@ -901,7 +901,7 @@ static void requeue_cmd(struct Scsi_Host
 	struct NCR5380_hostdata *hostdata = shost_priv(instance);
 	struct NCR5380_cmd *ncmd = scsi_cmd_priv(cmd);

-	if (hostdata->sensing) {
+	if (hostdata->sensing == cmd) {
 		scsi_eh_restore_cmnd(cmd, &hostdata->ses);
 		list_add(&ncmd->list, &hostdata->autosense);
 		hostdata->sensing = NULL;

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
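The ownership rule behind this patch can be illustrated outside the driver. The following is only a user-space sketch under stated assumptions, not the driver code: `struct host`, `struct cmd`, `start_sense()` and `requeue()` are hypothetical stand-ins for the hostdata, the scsi_cmnd and the dequeue/requeue paths, and the single `saved` field stands in for the one scsi_eh_save struct.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct scsi_cmnd: only the state we save. */
struct cmd {
	int state;	/* pretend scsi_eh_prep_cmnd() overwrites this */
};

/* Hypothetical stand-in for the hostdata: one save slot, one owner. */
struct host {
	struct cmd *sensing;	/* command that currently owns the save slot */
	int saved;		/* models the single scsi_eh_save struct */
};

/* Begin autosense for c. Returns 0 if deferred because the slot is busy. */
static int start_sense(struct host *h, struct cmd *c)
{
	assert(h != NULL && c != NULL);
	if (h->sensing)
		return 0;	/* defer: slot in use by another command */
	h->saved = c->state;	/* models scsi_eh_prep_cmnd() */
	c->state = -1;		/* command now carries the sense request */
	h->sensing = c;
	return 1;
}

/* Requeue c. Restore the saved state only if c owns the save slot. */
static void requeue(struct host *h, struct cmd *c)
{
	assert(h != NULL && c != NULL);
	if (h->sensing == c) {	/* compare owners, not just non-NULL */
		c->state = h->saved;	/* models scsi_eh_restore_cmnd() */
		h->sensing = NULL;
	}
}
```

With a plain non-NULL test in `requeue()`, requeueing an unrelated command would restore the wrong saved state into it, which is the failure the commit message describes.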
[PATCH 0/6] ncr5380: Exception handling fixes for v4.5
These patches fix some exception handling and autosense bugs that I
accidentally introduced in v4.5-rc1.

The error recovery and autosense code in these drivers has been unstable
for a long time. Despite that, v4.5-rc1 shows a regression inasmuch as it
exposes a bug in the ARAnyM emulator. This leads to error recovery, which
can crash.

Also, Michael Schmitz reported some crashes involving abort handling for
a certain target device. And Dan Carpenter found a NULL pointer deref in
the new bus reset code.

Error recovery and autosense are stable with these patches. I tested them
using a Domex 3191D PCI card. Errors during IO were simulated by sending
bus resets and unplugging/replugging the SCSI cables.

Some of these patches fix bugs that only affect more capable hardware
(like Atari). Thanks to Michael Schmitz for patiently testing those.

Please review this series for v4.5.

---
 drivers/scsi/NCR5380.c       |  133 +++
 drivers/scsi/atari_NCR5380.c |  133 +++
 2 files changed, 118 insertions(+), 148 deletions(-)
[PATCH 5/6] ncr5380: Fix NCR5380_select() EH checks and result handling
Add missing checks for EH abort during arbitration and selection.
Rework the handling of NCR5380_select() result to improve clarity.

Fixes: 707d62b37fbb ("ncr5380: Fix EH during arbitration and selection")
Tested-by: Michael Schmitz
Signed-off-by: Finn Thain

---
 drivers/scsi/NCR5380.c       |   16 +++-
 drivers/scsi/atari_NCR5380.c |   16 +++-
 2 files changed, 22 insertions(+), 10 deletions(-)

Index: linux/drivers/scsi/NCR5380.c
===
--- linux.orig/drivers/scsi/NCR5380.c	2016-02-23 10:07:00.0 +1100
+++ linux/drivers/scsi/NCR5380.c	2016-02-23 10:07:01.0 +1100
@@ -815,15 +815,17 @@ static void NCR5380_main(struct work_str
 	struct NCR5380_hostdata *hostdata =
 		container_of(work, struct NCR5380_hostdata, main_task);
 	struct Scsi_Host *instance = hostdata->host;
-	struct scsi_cmnd *cmd;
 	int done;

 	do {
 		done = 1;

 		spin_lock_irq(&hostdata->lock);
-		while (!hostdata->connected &&
-		       (cmd = dequeue_next_cmd(instance))) {
+		while (!hostdata->connected && !hostdata->selecting) {
+			struct scsi_cmnd *cmd = dequeue_next_cmd(instance);
+
+			if (!cmd)
+				break;

 			dsprintk(NDEBUG_MAIN, instance, "main: dequeued %p\n", cmd);

@@ -840,8 +842,7 @@ static void NCR5380_main(struct work_str
 			 * entire unit.
 			 */

-			cmd = NCR5380_select(instance, cmd);
-			if (!cmd) {
+			if (!NCR5380_select(instance, cmd)) {
 				dsprintk(NDEBUG_MAIN, instance, "main: select complete\n");
 			} else {
 				dsprintk(NDEBUG_MAIN | NDEBUG_QUEUES, instance,
@@ -1056,6 +1057,11 @@ static struct scsi_cmnd *NCR5380_select(
 		/* Reselection interrupt */
 		goto out;
 	}
+	if (!hostdata->selecting) {
+		/* Command was aborted */
+		NCR5380_write(MODE_REG, MR_BASE);
+		goto out;
+	}
 	if (err < 0) {
 		NCR5380_write(MODE_REG, MR_BASE);
 		shost_printk(KERN_ERR, instance,
Index: linux/drivers/scsi/atari_NCR5380.c
===
--- linux.orig/drivers/scsi/atari_NCR5380.c	2016-02-23 10:07:00.0 +1100
+++ linux/drivers/scsi/atari_NCR5380.c	2016-02-23 10:07:01.0 +1100
@@ -923,7 +923,6 @@ static void NCR5380_main(struct work_str
 	struct NCR5380_hostdata *hostdata =
 		container_of(work, struct NCR5380_hostdata, main_task);
 	struct Scsi_Host *instance = hostdata->host;
-	struct scsi_cmnd *cmd;
 	int done;

 	/*
@@ -936,8 +935,11 @@ static void NCR5380_main(struct work_str
 		done = 1;

 		spin_lock_irq(&hostdata->lock);
-		while (!hostdata->connected &&
-		       (cmd = dequeue_next_cmd(instance))) {
+		while (!hostdata->connected && !hostdata->selecting) {
+			struct scsi_cmnd *cmd = dequeue_next_cmd(instance);
+
+			if (!cmd)
+				break;

 			dsprintk(NDEBUG_MAIN, instance, "main: dequeued %p\n", cmd);

@@ -960,8 +962,7 @@ static void NCR5380_main(struct work_str
 #ifdef SUPPORT_TAGS
 			cmd_get_tag(cmd, cmd->cmnd[0] != REQUEST_SENSE);
 #endif
-			cmd = NCR5380_select(instance, cmd);
-			if (!cmd) {
+			if (!NCR5380_select(instance, cmd)) {
 				dsprintk(NDEBUG_MAIN, instance, "main: select complete\n");
 				maybe_release_dma_irq(instance);
 			} else {
@@ -1257,6 +1258,11 @@ static struct scsi_cmnd *NCR5380_select(
 		/* Reselection interrupt */
 		goto out;
 	}
+	if (!hostdata->selecting) {
+		/* Command was aborted */
+		NCR5380_write(MODE_REG, MR_BASE);
+		goto out;
+	}
 	if (err < 0) {
 		NCR5380_write(MODE_REG, MR_BASE);
 		shost_printk(KERN_ERR, instance,
[PATCH 1/6] ncr5380: Correctly clear command pointers and lists after bus reset
Commands subject to exception handling are to be returned to the scsi
mid-layer. Make sure that the various command pointers and command lists
in the low-level driver are correctly cleansed of affected commands.

This fixes some bugs that I accidentally introduced in v4.5-rc1 including
the removal of INIT_LIST_HEAD for the 'autosense' and 'disconnected'
command lists, and the possible NULL pointer dereference in
NCR5380_bus_reset() that was reported by Dan Carpenter.

hostdata->sensing may also point to an affected command so this pointer
also has to be cleared. The abort handler calls complete_cmd() to take
care of this; let's have the bus reset handler do the same.

The issue queue may also contain an affected command. If so, remove it.
This also follows the abort handler logic.

Reported-by: Dan Carpenter
Fixes: 62717f537e1b ("ncr5380: Implement new eh_bus_reset_handler")
Tested-by: Michael Schmitz
Signed-off-by: Finn Thain

---
 drivers/scsi/NCR5380.c       |   19 ---
 drivers/scsi/atari_NCR5380.c |   19 ---
 2 files changed, 24 insertions(+), 14 deletions(-)

Index: linux/drivers/scsi/NCR5380.c
===
--- linux.orig/drivers/scsi/NCR5380.c	2016-02-23 10:06:56.0 +1100
+++ linux/drivers/scsi/NCR5380.c	2016-02-23 10:06:56.0 +1100
@@ -2450,7 +2450,16 @@ static int NCR5380_bus_reset(struct scsi
 	 * commands!
 	 */

-	hostdata->selecting = NULL;
+	if (list_del_cmd(&hostdata->unissued, cmd)) {
+		cmd->result = DID_RESET << 16;
+		cmd->scsi_done(cmd);
+	}
+
+	if (hostdata->selecting) {
+		hostdata->selecting->result = DID_RESET << 16;
+		complete_cmd(instance, hostdata->selecting);
+		hostdata->selecting = NULL;
+	}

 	list_for_each_entry(ncmd, &hostdata->disconnected, list) {
 		struct scsi_cmnd *cmd = NCR5380_to_scmd(ncmd);
@@ -2458,6 +2467,7 @@ static int NCR5380_bus_reset(struct scsi
 		set_host_byte(cmd, DID_RESET);
 		cmd->scsi_done(cmd);
 	}
+	INIT_LIST_HEAD(&hostdata->disconnected);

 	list_for_each_entry(ncmd, &hostdata->autosense, list) {
 		struct scsi_cmnd *cmd = NCR5380_to_scmd(ncmd);
@@ -2465,6 +2475,7 @@ static int NCR5380_bus_reset(struct scsi
 		set_host_byte(cmd, DID_RESET);
 		cmd->scsi_done(cmd);
 	}
+	INIT_LIST_HEAD(&hostdata->autosense);

 	if (hostdata->connected) {
 		set_host_byte(hostdata->connected, DID_RESET);
@@ -2472,12 +2483,6 @@ static int NCR5380_bus_reset(struct scsi
 		hostdata->connected = NULL;
 	}

-	if (hostdata->sensing) {
-		set_host_byte(hostdata->connected, DID_RESET);
-		complete_cmd(instance, hostdata->sensing);
-		hostdata->sensing = NULL;
-	}
-
 	for (i = 0; i < 8; ++i)
 		hostdata->busy[i] = 0;
 #ifdef REAL_DMA
Index: linux/drivers/scsi/atari_NCR5380.c
===
--- linux.orig/drivers/scsi/atari_NCR5380.c	2016-02-23 10:06:56.0 +1100
+++ linux/drivers/scsi/atari_NCR5380.c	2016-02-23 10:06:56.0 +1100
@@ -2646,7 +2646,16 @@ static int NCR5380_bus_reset(struct scsi
 	 * commands!
 	 */

-	hostdata->selecting = NULL;
+	if (list_del_cmd(&hostdata->unissued, cmd)) {
+		cmd->result = DID_RESET << 16;
+		cmd->scsi_done(cmd);
+	}
+
+	if (hostdata->selecting) {
+		hostdata->selecting->result = DID_RESET << 16;
+		complete_cmd(instance, hostdata->selecting);
+		hostdata->selecting = NULL;
+	}

 	list_for_each_entry(ncmd, &hostdata->disconnected, list) {
 		struct scsi_cmnd *cmd = NCR5380_to_scmd(ncmd);
@@ -2654,6 +2663,7 @@ static int NCR5380_bus_reset(struct scsi
 		set_host_byte(cmd, DID_RESET);
 		cmd->scsi_done(cmd);
 	}
+	INIT_LIST_HEAD(&hostdata->disconnected);

 	list_for_each_entry(ncmd, &hostdata->autosense, list) {
 		struct scsi_cmnd *cmd = NCR5380_to_scmd(ncmd);
@@ -2661,6 +2671,7 @@ static int NCR5380_bus_reset(struct scsi
 		set_host_byte(cmd, DID_RESET);
 		cmd->scsi_done(cmd);
 	}
+	INIT_LIST_HEAD(&hostdata->autosense);

 	if (hostdata->connected) {
 		set_host_byte(hostdata->connected, DID_RESET);
@@ -2668,12 +2679,6 @@ static int NCR5380_bus_reset(struct scsi
 		hostdata->connected = NULL;
 	}

-	if (hostdata->sensing) {
-		set_host_byte(hostdata->connected, DID_RESET);
-		complete_cmd(instance, hostdata->sensing);
-		hostdata->sensing = NULL;
-	}
-
 #ifdef SUPPORT_TAGS
 	free_all_tags(hostdata);
 #endif
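Why the list heads need re-initialising after the commands are handed back can be shown with a small user-space model. This is a sketch under stated assumptions, not kernel code: `struct node`, `list_init()`, `list_add_tail_node()` and `flush_list()` are hypothetical names modelled loosely on the kernel's list_head helpers, and `RESULT_RESET` is an illustrative value.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list, modelled on the kernel's list_head. */
struct node {
	struct node *next, *prev;
	int result;	/* hypothetical stand-in for cmd->result */
	int done;	/* set once the command has been handed back */
};

#define RESULT_RESET 8	/* illustrative value only */

static void list_init(struct node *head)
{
	head->next = head;
	head->prev = head;
}

static void list_add_tail_node(struct node *n, struct node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

static int list_is_empty(const struct node *head)
{
	return head->next == head;
}

/*
 * Hand every queued command back with a reset result, then
 * re-initialise the head. Returns the number of commands completed.
 */
static int flush_list(struct node *head)
{
	struct node *n;
	int count = 0;

	assert(head != NULL);
	for (n = head->next; n != head; n = n->next) {
		n->result = RESULT_RESET;
		n->done = 1;
		count++;
	}
	/* Without this, the head would still link to completed commands. */
	list_init(head);
	return count;
}
```

If the final `list_init()` call is dropped, a second flush would walk commands that no longer belong to the driver, which is the kind of stale-list problem the missing INIT_LIST_HEAD calls caused.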
[PATCH 3/6] ncr5380: Dont re-enter NCR5380_select()
Calling NCR5380_select() from the abort handler causes various problems.
Firstly, it means potentially re-entering NCR5380_select(). Secondly, it
means that the lock is released, which permits the EH handlers to be
re-entered. The combination results in crashes. Don't do it.

Fixes: 8b00c3d5d40d ("ncr5380: Implement new eh_abort_handler")
Reported-and-tested-by: Michael Schmitz
Signed-off-by: Finn Thain

---
 drivers/scsi/NCR5380.c       |   16
 drivers/scsi/atari_NCR5380.c |   16
 2 files changed, 16 insertions(+), 16 deletions(-)

Index: linux/drivers/scsi/NCR5380.c
===
--- linux.orig/drivers/scsi/NCR5380.c	2016-02-23 10:06:57.0 +1100
+++ linux/drivers/scsi/NCR5380.c	2016-02-23 10:06:58.0 +1100
@@ -2302,6 +2302,9 @@ static bool list_del_cmd(struct list_hea
  * If cmd was not found at all then presumably it has already been completed,
  * in which case return SUCCESS to try to avoid further EH measures.
  * If the command has not completed yet, we must not fail to find it.
+ *
+ * The lock protects driver data structures, but EH handlers also use it
+ * to serialize their own execution and prevent their own re-entry.
  */

 static int NCR5380_abort(struct scsi_cmnd *cmd)
@@ -2338,14 +2341,11 @@ static int NCR5380_abort(struct scsi_cmn
 	if (list_del_cmd(&hostdata->disconnected, cmd)) {
 		dsprintk(NDEBUG_ABORT, instance,
 		         "abort: removed %p from disconnected list\n", cmd);
-		cmd->result = DID_ERROR << 16;
-		if (!hostdata->connected)
-			NCR5380_select(instance, cmd);
-		if (hostdata->connected != cmd) {
-			complete_cmd(instance, cmd);
-			result = FAILED;
-			goto out;
-		}
+		/* Can't call NCR5380_select() and send ABORT because that
+		 * means releasing the lock. Need a bus reset.
+		 */
+		result = FAILED;
+		goto out;
 	}

 	if (hostdata->connected == cmd) {
Index: linux/drivers/scsi/atari_NCR5380.c
===
--- linux.orig/drivers/scsi/atari_NCR5380.c	2016-02-23 10:06:57.0 +1100
+++ linux/drivers/scsi/atari_NCR5380.c	2016-02-23 10:06:58.0 +1100
@@ -2497,6 +2497,9 @@ static bool list_del_cmd(struct list_hea
  * If cmd was not found at all then presumably it has already been completed,
  * in which case return SUCCESS to try to avoid further EH measures.
  * If the command has not completed yet, we must not fail to find it.
+ *
+ * The lock protects driver data structures, but EH handlers also use it
+ * to serialize their own execution and prevent their own re-entry.
  */

 static int NCR5380_abort(struct scsi_cmnd *cmd)
@@ -2533,14 +2536,11 @@ static int NCR5380_abort(struct scsi_cmn
 	if (list_del_cmd(&hostdata->disconnected, cmd)) {
 		dsprintk(NDEBUG_ABORT, instance,
 		         "abort: removed %p from disconnected list\n", cmd);
-		cmd->result = DID_ERROR << 16;
-		if (!hostdata->connected)
-			NCR5380_select(instance, cmd);
-		if (hostdata->connected != cmd) {
-			complete_cmd(instance, cmd);
-			result = FAILED;
-			goto out;
-		}
+		/* Can't call NCR5380_select() and send ABORT because that
+		 * means releasing the lock. Need a bus reset.
+		 */
+		result = FAILED;
+		goto out;
 	}

 	if (hostdata->connected == cmd) {
[PATCH 4/6] ncr5380: Forget aborted commands
The list structures and related logic used in the NCR5380 driver mean that
a command cannot be queued twice (i.e. can't appear on more than one queue
and can't appear on the same queue more than once).

The abort handler must forget the command so that the mid-layer can re-use
it. E.g. the ML may send it back to the LLD via scsi_eh_get_sense().

Fix this and also fix two error paths, so that commands get forgotten iff
completed.

Fixes: 8b00c3d5d40d ("ncr5380: Implement new eh_abort_handler")
Tested-by: Michael Schmitz
Signed-off-by: Finn Thain

---
 drivers/scsi/NCR5380.c       |   62 +++
 drivers/scsi/atari_NCR5380.c |   62 +++
 2 files changed, 34 insertions(+), 90 deletions(-)

Index: linux/drivers/scsi/NCR5380.c
===
--- linux.orig/drivers/scsi/NCR5380.c	2016-02-23 10:06:58.0 +1100
+++ linux/drivers/scsi/NCR5380.c	2016-02-23 10:07:00.0 +1100
@@ -1796,6 +1796,7 @@ static void NCR5380_information_transfer
 				do_abort(instance);
 				cmd->result = DID_ERROR << 16;
 				complete_cmd(instance, cmd);
+				hostdata->connected = NULL;
 				return;
 #endif
 			case PHASE_DATAIN:
@@ -1845,7 +1846,6 @@ static void NCR5380_information_transfer
 						sink = 1;
 						do_abort(instance);
 						cmd->result = DID_ERROR << 16;
-						complete_cmd(instance, cmd);
 						/* XXX - need to source or sink data here, as appropriate */
 					} else
 						cmd->SCp.this_residual -= transfersize - len;
@@ -2294,14 +2294,14 @@ static bool list_del_cmd(struct list_hea
  * [disconnected -> connected ->]...
  * [autosense -> connected ->] done
  *
- * If cmd is unissued then just remove it.
- * If cmd is disconnected, try to select the target.
- * If cmd is connected, try to send an abort message.
- * If cmd is waiting for autosense, give it a chance to complete but check
- * that it isn't left connected.
  * If cmd was not found at all then presumably it has already been completed,
  * in which case return SUCCESS to try to avoid further EH measures.
+ *
  * If the command has not completed yet, we must not fail to find it.
+ * We have no option but to forget the aborted command (even if it still
+ * lacks sense data). The mid-layer may re-issue a command that is in error
+ * recovery (see scsi_send_eh_cmnd), but the logic and data structures in
+ * this driver are such that a command can appear on one queue only.
  *
  * The lock protects driver data structures, but EH handlers also use it
  * to serialize their own execution and prevent their own re-entry.
@@ -2327,6 +2327,7 @@ static int NCR5380_abort(struct scsi_cmn
 		         "abort: removed %p from issue queue\n", cmd);
 		cmd->result = DID_ABORT << 16;
 		cmd->scsi_done(cmd); /* No tag or busy flag to worry about */
+		goto out;
 	}

 	if (hostdata->selecting == cmd) {
@@ -2344,6 +2345,8 @@ static int NCR5380_abort(struct scsi_cmn
 		/* Can't call NCR5380_select() and send ABORT because that
 		 * means releasing the lock. Need a bus reset.
 		 */
+		set_host_byte(cmd, DID_ERROR);
+		complete_cmd(instance, cmd);
 		result = FAILED;
 		goto out;
 	}
@@ -2351,45 +2354,9 @@ static int NCR5380_abort(struct scsi_cmn
 	if (hostdata->connected == cmd) {
 		dsprintk(NDEBUG_ABORT, instance, "abort: cmd %p is connected\n", cmd);
 		hostdata->connected = NULL;
-		if (do_abort(instance)) {
-			set_host_byte(cmd, DID_ERROR);
-			complete_cmd(instance, cmd);
-			result = FAILED;
-			goto out;
-		}
-		set_host_byte(cmd, DID_ABORT);
 #ifdef REAL_DMA
 		hostdata->dma_len = 0;
 #endif
-		if (cmd->cmnd[0] == REQUEST_SENSE)
-			complete_cmd(instance, cmd);
-		else {
-			struct NCR5380_cmd *ncmd = scsi_cmd_priv(cmd);
-
-			/* Perform autosense for this command */
-			list_add(&ncmd->list, &hostdata->autosense);
-		}
-	}
-
-	if (list_find_cmd(&hostdata->autosense, cmd)) {
-		dsprintk(NDEBUG_ABORT, instance,
-		         "abort: found %p on sense queue\n", cmd);
-		spin_unlock_irqrestore(&hostdata->lock,
[PATCH 2/6] ncr5380: Dont release lock for PIO transfer
The calls to NCR5380_transfer_pio() for DATA IN and DATA OUT phases will
modify cmd->SCp.this_residual, cmd->SCp.ptr and cmd->SCp.buffer. That
works as long as EH does not intervene, which became possible in
atari_NCR5380.c when I changed the locking to bring it closer to
NCR5380.c.

If error recovery aborts the command, the scsi_cmnd in question and its
buffer will be returned to the mid-layer. So the transfer has to cease,
but it can't be stopped by the initiator because the target controls the
bus phase.

The problem does not arise if the lock is not released. That was fine for
atari_scsi, because it implements DMA. For the other drivers, we have to
release the lock and re-enable interrupts for long PIO data transfers.

The solution is to split the transfer into small chunks. In between chunks
the main loop releases the lock and re-enables interrupts. Thus interrupts
can be serviced and eh_bus_reset_handler can intervene if need be.

This fixes an oops in NCR5380_transfer_pio() that can happen when the EH
abort handler is invoked during DATA IN or DATA OUT phase.

Fixes: 11d2f63b9cf5 ("ncr5380: Change instance->host_lock to hostdata->lock")
Reported-and-tested-by: Michael Schmitz
Signed-off-by: Finn Thain

---
 drivers/scsi/NCR5380.c       |   16 +---
 drivers/scsi/atari_NCR5380.c |   16 +---
 2 files changed, 18 insertions(+), 14 deletions(-)

Index: linux/drivers/scsi/NCR5380.c
===
--- linux.orig/drivers/scsi/NCR5380.c	2016-02-23 10:06:56.0 +1100
+++ linux/drivers/scsi/NCR5380.c	2016-02-23 10:06:57.0 +1100
@@ -1759,9 +1759,7 @@ static void NCR5380_information_transfer
 	unsigned char msgout = NOP;
 	int sink = 0;
 	int len;
-#if defined(PSEUDO_DMA) || defined(REAL_DMA_POLL)
 	int transfersize;
-#endif
 	unsigned char *data;
 	unsigned char phase, tmp, extended_msg[10], old_phase = 0xff;
 	struct scsi_cmnd *cmd;
@@ -1854,13 +1852,17 @@ static void NCR5380_information_transfer
 				} else
 #endif /* defined(PSEUDO_DMA) || defined(REAL_DMA_POLL) */
 				{
-					spin_unlock_irq(&hostdata->lock);
-					NCR5380_transfer_pio(instance, &phase,
-					                     (int *)&cmd->SCp.this_residual,
+					/* Break up transfer into 3 ms chunks,
+					 * presuming 6 accesses per handshake.
+					 */
+					transfersize = min((unsigned long)cmd->SCp.this_residual,
+					                   hostdata->accesses_per_ms / 2);
+					len = transfersize;
+					NCR5380_transfer_pio(instance, &phase, &len,
 					                     (unsigned char **)&cmd->SCp.ptr);
-					spin_lock_irq(&hostdata->lock);
+					cmd->SCp.this_residual -= transfersize - len;
 				}
-				break;
+				return;
 			case PHASE_MSGIN:
 				len = 1;
 				data = &tmp;
Index: linux/drivers/scsi/atari_NCR5380.c
===
--- linux.orig/drivers/scsi/atari_NCR5380.c	2016-02-23 10:06:56.0 +1100
+++ linux/drivers/scsi/atari_NCR5380.c	2016-02-23 10:06:57.0 +1100
@@ -1838,9 +1838,7 @@ static void NCR5380_information_transfer
 	unsigned char msgout = NOP;
 	int sink = 0;
 	int len;
-#if defined(REAL_DMA)
 	int transfersize;
-#endif
 	unsigned char *data;
 	unsigned char phase, tmp, extended_msg[10], old_phase = 0xff;
 	struct scsi_cmnd *cmd;
@@ -1983,18 +1981,22 @@ static void NCR5380_information_transfer
 				} else
 #endif /* defined(REAL_DMA) */
 				{
-					spin_unlock_irq(&hostdata->lock);
-					NCR5380_transfer_pio(instance, &phase,
-					                     (int *)&cmd->SCp.this_residual,
+					/* Break up transfer into 3 ms chunks,
+					 * presuming 6 accesses per handshake.
+					 */
+					transfersize = min((unsigned long)cmd->SCp.this_residual,
+					                   hostdata->accesses_per_ms / 2);
+					len = transfersize;
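The "3 ms chunks" arithmetic in the patch comment can be checked with a small stand-alone sketch. It assumes, as the comment does, 6 register accesses per byte handshake, so accesses_per_ms / 2 bytes cost about 3 * accesses_per_ms accesses, i.e. roughly 3 ms. The helper names `pio_chunk_size()` and `pio_chunk_count()` are hypothetical and do not exist in the driver.

```c
#include <assert.h>

/*
 * Bytes to move in one chunk: the remaining residual, capped at
 * accesses_per_ms / 2 bytes (about 3 ms at 6 accesses per byte).
 */
static unsigned long pio_chunk_size(unsigned long residual,
				    unsigned long accesses_per_ms)
{
	unsigned long limit = accesses_per_ms / 2;

	return residual < limit ? residual : limit;
}

/* Chunks needed to move 'total' bytes, assuming no short transfers. */
static unsigned long pio_chunk_count(unsigned long total,
				     unsigned long accesses_per_ms)
{
	unsigned long chunks = 0;

	assert(accesses_per_ms >= 2);	/* otherwise the chunk size is 0 */
	while (total > 0) {
		total -= pio_chunk_size(total, accesses_per_ms);
		chunks++;
	}
	return chunks;
}
```

For example, with a hypothetical accesses_per_ms of 1000 a 4096-byte transfer is split into eight 500-byte chunks plus one 96-byte chunk, and the main loop gets a chance to release the lock between each of them.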
Re: NULL pointer dereference: IP: [] sr_runtime_suspend+0xc/0x20 [sr_mod]
Hello,

>> > As this is Linux 4.3 and not 4.4, I guess this is a different problem
>> > though. Alexandre, were you able to capture the stack trace? I’d submit
>> > a new bug report with this.
>>
>> Here is a photo. Please ping me if you need to test some debugging patches.
>
> It looks like the problem occurs in blk_post_runtime_resume(). Since
> there have been recent changes to this routine, it's hard to tell
> whether you're using the most up-to-date code.
>
> In particular, the first few lines of blk_post_runtime_resume() in
> block/blk-core.c should look like this:
>
> void blk_post_runtime_resume(struct request_queue *q, int err)
> {
> 	if (!q->dev)
> 		return;
>
> The test was introduced by commit 4fd41a8552af ("SCSI: Fix NULL pointer
> dereference in runtime PM"), which was added to the mainline kernel
> between 4.3 and 4.4. I don't know what the commit ID would be for a
> .stable kernel.

Okay, now I've tried with 4.4. The oops does not occur, so this is fixed
for me in 4.4.

If there is interest in backporting to 4.3, 13b438914341 ("SCSI: fix
crashes in sd and sr runtime PM") is not enough to backport. Something in
4.4, most probably 4fd41a8552af ("SCSI: Fix NULL pointer dereference in
runtime PM"), is also needed.

Thanks a lot,
Alex
Re: [Announce] sg3_utils-1.42 available
On 02/18/2016 11:52 AM, Douglas Gilbert wrote:
> On 16-02-17 11:59 PM, Douglas Gilbert wrote:
>> sg3_utils is a package of command line utilities for sending SCSI and
>> some ATA commands to devices. This package targets the Linux 4, 3, 2.6
>> and 2.4 kernel series. It has ports to FreeBSD, Tru64, Solaris, and
>> Windows (cygwin and MinGW).
>>
>> There are two new utilities (sg_read_attr and sg_timestamp) and
>> additions to many others, see the ChangeLog below. This version tracks
>> various changes made by www.t10.org from May 2015 until January 2016.
>
> Missed the links:
>
> For an overview of sg3_utils and downloads see this page:
>     http://sg.danny.cz/sg/sg3_utils.html
> The sg_ses utility (for enclosure devices) is discussed at:
>     http://sg.danny.cz/sg/sg_ses.html
>
> A full changelog can be found at:
>     http://sg.danny.cz/sg/p/sg3_utils.ChangeLog

Hi Doug,

Thanks for all the work you have done maintaining sg3_utils and for
preparing a new release. I have already downloaded v1.42 and started
using it.

The detailed changelog is helpful. However, I think it would be convenient
for sg3_utils contributors to have access to the sg3_utils source code
repository so that we can see all the patches that went in. Is such a
repository publicly available, and if not, do you have any plans to make
one available?

Since I have a few patches ready that I would like to contribute to the
sg3_utils package, is there a mailing list that I should CC when sending
these patches to you?

Thanks,

Bart.
Re: why is blk-mq requeue forcibly kicking stopped queues? [was: Re: dm-multipath test scripts]
On 02/23/16 00:09, Mike Snitzer wrote:
> I should note that I applied this patch for 4.6:
> https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-4.6&id=7db905b3d4294e5db4c2938fb7d0e5ba4bd798d6
>
> (but it was purely a fallout of code-review, and looking at the nvme's
> use of blk_mq_requeue_request, I didn't consider it to be a critical fix
> or anything)

The patch above contains the following change:

> +static void dm_mq_requeue_request(struct request *rq)
> +{
> +	struct request_queue *q = rq->q;
> +	unsigned long flags;
> +
> +	blk_mq_requeue_request(rq);
> +	spin_lock_irqsave(q->queue_lock, flags);
> +	if (!blk_queue_stopped(q))
> +		blk_mq_kick_requeue_list(q);
> +	spin_unlock_irqrestore(q->queue_lock, flags);
> +}

If you make the call to blk_mq_kick_requeue_list() conditional here,
I think we have to call the function from start_queue(), too.
Otherwise requeued requests might stay forever in q->requeue_list.

--
Jun'ichi Nomura, NEC Corporation
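The concern about requests staying on q->requeue_list can be seen in a toy model. This is only a sketch under stated assumptions: `struct queue_model` and its three helpers are hypothetical simplifications that count requests instead of managing real lists, and they are not the blk-mq or dm API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, much simplified model of a request queue. */
struct queue_model {
	bool stopped;		/* queue stopped, e.g. for suspend */
	int requeue_list;	/* requests parked for requeue */
	int dispatched;		/* requests moved back to dispatch */
};

/* Models blk_mq_kick_requeue_list(): move parked requests to dispatch. */
static void kick_requeue_list(struct queue_model *q)
{
	assert(q != NULL);
	q->dispatched += q->requeue_list;
	q->requeue_list = 0;
}

/* Models the conditional kick in the quoted dm_mq_requeue_request(). */
static void requeue_request(struct queue_model *q)
{
	q->requeue_list++;
	if (!q->stopped)
		kick_requeue_list(q);
}

/* Models start_queue(); 'kick' selects whether it also kicks the list. */
static void start_queue(struct queue_model *q, bool kick)
{
	q->stopped = false;
	if (kick)
		kick_requeue_list(q);
}
```

In this model a request requeued while the queue is stopped remains parked after `start_queue(q, false)` and only moves once something kicks the list, which is why the kick also has to happen when the queue is started.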
[LSF/MM ATTEND] Online Logical Head Depop and SMR disks chunked writepages
Hello,

I would like to attend LSF/MM 2016 to discuss the following topics.

1) Online Logical Head Depop

Some disk drives available on the market already provide a "logical depop"
function which allows a system to decommission a defective disk head,
reformat the disk and continue using this same disk with a reduced
capacity. Such a feature can reduce operating costs (delayed HDD
replacement) but has the drawbacks of data loss (data under the remaining
valid heads) and disk downtime during reformatting.

Online logical head depop is a proposed new feature that retains the valid
data on the disk and eliminates the need for a disk reformat. The basic
idea is to introduce new commands for the host to discover the ranges of
LBAs impacted by a defective head. Using this information, the host can
take actions when a disk head failure event is suspected or reported:

(a) The impacted LBAs can be depopulated, resulting in the disk operating
    as a “thin provisioned” device.
(b) The impacted LBAs can be amputated, resulting in an error for all
    subsequent accesses to the LBAs under the defective head.
(c) Optionally, a host may decide to reformat (compact) the disk to
    restore operation as a fully-provisioned device with a lower capacity.

The goal of the discussion would be to gather the opinions of developers
in order to draft a command standard that minimizes the impact of this
feature on the block I/O stack and allows simple use of this feature by
file systems and device mapper drivers (including the logical volume
manager).

2) Write back of dirty pages to SMR block devices

Dirty pages of a block device inode are currently processed using the
generic_writepages function, which can be executed simultaneously by
multiple contexts (e.g. sync, fsync, msync, sync_file_range, etc.).
Because mutual exclusion of the dirty page processing is achieved only at
the page level (page lock & page writeback flag), multiple processes
executing a "sync" of overlapping block ranges over the same zone of an
SMR disk can cause an out-of-LBA-order sequence of write requests to be
sent to the underlying device. On a host-managed SMR disk, where
sequential writes to disk zones are mandatory, this results in errors, and
an application using raw sequential disk write accesses cannot be
guaranteed successful completion of its write or fsync requests.

Using the zone information attached to the SMR block device queue
(introduced by Hannes), calls to the generic_writepages function can be
made mutually exclusive on a per-zone basis by locking the zones. This
guarantees sequential request generation for each zone and avoids write
errors without any modification to the generic code implementing
generic_writepages.

This is but one possible solution for supporting SMR host-managed devices
without any major rewrite of page cache management and write-back
processing. The opinion of the audience regarding this solution, and a
discussion of other potential solutions, would be greatly appreciated.

Thank you.

Best regards.

Damien Le Moal, Ph.D.
Sr. Manager, System Software Group, HGST Research,
HGST, a Western Digital company
damien.lem...@hgst.com
(+81) 0466-98-3593 (ext. 513593)
1 kirihara-cho, Fujisawa, Kanagawa, 252-0888 Japan
www.hgst.com
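Why write ordering within a zone matters for topic 2 can be illustrated with a toy model of the write-pointer rule. This is only a sketch under stated assumptions: it models the device-side rule that makes out-of-order writes fail, not the proposed zone locking itself, and `struct zone_model`, `zone_index()` and `zone_write()` are hypothetical names with equally sized zones assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one sequential-write-required zone. */
struct zone_model {
	uint64_t start;	/* first LBA of the zone */
	uint64_t len;	/* zone length in LBAs */
	uint64_t wp;	/* write pointer: next LBA that may be written */
};

/* Map an LBA to a zone index, assuming equally sized zones. */
static uint64_t zone_index(uint64_t lba, uint64_t zone_len)
{
	assert(zone_len > 0);
	return lba / zone_len;
}

/*
 * Model a write of nr LBAs starting at lba. The zone accepts the write
 * only if it starts exactly at the write pointer and fits in the zone.
 * Returns 0 on success, -1 on an out-of-order or overflowing write.
 */
static int zone_write(struct zone_model *z, uint64_t lba, uint64_t nr)
{
	if (lba != z->wp || lba + nr > z->start + z->len)
		return -1;
	z->wp += nr;
	return 0;
}
```

Two writers that each hold a different page range of the same zone can issue the write at LBA 8 before the write at LBA 0, and the first of those fails in this model. Serializing writeback per zone, with the lock looked up by something like `zone_index()`, is one way to keep the requests in write-pointer order.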
Re: why is blk-mq requeue forcibly kicking stopped queues? [was: Re: dm-multipath test scripts]
On Mon, Feb 22 2016 at 8:34pm -0500, Junichi Nomura wrote:

> On 02/23/16 00:09, Mike Snitzer wrote:
> > I should note that I applied this patch for 4.6:
> > https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-4.6&id=7db905b3d4294e5db4c2938fb7d0e5ba4bd798d6
> >
> > (but it was purely a fallout of code-review, and looking at the nvme's
> > use of blk_mq_requeue_request, I didn't consider it to be a critical fix
> > or anything)
>
> The patch above contains the following change:
>
> > +static void dm_mq_requeue_request(struct request *rq)
> > +{
> > +	struct request_queue *q = rq->q;
> > +	unsigned long flags;
> > +
> > +	blk_mq_requeue_request(rq);
> > +	spin_lock_irqsave(q->queue_lock, flags);
> > +	if (!blk_queue_stopped(q))
> > +		blk_mq_kick_requeue_list(q);
> > +	spin_unlock_irqrestore(q->queue_lock, flags);
> > +}
>
> If you make the call to blk_mq_kick_requeue_list() conditional here,
> I think we have to call the function from start_queue(), too.
> Otherwise requeued requests might stay forever in q->requeue_list.

Yes, you're right. Fixed up and pushed to the rebased linux-dm.git
'dm-4.6' branch:
https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-4.6&id=818c5f3bef750eb5998b468f84391e4d656b97ed
Re: [LSF/MM ATTEND] Online Logical Head Depop and SMR disks chunked writepages
On 02/22/16 18:56, Damien Le Moal wrote:
> 2) Write back of dirty pages to SMR block devices:
>
> Dirty pages of a block device inode are currently processed using the
> generic_writepages function, which can be executed simultaneously
> by multiple contexts (e.g. sync, fsync, msync, sync_file_range, etc.).
> Mutual exclusion of the dirty page processing being achieved only at
> the page level (page lock & page writeback flag), multiple processes
> executing a "sync" of overlapping block ranges over the same zone of
> an SMR disk can cause an out-of-LBA-order sequence of write requests
> being sent to the underlying device. On a host-managed SMR disk, where
> sequential write to disk zones is mandatory, this results in errors and
> the impossibility for an application using raw sequential disk write
> accesses to be guaranteed successful completion of its write or fsync
> requests.
>
> Using the zone information attached to the SMR block device queue
> (introduced by Hannes), calls to the generic_writepages function can
> be made mutually exclusive on a per-zone basis by locking the zones.
> This guarantees sequential request generation for each zone and avoids
> write errors without any modification to the generic code implementing
> generic_writepages.
>
> This is but one possible solution for supporting SMR host-managed
> devices without any major rewrite of page cache management and
> write-back processing. The opinion of the audience regarding this
> solution and discussing other potential solutions would be greatly
> appreciated.

Hello Damien,

Is it sufficient to support filesystems like BTRFS on top of SMR drives,
or would you also like filesystems like ext4 to be able to use SMR
drives? In the latter case: the behavior of SMR drives differs so
significantly from that of other block devices that I'm not sure that we
should try to support these directly from infrastructure like the page
cache. If we look e.g. at NAND SSDs then we see that the characteristics
of NAND do not match what filesystems expect (e.g. large erase blocks).
That is why every SSD vendor provides an FTL (Flash Translation Layer),
either inside the SSD or as a separate software driver. An FTL
implements a so-called LFS (log-structured filesystem). From what I know
about SMR, this technology also looks suitable for an LFS
implementation. Has implementing an LFS driver for SMR drives already
been considered? That would make it possible for any filesystem to
access an SMR drive like any other block device. I'm not sure of this,
but maybe it will be possible to share some infrastructure with the
LightNVM driver (directory drivers/lightnvm in the Linux kernel tree),
since that driver implements an FTL.

Bart.
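For readers following the thread, the log-structured idea Bart mentions can be sketched in a few lines of user-space C. This is only an illustration under stated assumptions, not code from LightNVM or any existing driver: the names lfs_map, lfs_init, lfs_write and lfs_lookup and the sizes are hypothetical, and garbage collection is left out entirely. The point it shows is that logical writes arriving in any order all land at an advancing write pointer, so the device only ever sees sequential writes.

```c
#include <assert.h>
#include <stdint.h>

/* All names and sizes below are hypothetical, chosen for illustration. */
#define NUM_LBAS 16u          /* logical blocks exposed to the upper layer */
#define NUM_PBAS 32u          /* physical blocks of the simulated device   */
#define UNMAPPED UINT32_MAX   /* marker: logical block never written       */

struct lfs_map {
    uint32_t l2p[NUM_LBAS];   /* logical -> physical block translation */
    uint32_t write_ptr;       /* next physical block to be written     */
};

void lfs_init(struct lfs_map *m)
{
    for (uint32_t i = 0; i < NUM_LBAS; i++)
        m->l2p[i] = UNMAPPED;
    m->write_ptr = 0;
}

/*
 * Write one logical block. The data always goes to the current write
 * pointer, whatever the logical address is, so physical writes stay
 * sequential. An overwrite just remaps the block; the old physical copy
 * becomes stale (reclaiming it would need garbage collection, omitted).
 * Returns the physical block used, or UNMAPPED if the log is full.
 */
uint32_t lfs_write(struct lfs_map *m, uint32_t lba)
{
    assert(lba < NUM_LBAS);
    if (m->write_ptr >= NUM_PBAS)
        return UNMAPPED;
    m->l2p[lba] = m->write_ptr;
    return m->write_ptr++;
}

/* Translate a logical block for a read; UNMAPPED if never written. */
uint32_t lfs_lookup(const struct lfs_map *m, uint32_t lba)
{
    assert(lba < NUM_LBAS);
    return m->l2p[lba];
}
```

Writing logical blocks 9, 2 and then 9 again places them at physical blocks 0, 1 and 2 in that order, which is the access pattern a host-managed zone requires.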
Re: [LSF/MM ATTEND] Online Logical Head Depop and SMR disks chunked writepages
> On 02/22/16 18:56, Damien Le Moal wrote:
>> 2) Write back of dirty pages to SMR block devices:
>>
>> Dirty pages of a block device inode are currently processed using the
>> generic_writepages function, which can be executed simultaneously
>> by multiple contexts (e.g. sync, fsync, msync, sync_file_range, etc.).
>> Mutual exclusion of the dirty page processing being achieved only at
>> the page level (page lock & page writeback flag), multiple processes
>> executing a "sync" of overlapping block ranges over the same zone of
>> an SMR disk can cause an out-of-LBA-order sequence of write requests
>> being sent to the underlying device. On a host-managed SMR disk, where
>> sequential write to disk zones is mandatory, this results in errors and
>> the impossibility for an application using raw sequential disk write
>> accesses to be guaranteed successful completion of its write or fsync
>> requests.
>>
>> Using the zone information attached to the SMR block device queue
>> (introduced by Hannes), calls to the generic_writepages function can
>> be made mutually exclusive on a per-zone basis by locking the zones.
>> This guarantees sequential request generation for each zone and avoids
>> write errors without any modification to the generic code implementing
>> generic_writepages.
>>
>> This is but one possible solution for supporting SMR host-managed
>> devices without any major rewrite of page cache management and
>> write-back processing. The opinion of the audience regarding this
>> solution and discussing other potential solutions would be greatly
>> appreciated.
>
> Hello Damien,
>
> Is it sufficient to support filesystems like BTRFS on top of SMR drives
> or would you also like to see that filesystems like ext4 can use SMR
> drives ? In the latter case: the behavior of SMR drives differs so
> significantly from that of other block devices that I'm not sure that we
> should try to support these directly from infrastructure like the page
> cache. If we look e.g. at NAND SSDs then we see that the characteristics
> of NAND do not match what filesystems expect (e.g. large erase blocks).
> That is why every SSD vendor provides an FTL (Flash Translation Layer),
> either inside the SSD or as a separate software driver. An FTL
> implements a so-called LFS (log-structured filesystem). With what I know
> about SMR this technology looks also suitable for implementation of a
> LFS. Has it already been considered to implement an LFS driver for SMR
> drives ? That would make it possible for any filesystem to access an SMR
> drive as any other block device. I'm not sure of this but maybe it will
> be possible to share some infrastructure with the LightNVM driver
> (directory drivers/lightnvm in the Linux kernel tree). This driver
> namely implements an FTL.

Hello Bart,

Thank you for your comments. I totally agree with you that trying to
support SMR disks by only modifying the page cache, so that unmodified
standard file systems like BTRFS or ext4 remain operational, is
unrealistic at best and more likely simply impossible. For this kind of
use case, as you said, an FTL or a device mapper driver is much more
suitable.

The case I am considering for this discussion is raw block device
access by an application (writes from user space to /dev/sdxx). This is
a very likely use case for high-capacity SMR disks with applications
like distributed object stores / key-value stores. In this case,
write-back of dirty pages in the block device file inode mapping is
handled in fs/block_dev.c using the generic helper function
generic_writepages. This does not guarantee the generation of the
per-zone sequential write pattern that host-managed disks require. As I
explained, aligning calls of this function to zone boundaries while
locking the zones under write-back simply solves the problem
(implemented and tested). This is of course only one possible solution.
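
As a rough illustration of what "aligning calls to zone boundaries" means, the user-space sketch below splits a sector range into chunks that each fall within a single zone. It is not the implementation described in this mail: split_by_zone and struct chunk are hypothetical names, a fixed zone size is assumed, and the per-zone lock is only marked in a comment at the point where each chunk would be handed to generic_writepages.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One piece of a write-back range, fully contained in a single zone. */
struct chunk {
    uint64_t start;   /* first sector of the chunk          */
    uint64_t end;     /* one past the last sector           */
    uint64_t zone;    /* index of the zone the chunk lies in */
};

/*
 * Split the sector range [start, end) at zone boundaries, assuming all
 * zones have the same size (zone_sectors). At most max chunks are stored
 * in out; the number stored is returned. Hypothetical helper.
 */
size_t split_by_zone(uint64_t start, uint64_t end, uint64_t zone_sectors,
                     struct chunk *out, size_t max)
{
    size_t n = 0;

    assert(zone_sectors > 0);
    while (start < end && n < max) {
        uint64_t zone = start / zone_sectors;
        uint64_t zone_end = (zone + 1) * zone_sectors;
        uint64_t stop = end < zone_end ? end : zone_end;

        /*
         * In the scheme discussed in this thread, the lock of this zone
         * would be held while this chunk is written back, so that two
         * contexts never issue writes to the same zone concurrently.
         */
        out[n].start = start;
        out[n].end = stop;
        out[n].zone = zone;
        n++;
        start = stop;
    }
    return n;
}
```

With 100-sector zones, the range [50, 250) is split into [50, 100) in zone 0, [100, 200) in zone 1 and [200, 250) in zone 2, so each write-back call touches exactly one zone.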
Pushing modifications deeper in the code or providing a
"generic_sequential_writepages" helper function are other potential
solutions that in my opinion are worth discussing, as other types of
devices may also benefit in terms of performance (e.g. regular disk
drives prefer sequential writes, and SSDs as well), and the overhead on
an underlying FTL or device mapper driver may be reduced.

For a file system, an SMR-compliant implementation of a file inode
mapping writepages method should be provided by the file system itself,
as the sequentiality of the write pattern further depends on the block
allocation mechanism of the file system.

Note that the goal here is not to hide the sequential write constraint
of SMR disks from applications. The page cache itself (the mapping of
the block device inode) remains unchanged. But the proposed modification
guarantees that a well-behaved application writing sequentially to zones
through the page cache will see successful sync operations.

Best regards.

Damien Le Moal, Ph.D.
Sr. Manager, System S