date:20160222

Re: dm-multipath test scripts

2016-02-22 Thread Junichi Nomura

On 02/20/16 15:12, Mike Snitzer wrote:
> On Fri, Feb 19 2016 at  2:42pm -0500, Mike Snitzer  wrote:
>> Have you been running with blk-mq?
>> Either by setting CONFIG_DM_MQ_DEFAULT or:
>> echo Y > /sys/module/dm_mod/parameters/use_blk_mq
>>
>> I'm seeing test_02_sdev_delete fail with blk-mq enabled.
> 
> I only see failure if I stack dm-mq on top of old non-mq scsi devices with:
> 
> echo N > /sys/module/scsi_mod/parameters/use_blk_mq
> echo Y > /sys/module/dm_mod/parameters/use_blk_mq

Ah, I didn't test that combination. I can see the failure, too.

> But this makes me think the novelty of having dm-mq support stacking on
> non-blk-mq devices was misplaced.  It is a senseless config.  I'll
> probably remove support for such stacking soon (next week). 

Looking at the failure, I suspect it could be a common issue of dm-mq
regardless of underlying device type.

When requeueing, the following calls happen in dm-mq:
  dm_requeue_original_request() {
    ..
    blk_mq_requeue_request(rq);
    blk_mq_kick_requeue_list(rq->q);

then from the block workqueue:
  blk_mq_requeue_work() {
    ..
    blk_mq_start_hw_queues(q);

and blk_mq_start_hw_queues() restarts the queue even if DM has
stopped it for suspending. As a result, dm-mq ends up repeating
submit-error-requeue forever and suspend never completes. Or,
suspend somehow proceeds to clear DMF_NOFLUSH_SUSPENDING and
an I/O error may be returned directly to the submitter.

The attached patch fixes the problem for DM. But given the code comment,
there should be call sites which depend on the 'start-if-stopped' behavior
of blk_mq_requeue_work(), so we may need another solution.

-- 
Jun'ichi Nomura, NEC Corporation

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 56c0a72..bbfe936 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -481,11 +481,7 @@ static void blk_mq_requeue_work(struct work_struct *work)
blk_mq_insert_request(rq, false, false, false);
}

-   /*
-* Use the start variant of queue running here, so that running
-* the requeue work will kick stopped queues.
-*/
-   blk_mq_start_hw_queues(q);
+   blk_mq_run_hw_queues(q, false);
 }

 void blk_mq_add_to_requeue_list(struct request *rq, bool at_head)
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

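The difference between the "start" and "run" variants discussed in the message above can be sketched in userspace. This is only a toy model under stated assumptions: `struct hw_queue`, `run_hw_queues()` and `start_hw_queues()` are hypothetical stand-ins, not the kernel's blk-mq types or functions. It shows why a requeue path that uses the "start" variant would undo a stop requested by the queue's owner, while the "run" variant leaves a stopped queue alone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a blk-mq hardware queue. */
struct hw_queue {
	bool stopped;        /* set by the owner, e.g. DM while suspending */
	unsigned int runs;   /* number of dispatch attempts made */
};

/* "run" variant: dispatch only on queues that are not stopped. */
static void run_hw_queues(struct hw_queue *q, size_t n)
{
	assert(q != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (q[i].stopped)
			continue;	/* respect the owner's stop */
		q[i].runs++;
	}
}

/* "start" variant: clear the stopped state first, then dispatch. */
static void start_hw_queues(struct hw_queue *q, size_t n)
{
	assert(q != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		q[i].stopped = false;	/* this is what overrides the stop */
		q[i].runs++;
	}
}
```

In this model, calling `run_hw_queues()` on a stopped queue leaves both `stopped` and `runs` unchanged, whereas `start_hw_queues()` clears `stopped` and dispatches, which matches the submit-error-requeue loop described in the message.
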
Re: [PATCH 5/6] hisi_sas: add hisi_sas_slave_configure()

2016-02-22 Thread John Garry




>> I would like to make another point about why I am making this change
>> in case it is not clear. The queue full events are from
>> TRANS_TX_CREDIT_TIMEOUT_ERR and TRANS_TX_CLOSE_NORMAL_ERR errors in
>> the slot: I want the slot retried when this occurs, so I set status
>> as SAS_QUEUE_FULL just so we will report DID_SOFT_ERROR to SCSI
>> midlayer so we get a retry. I could use SAS_OPEN_REJECT
>> alternatively as the error which would have the same effect.
>> The queue full events are not from all slots being consumed in the HBA.
>
> Ah, right. So you might be getting those errors even with some free
> slots on the HBA. As such they are roughly equivalent to a
> QUEUE_FULL SCSI status, right?
> So after reading SPL I guess you are right here; using tags wouldn't
> help for this situation.

Yes, I think we have 90% of slots free in the host when this occurs for
one particular test. Our v2 hw has 2K slots, which is >> cmd_per_lun.
The errors are equivalent to queue full for the device.


Thanks,
John


Cheers,

Hannes





[Bug 111441] iscsi fails to attach to targets

2016-02-22 Thread bugzilla-daemon

https://bugzilla.kernel.org/show_bug.cgi?id=111441

--- Comment #18 from Serguei Bezverkhi  ---
Hello Hannes,

Thank you for your reply. I am on the 4.4.2 kernel; is there any chance of
committing it to 4.4 as well? If not, could you send me a diff for the 4.4 kernel?

Best regards

Serguei


Serguei Bezverkhi,
TECHNICAL LEADER.SERVICES
Global SP Services
sbezv...@cisco.com
Phone: +1 416 306 7312
Mobile: +1 514 234 7374

CCIE (R&S,SP,Sec) - #9527

Cisco.com







-----Original Message-----
From: Hannes Reinecke [mailto:h...@suse.de] 
Sent: Monday, February 22, 2016 2:08 AM
To: Serguei Bezverkhi (sbezverk) ; Mike Christie

Cc: bugzilla-dae...@bugzilla.kernel.org; linux-scsi@vger.kernel.org; Christoph
Hellwig 
Subject: Re: [Bug 111441] New: iscsi fails to attach to targets

On 02/22/2016 01:45 AM, Serguei Bezverkhi (sbezverk) wrote:
> Hi Mike,
> 
> I just wanted to follow up with you to see if the patch got committed to an
> upstream kernel. If yes, please let me know which version it went into.
> 
> Thank you
> 
> Serguei
> 
> 
> 
> 
> 
> -----Original Message-----
> From: Mike Christie [mailto:micha...@cs.wisc.edu]
> Sent: Friday, January 29, 2016 6:33 PM
> To: Serguei Bezverkhi (sbezverk) 
> Cc: bugzilla-dae...@bugzilla.kernel.org; linux-scsi@vger.kernel.org; 
> Christoph Hellwig ; Hannes Reinecke 
> Subject: Re: [Bug 111441] New: iscsi fails to attach to targets
> 
> On 01/29/2016 04:21 PM, Serguei Bezverkhi (sbezverk) wrote:
>> Hi Mike,
>>
>> I tried your patch and it has eliminated the first traceback but I still do
>> not see my remote targets.
>>
> 
> That is sort of expected. Your target is not set up for ALUA properly. It says
> it supports ALUA, but when scsi_dh_alua asks about the ports it is reporting
> there are none. Ccing the people that made the patch that added the issue and
> own the code.
> 
> Hey Christoph and Hannes,
> 
> The dh/alua changes that added this:
> 
>         error = scsi_dh_add_device(sdev);
>         if (error) {
>                 sdev_printk(KERN_INFO, sdev,
>                             "failed to add device handler: %d\n", error);
>                 return error;
>         }
> 
> to scsi_sysfs_add_sdev are adding a regression.
> 
> 1. If that fails, then we forget to do device_del before doing the return. My
> patch in this thread added that back, so we do not see the sysfs oopses
> anymore. But.
> 
> 2. It looks like in older kernels, we would allow misconfigured targets like
> this one to still set up devices. Do we want that old behavior back?
> Should we just ignore the return value from scsi_dh_add_device above?
> Note that in this case, it is LIO so it can be easily fixed on the target
> side by just setting it up properly. I do not think other targets would hit
> this type of issue.
> 
> 
This has been fixed up with my patchset to update the ALUA handler, most
notably the commit 'scsi: ignore errors from scsi_dh_add_device()' which was
included in 4.5.

Cheers,

Hannes

-- 
You are receiving this mail because:
You are the assignee for the bug.

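The two options raised in the quoted discussion (undo the registration before returning the error, or treat a device-handler failure as non-fatal) can be contrasted with a small userspace model. Everything here is an assumption made for illustration: `struct fake_sdev`, `fake_dh_add_device()` and `fake_sysfs_add_sdev()` are hypothetical stubs and do not reproduce the kernel's scsi_sysfs_add_sdev().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a SCSI device; not the kernel's struct. */
struct fake_sdev {
	bool registered;     /* models a successful device_add() */
	int handler_error;   /* what attaching the device handler returns */
};

/* Stub for scsi_dh_add_device(): simply reports the configured error. */
static int fake_dh_add_device(const struct fake_sdev *sdev)
{
	return sdev->handler_error;
}

/*
 * strict == true:  fail the add, but undo the registration first (point 1).
 * strict == false: treat the handler failure as non-fatal (point 2).
 */
static int fake_sysfs_add_sdev(struct fake_sdev *sdev, bool strict)
{
	int error;

	assert(sdev != NULL);
	sdev->registered = true;		/* models device_add() */

	error = fake_dh_add_device(sdev);
	if (error && strict) {
		sdev->registered = false;	/* models device_del() */
		return error;
	}
	return 0;
}
```

With `strict` set, a misconfigured target ends up with no device at all; with it cleared, the device stays registered even though no handler is attached, which corresponds to the older behavior asked about in point 2.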

RE: [PATCH 0/2][RESEND] scsi_transport_fc: LUN masking

2016-02-22 Thread Seymour, Shane M

Hi Hannes,

How do you know that a request for an async scan is complete (I'm assuming that
you get add or change udev events)? Assuming that someone has manually started
a scan on something (e.g. some newly presented devices after boot) and all
scans are going to be async, how do you know when it is complete rather than
waiting in a work queue? An example may be a sysfs file that contains unscanned,
pending, scanning, or scanned, at the appropriate level in sysfs (the HBA and
the rports), so you know when you can continue if you're polling the status
(e.g. checking as part of system admin work with newly presented rports so you
can then do something with them).

Thanks
Shane

> -----Original Message-----
> From: linux-scsi-ow...@vger.kernel.org [mailto:linux-scsi-
> ow...@vger.kernel.org] On Behalf Of Hannes Reinecke
> Sent: Monday, February 22, 2016 6:51 PM
> To: Martin K . Petersen
> Cc: Christoph Hellwig; James Bottomley; Johannes Thumshirn; linux-
> s...@vger.kernel.org; Hannes Reinecke
> Subject: [PATCH 0/2][RESEND] scsi_transport_fc: LUN masking
> 
> Hi all,
> 
> having been subjected to the pain of trying to bootstrap a really large
> machine with systemd I decided to implement LUN masking in
> scsi_transport_fc.
> The principle is simple: disallow the automated LUN scanning when
> discovering a rport, and create udev rules which selectively enable individual
> LUNs by echoing the relevant values in the 'scan'
> attribute of the SCSI host.
> With that I'm able to boot an arbitrarily large machine without running into any
> udev or systemd imposed timeout.
> To _disable_ LUN masking and restore the original behaviour I've noticed
> that the 'scan' sysfs attribute is actually synchronous, i.e. the calling
> process will be blocked until the entire LUN scan is completed.
> So I've added another module parameter 'async_user_scan' to move the
> scanning onto the existing scan workqueue, and unblock the calling process.
> 
> As usual, comments and reviews are welcome.
> 
> Hannes Reinecke (2):
>   scsi_transport_fc: implement 'disable_target_scan' module parameter
>   scsi_transport_fc: Implement 'async_user_scan' module parameter
> 
>  drivers/scsi/scsi_transport_fc.c | 47
> +---
>  1 file changed, 44 insertions(+), 3 deletions(-)
> 
> --
> 2.6.2
> 

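The status file suggested in the message above could be consumed by a polling tool roughly as follows. This is a sketch of the proposal only: no such sysfs attribute is confirmed to exist, and `enum scan_state`, `parse_scan_state()` and `scan_is_complete()` are hypothetical names chosen for illustration. The four strings are the ones listed in the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical scan states, named after the values proposed above. */
enum scan_state {
	SCAN_UNSCANNED,
	SCAN_PENDING,
	SCAN_SCANNING,
	SCAN_SCANNED,
	SCAN_UNKNOWN		/* anything the parser does not recognise */
};

static const char *const scan_state_names[] = {
	"unscanned", "pending", "scanning", "scanned"
};

/* Parse the text read from the (hypothetical) status attribute. */
static enum scan_state parse_scan_state(const char *text)
{
	size_t len;

	assert(text != NULL);
	len = strcspn(text, "\n");	/* ignore a trailing newline */
	for (size_t i = 0;
	     i < sizeof(scan_state_names) / sizeof(scan_state_names[0]); i++) {
		const char *name = scan_state_names[i];

		if (strlen(name) == len && strncmp(text, name, len) == 0)
			return (enum scan_state)i;
	}
	return SCAN_UNKNOWN;
}

/* A poller would keep waiting until this returns true. */
static bool scan_is_complete(enum scan_state state)
{
	return state == SCAN_SCANNED;
}
```

A caller would read the attribute, pass the buffer to `parse_scan_state()`, and retry after a delay while `scan_is_complete()` is false.
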
[v2 PATCH 1/3] scsi:stex.c Support to Pegasus series.

2016-02-22 Thread Charles Chiou

From: Charles 

Pegasus is a high performance hardware RAID solution designed to unleash
the raw power of Thunderbolt technology.

1. Add code to distinguish the SuperTrak and Pegasus series by sub device ID.
   It should remain backward compatible.

2. Change the driver version.

Signed-off-by: Charles Chiou 
Reviewed-by: Johannes Thumshirn 
---
 drivers/scsi/stex.c | 32 ++--
 1 file changed, 26 insertions(+), 6 deletions(-)

diff --git a/drivers/scsi/stex.c b/drivers/scsi/stex.c
index 2de28d7..495d632 100644
--- a/drivers/scsi/stex.c
+++ b/drivers/scsi/stex.c
@@ -1,7 +1,7 @@
 /*
  * SuperTrak EX Series Storage Controller driver for Linux
  *
- * Copyright (C) 2005-2009 Promise Technology Inc.
+ * Copyright (C) 2005-2015 Promise Technology Inc.
  *
  * This program is free software; you can redistribute it and/or
  * modify it under the terms of the GNU General Public License
@@ -38,11 +38,11 @@
 #include 
 
 #define DRV_NAME "stex"
-#define ST_DRIVER_VERSION "4.6..4"
-#define ST_VER_MAJOR   4
-#define ST_VER_MINOR   6
-#define ST_OEM 0
-#define ST_BUILD_VER   4
+#define ST_DRIVER_VERSION  "5.00..01"
+#define ST_VER_MAJOR   5
+#define ST_VER_MINOR   00
+#define ST_OEM 
+#define ST_BUILD_VER   01
 
 enum {
/* MU register offset */
@@ -328,6 +328,7 @@ struct st_hba {
u16 rq_count;
u16 rq_size;
u16 sts_count;
+   u8  supports_pm;
 };
 
 struct st_card_info {
@@ -1560,6 +1561,25 @@ static int stex_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 
hba->cardtype = (unsigned int) id->driver_data;
ci = &stex_card_info[hba->cardtype];
+   switch (id->subdevice) {
+   case 0x4221:
+   case 0x4222:
+   case 0x4223:
+   case 0x4224:
+   case 0x4225:
+   case 0x4226:
+   case 0x4227:
+   case 0x4261:
+   case 0x4262:
+   case 0x4263:
+   case 0x4264:
+   case 0x4265:
+   break;
+   default:
+   if (hba->cardtype == st_yel)
+   hba->supports_pm = 1;
+   }
+
sts_offset = scratch_offset = (ci->rq_count+1) * ci->rq_size;
if (hba->cardtype == st_yel)
sts_offset += (ci->sts_count+1) * sizeof(u32);
-- 
1.9.1

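The sub-device check added to stex_probe() in the patch above can be restated as a pure function, which makes the rule easier to read: the listed sub-device IDs never get `supports_pm`, and any other ID gets it only on an st_yel card. This is a userspace sketch; `enum card_type` and `card_supports_pm()` are hypothetical names, and only the ID list and the st_yel condition are taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the driver's card-type enum; only st_yel matters here. */
enum card_type { CARD_OTHER, CARD_YEL };

/* Sub-device IDs that hit the bare "break" in the patch's switch. */
static const uint16_t no_pm_subdevices[] = {
	0x4221, 0x4222, 0x4223, 0x4224, 0x4225, 0x4226, 0x4227,
	0x4261, 0x4262, 0x4263, 0x4264, 0x4265,
};

/* Mirrors the switch: listed IDs never get PM, others only on st_yel. */
static bool card_supports_pm(enum card_type type, uint16_t subdevice)
{
	for (size_t i = 0;
	     i < sizeof(no_pm_subdevices) / sizeof(no_pm_subdevices[0]); i++) {
		if (no_pm_subdevices[i] == subdevice)
			return false;
	}
	return type == CARD_YEL;
}
```

A table lookup like this is only one way to express the rule; the patch itself uses a switch statement with fall-through cases.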

[v2 PATCH 2/3] scsi:stex.c Add hotplug support

2016-02-22 Thread Charles Chiou

From: Charles 

1. Add hotplug support. Pegasus supports surprise removal. To this end, I
   use the return_abnormal_state() function to return DID_NO_CONNECT for all
   commands which are sent to the driver.

2. Remove stex_hba_stop() from stex_remove() because we cannot send commands to
   the device after a hot-unplug.

3. Add new device statuses: MU_STATE_STOP and MU_STATE_NOCONNECT.
   MU_STATE_STOP is currently not referenced.
   MU_STATE_NOCONNECT indicates that the device has been unplugged from the host.

4. Use return_abnormal_state() to replace part of the code in stex_do_reset().

Signed-off-by: Charles Chiou 
Reviewed-by: Johannes Thumshirn 
---
 drivers/scsi/stex.c | 53 ++---
 1 file changed, 34 insertions(+), 19 deletions(-)

diff --git a/drivers/scsi/stex.c b/drivers/scsi/stex.c
index 495d632..1994603 100644
--- a/drivers/scsi/stex.c
+++ b/drivers/scsi/stex.c
@@ -84,6 +84,8 @@ enum {
MU_STATE_STARTED= 2,
MU_STATE_RESETTING  = 3,
MU_STATE_FAILED = 4,
+   MU_STATE_STOP   = 5,
+   MU_STATE_NOCONNECT  = 6,
 
MU_MAX_DELAY= 120,
MU_HANDSHAKE_SIGNATURE  = 0x5555,
@@ -537,6 +539,27 @@ stex_ss_send_cmd(struct st_hba *hba, struct req_msg *req, u16 tag)
readl(hba->mmio_base + YH2I_REQ); /* flush */
 }
 
+static void return_abnormal_state(struct st_hba *hba, int status)
+{
+   struct st_ccb *ccb;
+   unsigned long flags;
+   u16 tag;
+
+   spin_lock_irqsave(hba->host->host_lock, flags);
+   for (tag = 0; tag < hba->host->can_queue; tag++) {
+   ccb = &hba->ccb[tag];
+   if (ccb->req == NULL)
+   continue;
+   ccb->req = NULL;
+   if (ccb->cmd) {
+   scsi_dma_unmap(ccb->cmd);
+   ccb->cmd->result = status << 16;
+   ccb->cmd->scsi_done(ccb->cmd);
+   ccb->cmd = NULL;
+   }
+   }
+   spin_unlock_irqrestore(hba->host->host_lock, flags);
+}
 static int
 stex_slave_config(struct scsi_device *sdev)
 {
@@ -560,8 +583,12 @@ stex_queuecommand_lck(struct scsi_cmnd *cmd, void (*done)(struct scsi_cmnd *))
id = cmd->device->id;
lun = cmd->device->lun;
hba = (struct st_hba *) &host->hostdata[0];
-
-   if (unlikely(hba->mu_status == MU_STATE_RESETTING))
+   if (hba->mu_status == MU_STATE_NOCONNECT) {
+   cmd->result = DID_NO_CONNECT;
+   done(cmd);
+   return 0;
+   }
+   if (unlikely(hba->mu_status != MU_STATE_STARTED))
return SCSI_MLQUEUE_HOST_BUSY;
 
switch (cmd->cmnd[0]) {
@@ -1260,10 +1287,8 @@ static void stex_ss_reset(struct st_hba *hba)
 
 static int stex_do_reset(struct st_hba *hba)
 {
-   struct st_ccb *ccb;
unsigned long flags;
unsigned int mu_status = MU_STATE_RESETTING;
-   u16 tag;
 
spin_lock_irqsave(hba->host->host_lock, flags);
if (hba->mu_status == MU_STATE_STARTING) {
@@ -1297,20 +1322,8 @@ static int stex_do_reset(struct st_hba *hba)
else if (hba->cardtype == st_yel)
stex_ss_reset(hba);
 
-   spin_lock_irqsave(hba->host->host_lock, flags);
-   for (tag = 0; tag < hba->host->can_queue; tag++) {
-   ccb = &hba->ccb[tag];
-   if (ccb->req == NULL)
-   continue;
-   ccb->req = NULL;
-   if (ccb->cmd) {
-   scsi_dma_unmap(ccb->cmd);
-   ccb->cmd->result = DID_RESET << 16;
-   ccb->cmd->scsi_done(ccb->cmd);
-   ccb->cmd = NULL;
-   }
-   }
-   spin_unlock_irqrestore(hba->host->host_lock, flags);
+
+   return_abnormal_state(hba, DID_RESET);
 
if (stex_handshake(hba) == 0)
return 0;
@@ -1771,9 +1784,11 @@ static void stex_remove(struct pci_dev *pdev)
 {
struct st_hba *hba = pci_get_drvdata(pdev);
 
+   hba->mu_status = MU_STATE_NOCONNECT;
+   return_abnormal_state(hba, DID_NO_CONNECT);
scsi_remove_host(hba->host);
 
-   stex_hba_stop(hba);
+   scsi_block_requests(hba->host);
 
stex_hba_free(hba);
 
-- 
1.9.1

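The pattern used by return_abnormal_state() in the patch above (walk every tag, complete whatever is still outstanding, and place the status in the host byte of the result) can be modelled in userspace. This is a simplified sketch: `struct fake_cmd`, `struct fake_ccb`, `fail_outstanding()` and `host_byte_of()` are hypothetical, locking and DMA unmapping are left out, and the two `EX_DID_*` constants are local copies of the kernel's host-byte codes.

```c
#include <assert.h>
#include <stddef.h>

/* Local copies of two SCSI host-byte codes, for illustration only. */
enum { EX_DID_NO_CONNECT = 0x01, EX_DID_RESET = 0x08 };

/* Hypothetical stand-ins for a SCSI command and a controller slot. */
struct fake_cmd { int result; int done; };
struct fake_ccb { struct fake_cmd *cmd; };

/* Complete every outstanding command with host_byte; return the count. */
static unsigned int fail_outstanding(struct fake_ccb *ccb, size_t n,
				     int host_byte)
{
	unsigned int completed = 0;

	assert(ccb != NULL || n == 0);
	for (size_t tag = 0; tag < n; tag++) {
		struct fake_cmd *cmd = ccb[tag].cmd;

		if (cmd == NULL)
			continue;		/* slot is idle */
		cmd->result = host_byte << 16;	/* host byte is bits 16-23 */
		cmd->done = 1;			/* models scsi_done() */
		ccb[tag].cmd = NULL;
		completed++;
	}
	return completed;
}

/* Extract the host byte again, as the midlayer's host_byte() macro does. */
static int host_byte_of(int result)
{
	return (result >> 16) & 0xff;
}
```

Note that the stex_queuecommand_lck() hunk in the patch assigns `DID_NO_CONNECT` to `cmd->result` without the shift, whereas return_abnormal_state() shifts its status by 16; the sketch follows the shifted form.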

[v2 PATCH 3/3] scsi:stex.c Add S3/S4 support

2016-02-22 Thread Charles Chiou

From: Charles 

Add S3/S4 support by adding .suspend and .resume functions to the pci_driver.
In the .suspend handler, the driver sends the S3/S4 signal to the device.

Signed-off-by: Charles Chiou 
Reviewed-by: Johannes Thumshirn 
---
 drivers/scsi/stex.c | 68 ++---
 1 file changed, 65 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/stex.c b/drivers/scsi/stex.c
index 1994603..5b23175 100644
--- a/drivers/scsi/stex.c
+++ b/drivers/scsi/stex.c
@@ -167,6 +167,14 @@ enum {
 
ST_ADDITIONAL_MEM   = 0x20,
ST_ADDITIONAL_MEM_MIN   = 0x8,
+   PMIC_SHUTDOWN   = 0x0D,
+   PMIC_REUMSE = 0x10,
+   ST_IGNORED  = -1,
+   ST_NOTHANDLED   = 7,
+   ST_S3   = 3,
+   ST_S4   = 4,
+   ST_S5   = 5,
+   ST_S6   = 6,
 };
 
 struct st_sgitem {
@@ -1718,7 +1726,7 @@ out_disable:
return err;
 }
 
-static void stex_hba_stop(struct st_hba *hba)
+static void stex_hba_stop(struct st_hba *hba, int st_sleep_mic)
 {
struct req_msg *req;
struct st_msg_header *msg_h;
@@ -1727,6 +1735,15 @@ static void stex_hba_stop(struct st_hba *hba)
u16 tag = 0;
 
spin_lock_irqsave(hba->host->host_lock, flags);
+
+   if (hba->cardtype == st_yel && hba->supports_pm == 1)
+   {
+   if(st_sleep_mic == ST_NOTHANDLED)
+   {
+   spin_unlock_irqrestore(hba->host->host_lock, flags);
+   return;
+   }
+   }
req = hba->alloc_rq(hba);
if (hba->cardtype == st_yel) {
msg_h = (struct st_msg_header *)req - 1;
@@ -1734,11 +1751,18 @@ static void stex_hba_stop(struct st_hba *hba)
} else
memset(req, 0, hba->rq_size);
 
-   if (hba->cardtype == st_yosemite || hba->cardtype == st_yel) {
+   if ((hba->cardtype == st_yosemite || hba->cardtype == st_yel)
+   && st_sleep_mic == ST_IGNORED) {
req->cdb[0] = MGT_CMD;
req->cdb[1] = MGT_CMD_SIGNATURE;
req->cdb[2] = CTLR_CONFIG_CMD;
req->cdb[3] = CTLR_SHUTDOWN;
+   } else if (hba->cardtype == st_yel && st_sleep_mic != ST_IGNORED) {
+   req->cdb[0] = MGT_CMD;
+   req->cdb[1] = MGT_CMD_SIGNATURE;
+   req->cdb[2] = CTLR_CONFIG_CMD;
+   req->cdb[3] = PMIC_SHUTDOWN;
+   req->cdb[4] = st_sleep_mic;
} else {
req->cdb[0] = CONTROLLER_CMD;
req->cdb[1] = CTLR_POWER_STATE_CHANGE;
@@ -1758,10 +1782,12 @@ static void stex_hba_stop(struct st_hba *hba)
while (hba->ccb[tag].req_type & PASSTHRU_REQ_TYPE) {
if (time_after(jiffies, before + ST_INTERNAL_TIMEOUT * HZ)) {
hba->ccb[tag].req_type = 0;
+   hba->mu_status = MU_STATE_STOP;
return;
}
msleep(1);
}
+   hba->mu_status = MU_STATE_STOP;
 }
 
 static void stex_hba_free(struct st_hba *hba)
@@ -1801,9 +1827,43 @@ static void stex_shutdown(struct pci_dev *pdev)
 {
struct st_hba *hba = pci_get_drvdata(pdev);
 
-   stex_hba_stop(hba);
+   if (hba->supports_pm == 0)
+   stex_hba_stop(hba, ST_IGNORED);
+   else
+   stex_hba_stop(hba, ST_S5);
+}
+
+static int stex_choice_sleep_mic(pm_message_t state)
+{
+   switch (state.event) {
+   case PM_EVENT_SUSPEND:
+   return ST_S3;
+   case PM_EVENT_HIBERNATE:
+   return ST_S4;
+   default:
+   return ST_NOTHANDLED;
+   }
 }
 
+static int stex_suspend(struct pci_dev *pdev, pm_message_t state)
+{
+   struct st_hba *hba = pci_get_drvdata(pdev);
+
+   if (hba->cardtype == st_yel && hba->supports_pm == 1)
+   stex_hba_stop(hba, stex_choice_sleep_mic(state));
+   else
+   stex_hba_stop(hba, ST_IGNORED);
+   return 0;
+}
+
+static int stex_resume(struct pci_dev *pdev)
+{
+   struct st_hba *hba = pci_get_drvdata(pdev);
+
+   hba->mu_status = MU_STATE_STARTING;
+   stex_handshake(hba);
+   return 0;
+}
 MODULE_DEVICE_TABLE(pci, stex_pci_tbl);
 
 static struct pci_driver stex_pci_driver = {
@@ -1812,6 +1872,8 @@ static struct pci_driver stex_pci_driver = {
.probe  = stex_probe,
.remove = stex_remove,
.shutdown   = stex_shutdown,
+   .suspend= stex_suspend,
+   .resume = stex_resume,
 };
 
 static int __init stex_init(void)
-- 
1.9.1

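The way the patch above picks the argument passed to stex_hba_stop() on suspend can be summarised as a small decision function. This is a userspace sketch under assumptions: `enum ex_pm_event` is a local stand-in for the kernel's PM event codes, and `choose_sleep_state()` and `suspend_argument()` are hypothetical names; the `EX_ST_*` values are copied from the enum the patch adds.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-ins for PM events; the kernel's codes live in <linux/pm.h>. */
enum ex_pm_event { EX_PM_FREEZE, EX_PM_SUSPEND, EX_PM_HIBERNATE };

/* Values copied from the enum entries added by the patch. */
enum {
	EX_ST_IGNORED    = -1,
	EX_ST_S3         = 3,
	EX_ST_S4         = 4,
	EX_ST_S5         = 5,
	EX_ST_NOTHANDLED = 7,
};

/* Mirrors stex_choice_sleep_mic(): suspend is S3, hibernate is S4. */
static int choose_sleep_state(enum ex_pm_event event)
{
	switch (event) {
	case EX_PM_SUSPEND:
		return EX_ST_S3;
	case EX_PM_HIBERNATE:
		return EX_ST_S4;
	default:
		return EX_ST_NOTHANDLED;
	}
}

/* Mirrors stex_suspend(): only PM-capable st_yel cards get an S-state. */
static int suspend_argument(bool is_yel, bool supports_pm,
			    enum ex_pm_event event)
{
	if (is_yel && supports_pm)
		return choose_sleep_state(event);
	return EX_ST_IGNORED;
}
```

In the patch, a result of ST_NOTHANDLED makes stex_hba_stop() return early for PM-capable st_yel cards, and ST_IGNORED selects the pre-existing shutdown command path.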

why is blk-mq requeue forcibly kicking stopped queues? [was: Re: dm-multipath test scripts]

2016-02-22 Thread Mike Snitzer

On Mon, Feb 22 2016 at  4:51am -0500,
Junichi Nomura  wrote:

> On 02/20/16 15:12, Mike Snitzer wrote:
> > On Fri, Feb 19 2016 at  2:42pm -0500, Mike Snitzer  
> > wrote:
> >> Have you been running with blk-mq?
> >> Either by setting CONFIG_DM_MQ_DEFAULT or:
> >> echo Y > /sys/module/dm_mod/parameters/use_blk_mq
> >>
> >> I'm seeing test_02_sdev_delete fail with blk-mq enabled.
> > 
> > I only see failure if I stack dm-mq on top of old non-mq scsi devices with:
> > 
> > echo N > /sys/module/scsi_mod/parameters/use_blk_mq
> > echo Y > /sys/module/dm_mod/parameters/use_blk_mq
> 
> Ah, I didn't test that combination. I can see the failure, too.
> 
> > But this makes me think the novelty of having dm-mq support stacking on
> > non-blk-mq devices was misplaced.  It is a senseless config.  I'll
> > probably remove support for such stacking soon (next week). 
> 
> Looking at the failure, I suspect it could be a common issue of dm-mq
> regardless of underlying device type.

In practice I'm not seeing any issues with dm-mq on scsi-mq.

> When requeueing, the following calls happen in dm-mq:
>   dm_requeue_original_request() {
>     ..
>     blk_mq_requeue_request(rq);
>     blk_mq_kick_requeue_list(rq->q);
> 
> then from the block workqueue:
>   blk_mq_requeue_work() {
>     ..
>     blk_mq_start_hw_queues(q);
> 
> and blk_mq_start_hw_queues() restarts the queue even if DM has
> stopped it for suspending. As a result, dm-mq ends up repeating
> submit-error-requeue forever and suspend never completes. Or,
> suspend somehow proceeds to clear DMF_NOFLUSH_SUSPENDING and
> an I/O error may be returned directly to the submitter.

I should note that I applied this patch for 4.6:
https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-4.6&id=7db905b3d4294e5db4c2938fb7d0e5ba4bd798d6

(but it was purely a fallout of code-review, and looking at the nvme's
use of blk_mq_requeue_request, I didn't consider it to be a critical fix
or anything)

> The attached patch fixes the problem for DM. But given the code comment,
> there should be call sites which depend on the 'start-if-stopped' behavior
> of blk_mq_requeue_work(), so we may need another solution.

Nice catch, it certainly does seem like the blk-mq requeue code is
undoing steps DM took to protect dm-mpath during suspend.  It likely
doesn't bite dm-mq on scsi-mq because in general blk-mq takes the
rq->q->queue_lock much less frequently.  But when stacking blk-mq on
.request_fn queues it causes the live-lock you detailed above.

I'm not sure what the right fix is, but it would seem we need
something.  I cannot speak to why blk_mq_start_hw_queues() was used to
begin with (or why it is important for blk-mq to forcibly kick stopped
queues on requeue).  Jens?

I see commit 8b95741569ea ("blk-mq: use blk_mq_start_hw_queues() when
running requeue work") but I'm still missing why the upper-layer driver
of the blk-mq queue (dm-mq in this case) isn't free to keep the queue
stopped.  This is pretty important for DM's suspend functionality.

> -- 
> Jun'ichi Nomura, NEC Corporation
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 56c0a72..bbfe936 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -481,11 +481,7 @@ static void blk_mq_requeue_work(struct work_struct *work)
>   blk_mq_insert_request(rq, false, false, false);
>   }
>  
> - /*
> -  * Use the start variant of queue running here, so that running
> -  * the requeue work will kick stopped queues.
> -  */
> - blk_mq_start_hw_queues(q);
> + blk_mq_run_hw_queues(q, false);
>  }
>  
>  void blk_mq_add_to_requeue_list(struct request *rq, bool at_head)
> 
> --
> dm-devel mailing list
> dm-de...@redhat.com
> https://www.redhat.com/mailman/listinfo/dm-devel

Re: [PATCHv8 20/23] scsi: Add 'access_state' attribute

2016-02-22 Thread Bart Van Assche


On 02/21/16 22:59, Hannes Reinecke wrote:

> The main reason why I need the 'access_state' attribute is to decouple
> the multipath daemon; at the moment the multipath daemon has to issue
> REPORT TARGET PORT GROUPS frequently to figure out the status, which is
> causing quite some load on the target. When using the 'access_state'
> attribute we would avoid doing I/O for that and have a consistent view,
> both on the kernel and the multipath daemon side.
>
> But it's actually a good thing to have the 'access_state' patch in a
> different series; I've got some more patches converting the remaining
> device_handler to also supply the 'access_state' values.


Hello Hannes,

The above sounds very interesting to me. Will multipathd recognize at 
run-time whether or not the kernel supports the sysfs ALUA state 
attribute ? Will ALUA state changes be reported through udev or will 
multipathd poll the sysfs ALUA state attributes ? And if the netlink 
buffer that is used in multipathd to receive udev events overflows 
(ENOBUFS), will multipathd resynchronize its state ? As far as I can see 
in source file libmultipath/uevent.c today multipathd ignores netlink 
buffer overflows.


Thanks,

Bart.

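The ENOBUFS question raised in the message above comes down to how a receive loop reacts to each outcome of recv() on a netlink socket. The sketch below is not multipathd code; `enum rx_action` and `classify_recv()` are hypothetical names used to illustrate one possible policy, in which an overflow (ENOBUFS) is treated as "events were lost, rebuild state from sysfs" rather than being ignored.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>

/* Possible reactions of a uevent receive loop (hypothetical policy). */
enum rx_action {
	RX_PROCESS,	/* a message was read: handle it */
	RX_RETRY,	/* transient condition: call recv() again */
	RX_RESYNC,	/* events were dropped: re-read state from sysfs */
	RX_FAIL		/* unexpected error: give up on the socket */
};

/* Classify the return value of recv() together with the saved errno. */
static enum rx_action classify_recv(ssize_t nread, int err)
{
	if (nread >= 0)
		return RX_PROCESS;

	switch (err) {
	case EINTR:
	case EAGAIN:
		return RX_RETRY;
	case ENOBUFS:
		/* The socket's receive buffer overflowed; messages are gone. */
		return RX_RESYNC;
	default:
		return RX_FAIL;
	}
}
```

A loop built on this would call `recv()`, save `errno` immediately, and on `RX_RESYNC` rescan the relevant sysfs attributes before resuming normal event processing.
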
Re: [PATCH] ncr5380: Don't re-enter NCR5380_select() when aborting a command

2016-02-22 Thread Finn Thain


Please ignore this patch. It isn't sufficient to fix the problem. I'll 
send another patch that does fix it.

On Tue, 26 Jan 2016, Finn Thain wrote:

> Fixes: 707d62b37fbb ("ncr5380: Fix EH during arbitration and selection")
> Signed-off-by: Finn Thain 
> 
> ---
>  drivers/scsi/NCR5380.c   |2 +-
>  drivers/scsi/atari_NCR5380.c |2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> Index: linux/drivers/scsi/NCR5380.c
> ===
> --- linux.orig/drivers/scsi/NCR5380.c 2016-01-26 13:31:10.0 +1100
> +++ linux/drivers/scsi/NCR5380.c  2016-01-26 13:31:10.0 +1100
> @@ -2337,7 +2337,7 @@ static int NCR5380_abort(struct scsi_cmn
>   dsprintk(NDEBUG_ABORT, instance,
>"abort: removed %p from disconnected list\n", cmd);
>   cmd->result = DID_ERROR << 16;
> - if (!hostdata->connected)
> + if (!hostdata->connected && !hostdata->selecting)
>   NCR5380_select(instance, cmd);
>   if (hostdata->connected != cmd) {
>   complete_cmd(instance, cmd);
> Index: linux/drivers/scsi/atari_NCR5380.c
> ===
> --- linux.orig/drivers/scsi/atari_NCR5380.c   2016-01-26 13:31:10.0 +1100
> +++ linux/drivers/scsi/atari_NCR5380.c        2016-01-26 13:31:10.0 +1100
> @@ -2532,7 +2532,7 @@ static int NCR5380_abort(struct scsi_cmn
>   dsprintk(NDEBUG_ABORT, instance,
>"abort: removed %p from disconnected list\n", cmd);
>   cmd->result = DID_ERROR << 16;
> - if (!hostdata->connected)
> + if (!hostdata->connected && !hostdata->selecting)
>   NCR5380_select(instance, cmd);
>   if (hostdata->connected != cmd) {
>   complete_cmd(instance, cmd);
> 
> 

[PATCH 6/6] ncr5380: Call scsi_eh_prep_cmnd() and scsi_eh_restore_cmnd() as and when appropriate

2016-02-22 Thread Finn Thain

This bug causes the wrong command to have its sense pointer overwritten,
which sometimes leads to a NULL pointer deref. Fix this by checking which
command is being requeued before restoring the scsi_eh_save data.

It turns out that some targets will disconnect a REQUEST SENSE command.
The autosense algorithm doesn't anticipate this. Hence multiple commands
can end up undergoing autosense simultaneously, and they will all try to
use the same scsi_eh_save struct, which won't work. Defer autosense when
the scsi_eh_save storage is in use by another command.

Fixes: f27db8eb98a1 ("ncr5380: Fix autosense bugs")
Reported-and-tested-by: Michael Schmitz 
Signed-off-by: Finn Thain 
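
The deferral rule described above can be modelled in a few lines of
userspace C. This is only a minimal sketch under stated assumptions:
toy_cmd, toy_host and both helpers are hypothetical stand-ins rather than
the driver's data structures, the queues hold at most one command each,
and ownership of the save slot is taken inside the dequeue helper for
brevity. Only the two conditions ("sensing || autosense empty" and
"sensing == cmd") are taken from the diff in this patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a SCSI command. */
struct toy_cmd {
	int id;
	int saved;	/* nonzero while the single save slot holds this cmd's state */
};

/* Hypothetical stand-in for the host data; each "queue" holds one entry. */
struct toy_host {
	struct toy_cmd *sensing;	/* owner of the one save slot, or NULL */
	struct toy_cmd *autosense;	/* command waiting for autosense, or NULL */
	struct toy_cmd *unissued;	/* ordinary command waiting to start, or NULL */
};

/* Pick the next command. Autosense is deferred while the slot is in use. */
static struct toy_cmd *toy_dequeue(struct toy_host *h)
{
	struct toy_cmd *cmd;

	if (h->sensing || !h->autosense) {
		cmd = h->unissued;
		h->unissued = NULL;
		return cmd;
	}
	cmd = h->autosense;
	h->autosense = NULL;
	h->sensing = cmd;	/* the slot now belongs to cmd */
	cmd->saved = 1;
	return cmd;
}

/* Put a command back. Only the slot owner may restore from the slot. */
static void toy_requeue(struct toy_host *h, struct toy_cmd *cmd)
{
	if (h->sensing == cmd) {
		cmd->saved = 0;		/* restore the owner's saved state */
		h->autosense = cmd;
		h->sensing = NULL;
	} else {
		h->unissued = cmd;	/* leave the slot and its owner alone */
	}
}
```

With one command already sensing, a second autosense candidate stays
queued and an ordinary command is issued instead; requeueing that ordinary
command leaves the slot owner untouched.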

---
 drivers/scsi/NCR5380.c   |4 ++--
 drivers/scsi/atari_NCR5380.c |4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

Index: linux/drivers/scsi/NCR5380.c
===
--- linux.orig/drivers/scsi/NCR5380.c   2016-02-23 10:07:01.0 +1100
+++ linux/drivers/scsi/NCR5380.c        2016-02-23 10:07:02.0 +1100
@@ -760,7 +760,7 @@ static struct scsi_cmnd *dequeue_next_cm
struct NCR5380_cmd *ncmd;
struct scsi_cmnd *cmd;
 
-   if (list_empty(&hostdata->autosense)) {
+   if (hostdata->sensing || list_empty(&hostdata->autosense)) {
list_for_each_entry(ncmd, &hostdata->unissued, list) {
cmd = NCR5380_to_scmd(ncmd);
dsprintk(NDEBUG_QUEUES, instance, "dequeue: cmd=%p target=%d busy=0x%02x lun=%llu\n",
@@ -793,7 +793,7 @@ static void requeue_cmd(struct Scsi_Host
struct NCR5380_hostdata *hostdata = shost_priv(instance);
struct NCR5380_cmd *ncmd = scsi_cmd_priv(cmd);
 
-   if (hostdata->sensing) {
+   if (hostdata->sensing == cmd) {
scsi_eh_restore_cmnd(cmd, &hostdata->ses);
list_add(&ncmd->list, &hostdata->autosense);
hostdata->sensing = NULL;
Index: linux/drivers/scsi/atari_NCR5380.c
===
--- linux.orig/drivers/scsi/atari_NCR5380.c 2016-02-23 10:07:01.0 +1100
+++ linux/drivers/scsi/atari_NCR5380.c  2016-02-23 10:07:02.0 +1100
@@ -862,7 +862,7 @@ static struct scsi_cmnd *dequeue_next_cm
struct NCR5380_cmd *ncmd;
struct scsi_cmnd *cmd;
 
-   if (list_empty(&hostdata->autosense)) {
+   if (hostdata->sensing || list_empty(&hostdata->autosense)) {
list_for_each_entry(ncmd, &hostdata->unissued, list) {
cmd = NCR5380_to_scmd(ncmd);
dsprintk(NDEBUG_QUEUES, instance, "dequeue: cmd=%p target=%d busy=0x%02x lun=%llu\n",
@@ -901,7 +901,7 @@ static void requeue_cmd(struct Scsi_Host
struct NCR5380_hostdata *hostdata = shost_priv(instance);
struct NCR5380_cmd *ncmd = scsi_cmd_priv(cmd);
 
-   if (hostdata->sensing) {
+   if (hostdata->sensing == cmd) {
scsi_eh_restore_cmnd(cmd, &hostdata->ses);
list_add(&ncmd->list, &hostdata->autosense);
hostdata->sensing = NULL;



[PATCH 0/6] ncr5380: Exception handling fixes for v4.5

2016-02-22 Thread Finn Thain


These patches fix some exception handling and autosense bugs that I
accidentally introduced in v4.5-rc1.

The error recovery and autosense code in these drivers has been unstable
for a long time. Despite that, v4.5-rc1 shows a regression inasmuch as
it exposes a bug in the ARAnyM emulator. This leads to error recovery,
which can crash.

Also, Michael Schmitz reported some crashes involving abort handling
for a certain target device. And Dan Carpenter found a NULL pointer deref
in the new bus reset code.

Error recovery and autosense are stable with these patches.

I tested them using a Domex 3191D PCI card. Errors during IO were
simulated by sending bus resets and unplugging/replugging the SCSI
cables. Some of these patches fix bugs that only affect more capable
hardware (like Atari). Thanks to Michael Schmitz for patiently testing
those.

Please review this series for v4.5.

---
 drivers/scsi/NCR5380.c   |  133 +++
 drivers/scsi/atari_NCR5380.c |  133 +++
 2 files changed, 118 insertions(+), 148 deletions(-)





[PATCH 5/6] ncr5380: Fix NCR5380_select() EH checks and result handling

2016-02-22 Thread Finn Thain

Add missing checks for EH abort during arbitration and selection.
Rework the handling of NCR5380_select() result to improve clarity.

Fixes: 707d62b37fbb ("ncr5380: Fix EH during arbitration and selection")
Tested-by: Michael Schmitz 
Signed-off-by: Finn Thain 

---
 drivers/scsi/NCR5380.c   |   16 +++-
 drivers/scsi/atari_NCR5380.c |   16 +++-
 2 files changed, 22 insertions(+), 10 deletions(-)

Index: linux/drivers/scsi/NCR5380.c
===
--- linux.orig/drivers/scsi/NCR5380.c   2016-02-23 10:07:00.0 +1100
+++ linux/drivers/scsi/NCR5380.c        2016-02-23 10:07:01.0 +1100
@@ -815,15 +815,17 @@ static void NCR5380_main(struct work_str
struct NCR5380_hostdata *hostdata =
container_of(work, struct NCR5380_hostdata, main_task);
struct Scsi_Host *instance = hostdata->host;
-   struct scsi_cmnd *cmd;
int done;
 
do {
done = 1;
 
spin_lock_irq(&hostdata->lock);
-   while (!hostdata->connected &&
-  (cmd = dequeue_next_cmd(instance))) {
+   while (!hostdata->connected && !hostdata->selecting) {
+   struct scsi_cmnd *cmd = dequeue_next_cmd(instance);
+
+   if (!cmd)
+   break;
 
dsprintk(NDEBUG_MAIN, instance, "main: dequeued %p\n", cmd);
 
@@ -840,8 +842,7 @@ static void NCR5380_main(struct work_str
 * entire unit.
 */
 
-   cmd = NCR5380_select(instance, cmd);
-   if (!cmd) {
+   if (!NCR5380_select(instance, cmd)) {
dsprintk(NDEBUG_MAIN, instance, "main: select complete\n");
} else {
dsprintk(NDEBUG_MAIN | NDEBUG_QUEUES, instance,
@@ -1056,6 +1057,11 @@ static struct scsi_cmnd *NCR5380_select(
/* Reselection interrupt */
goto out;
}
+   if (!hostdata->selecting) {
+   /* Command was aborted */
+   NCR5380_write(MODE_REG, MR_BASE);
+   goto out;
+   }
if (err < 0) {
NCR5380_write(MODE_REG, MR_BASE);
shost_printk(KERN_ERR, instance,
Index: linux/drivers/scsi/atari_NCR5380.c
===
--- linux.orig/drivers/scsi/atari_NCR5380.c 2016-02-23 10:07:00.0 +1100
+++ linux/drivers/scsi/atari_NCR5380.c  2016-02-23 10:07:01.0 +1100
@@ -923,7 +923,6 @@ static void NCR5380_main(struct work_str
struct NCR5380_hostdata *hostdata =
container_of(work, struct NCR5380_hostdata, main_task);
struct Scsi_Host *instance = hostdata->host;
-   struct scsi_cmnd *cmd;
int done;
 
/*
@@ -936,8 +935,11 @@ static void NCR5380_main(struct work_str
done = 1;
 
spin_lock_irq(&hostdata->lock);
-   while (!hostdata->connected &&
-  (cmd = dequeue_next_cmd(instance))) {
+   while (!hostdata->connected && !hostdata->selecting) {
+   struct scsi_cmnd *cmd = dequeue_next_cmd(instance);
+
+   if (!cmd)
+   break;
 
dsprintk(NDEBUG_MAIN, instance, "main: dequeued %p\n", cmd);
 
@@ -960,8 +962,7 @@ static void NCR5380_main(struct work_str
 #ifdef SUPPORT_TAGS
cmd_get_tag(cmd, cmd->cmnd[0] != REQUEST_SENSE);
 #endif
-   cmd = NCR5380_select(instance, cmd);
-   if (!cmd) {
+   if (!NCR5380_select(instance, cmd)) {
dsprintk(NDEBUG_MAIN, instance, "main: select complete\n");
maybe_release_dma_irq(instance);
} else {
@@ -1257,6 +1258,11 @@ static struct scsi_cmnd *NCR5380_select(
/* Reselection interrupt */
goto out;
}
+   if (!hostdata->selecting) {
+   /* Command was aborted */
+   NCR5380_write(MODE_REG, MR_BASE);
+   goto out;
+   }
if (err < 0) {
NCR5380_write(MODE_REG, MR_BASE);
shost_printk(KERN_ERR, instance,



[PATCH 1/6] ncr5380: Correctly clear command pointers and lists after bus reset

2016-02-22 Thread Finn Thain

Commands subject to exception handling are to be returned to the scsi
mid-layer. Make sure that the various command pointers and command lists
in the low-level driver are correctly cleansed of affected commands.

This fixes some bugs that I accidentally introduced in v4.5-rc1 including
the removal of INIT_LIST_HEAD for the 'autosense' and 'disconnected'
command lists, and the possible NULL pointer dereference in
NCR5380_bus_reset() that was reported by Dan Carpenter.

hostdata->sensing may also point to an affected command so this pointer
also has to be cleared. The abort handler calls complete_cmd() to take
care of this; let's have the bus reset handler do the same.

The issue queue may also contain an affected command. If so, remove it.
This also follows the abort handler logic.

Reported-by: Dan Carpenter 
Fixes: 62717f537e1b ("ncr5380: Implement new eh_bus_reset_handler")
Tested-by: Michael Schmitz 
Signed-off-by: Finn Thain 
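
The list handling described above (complete every affected command, then
re-initialise the list head) can be illustrated outside the kernel. What
follows is a minimal sketch, assuming a hand-rolled imitation of the
kernel's circular list_head; toy_cmd, TOY_DID_RESET and
flush_list_after_reset() are hypothetical names used for illustration
only, and the numeric value of TOY_DID_RESET is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal userspace imitation of a circular doubly-linked list head. */
struct list_head {
	struct list_head *next, *prev;
};

static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static void list_add_tail_node(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

static int list_is_empty(const struct list_head *h)
{
	return h->next == h;
}

/* Hypothetical command; the list node is deliberately the first member. */
struct toy_cmd {
	struct list_head list;
	int result;
	int done;
};

#define TOY_DID_RESET 0x08	/* assumed value, for illustration only */

/*
 * Complete every command on the list with a reset status, then
 * re-initialise the head so that it no longer points at commands
 * that have been handed back to the caller. Returns the number of
 * commands completed.
 */
static int flush_list_after_reset(struct list_head *head)
{
	struct list_head *pos;
	int n = 0;

	for (pos = head->next; pos != head; pos = pos->next) {
		struct toy_cmd *cmd = (struct toy_cmd *)pos; /* node is first member */

		cmd->result = TOY_DID_RESET << 16;
		cmd->done = 1;
		n++;
	}
	init_list_head(head);
	return n;
}
```

Without the final re-initialisation the head would still reference
commands that have already been completed, which is the kind of stale
pointer this patch is meant to eliminate.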

---
 drivers/scsi/NCR5380.c   |   19 ---
 drivers/scsi/atari_NCR5380.c |   19 ---
 2 files changed, 24 insertions(+), 14 deletions(-)

Index: linux/drivers/scsi/NCR5380.c
===
--- linux.orig/drivers/scsi/NCR5380.c   2016-02-23 10:06:56.0 +1100
+++ linux/drivers/scsi/NCR5380.c        2016-02-23 10:06:56.0 +1100
@@ -2450,7 +2450,16 @@ static int NCR5380_bus_reset(struct scsi
 * commands!
 */
 
-   hostdata->selecting = NULL;
+   if (list_del_cmd(&hostdata->unissued, cmd)) {
+   cmd->result = DID_RESET << 16;
+   cmd->scsi_done(cmd);
+   }
+
+   if (hostdata->selecting) {
+   hostdata->selecting->result = DID_RESET << 16;
+   complete_cmd(instance, hostdata->selecting);
+   hostdata->selecting = NULL;
+   }
 
list_for_each_entry(ncmd, &hostdata->disconnected, list) {
struct scsi_cmnd *cmd = NCR5380_to_scmd(ncmd);
@@ -2458,6 +2467,7 @@ static int NCR5380_bus_reset(struct scsi
set_host_byte(cmd, DID_RESET);
cmd->scsi_done(cmd);
}
+   INIT_LIST_HEAD(&hostdata->disconnected);
 
list_for_each_entry(ncmd, &hostdata->autosense, list) {
struct scsi_cmnd *cmd = NCR5380_to_scmd(ncmd);
@@ -2465,6 +2475,7 @@ static int NCR5380_bus_reset(struct scsi
set_host_byte(cmd, DID_RESET);
cmd->scsi_done(cmd);
}
+   INIT_LIST_HEAD(&hostdata->autosense);
 
if (hostdata->connected) {
set_host_byte(hostdata->connected, DID_RESET);
@@ -2472,12 +2483,6 @@ static int NCR5380_bus_reset(struct scsi
hostdata->connected = NULL;
}
 
-   if (hostdata->sensing) {
-   set_host_byte(hostdata->connected, DID_RESET);
-   complete_cmd(instance, hostdata->sensing);
-   hostdata->sensing = NULL;
-   }
-
for (i = 0; i < 8; ++i)
hostdata->busy[i] = 0;
 #ifdef REAL_DMA
Index: linux/drivers/scsi/atari_NCR5380.c
===
--- linux.orig/drivers/scsi/atari_NCR5380.c 2016-02-23 10:06:56.0 +1100
+++ linux/drivers/scsi/atari_NCR5380.c  2016-02-23 10:06:56.0 +1100
@@ -2646,7 +2646,16 @@ static int NCR5380_bus_reset(struct scsi
 * commands!
 */
 
-   hostdata->selecting = NULL;
+   if (list_del_cmd(&hostdata->unissued, cmd)) {
+   cmd->result = DID_RESET << 16;
+   cmd->scsi_done(cmd);
+   }
+
+   if (hostdata->selecting) {
+   hostdata->selecting->result = DID_RESET << 16;
+   complete_cmd(instance, hostdata->selecting);
+   hostdata->selecting = NULL;
+   }
 
list_for_each_entry(ncmd, &hostdata->disconnected, list) {
struct scsi_cmnd *cmd = NCR5380_to_scmd(ncmd);
@@ -2654,6 +2663,7 @@ static int NCR5380_bus_reset(struct scsi
set_host_byte(cmd, DID_RESET);
cmd->scsi_done(cmd);
}
+   INIT_LIST_HEAD(&hostdata->disconnected);
 
list_for_each_entry(ncmd, &hostdata->autosense, list) {
struct scsi_cmnd *cmd = NCR5380_to_scmd(ncmd);
@@ -2661,6 +2671,7 @@ static int NCR5380_bus_reset(struct scsi
set_host_byte(cmd, DID_RESET);
cmd->scsi_done(cmd);
}
+   INIT_LIST_HEAD(&hostdata->autosense);
 
if (hostdata->connected) {
set_host_byte(hostdata->connected, DID_RESET);
@@ -2668,12 +2679,6 @@ static int NCR5380_bus_reset(struct scsi
hostdata->connected = NULL;
}
 
-   if (hostdata->sensing) {
-   set_host_byte(hostdata->connected, DID_RESET);
-   complete_cmd(instance, hostdata->sensing);
-   hostdata->sensing = NULL;
-   }
-
 #ifdef SUPPORT_TAGS
free_all_tags(hostdata);
 #endif



[PATCH 3/6] ncr5380: Dont re-enter NCR5380_select()

2016-02-22 Thread Finn Thain

Calling NCR5380_select() from the abort handler causes various problems.
Firstly, it means potentially re-entering NCR5380_select(). Secondly, it
means that the lock is released, which permits the EH handlers to be
re-entered. The combination results in crashes. Don't do it.

Fixes: 8b00c3d5d40d ("ncr5380: Implement new eh_abort_handler")
Reported-and-tested-by: Michael Schmitz 
Signed-off-by: Finn Thain 

---
 drivers/scsi/NCR5380.c   |   16 
 drivers/scsi/atari_NCR5380.c |   16 
 2 files changed, 16 insertions(+), 16 deletions(-)

Index: linux/drivers/scsi/NCR5380.c
===
--- linux.orig/drivers/scsi/NCR5380.c   2016-02-23 10:06:57.0 +1100
+++ linux/drivers/scsi/NCR5380.c        2016-02-23 10:06:58.0 +1100
@@ -2302,6 +2302,9 @@ static bool list_del_cmd(struct list_hea
  * If cmd was not found at all then presumably it has already been completed,
  * in which case return SUCCESS to try to avoid further EH measures.
  * If the command has not completed yet, we must not fail to find it.
+ *
+ * The lock protects driver data structures, but EH handlers also use it
+ * to serialize their own execution and prevent their own re-entry.
  */
 
 static int NCR5380_abort(struct scsi_cmnd *cmd)
@@ -2338,14 +2341,11 @@ static int NCR5380_abort(struct scsi_cmn
if (list_del_cmd(&hostdata->disconnected, cmd)) {
dsprintk(NDEBUG_ABORT, instance,
 "abort: removed %p from disconnected list\n", cmd);
-   cmd->result = DID_ERROR << 16;
-   if (!hostdata->connected)
-   NCR5380_select(instance, cmd);
-   if (hostdata->connected != cmd) {
-   complete_cmd(instance, cmd);
-   result = FAILED;
-   goto out;
-   }
+   /* Can't call NCR5380_select() and send ABORT because that
+* means releasing the lock. Need a bus reset.
+*/
+   result = FAILED;
+   goto out;
}
 
if (hostdata->connected == cmd) {
Index: linux/drivers/scsi/atari_NCR5380.c
===
--- linux.orig/drivers/scsi/atari_NCR5380.c 2016-02-23 10:06:57.0 +1100
+++ linux/drivers/scsi/atari_NCR5380.c  2016-02-23 10:06:58.0 +1100
@@ -2497,6 +2497,9 @@ static bool list_del_cmd(struct list_hea
  * If cmd was not found at all then presumably it has already been completed,
  * in which case return SUCCESS to try to avoid further EH measures.
  * If the command has not completed yet, we must not fail to find it.
+ *
+ * The lock protects driver data structures, but EH handlers also use it
+ * to serialize their own execution and prevent their own re-entry.
  */
 
 static int NCR5380_abort(struct scsi_cmnd *cmd)
@@ -2533,14 +2536,11 @@ static int NCR5380_abort(struct scsi_cmn
if (list_del_cmd(&hostdata->disconnected, cmd)) {
dsprintk(NDEBUG_ABORT, instance,
 "abort: removed %p from disconnected list\n", cmd);
-   cmd->result = DID_ERROR << 16;
-   if (!hostdata->connected)
-   NCR5380_select(instance, cmd);
-   if (hostdata->connected != cmd) {
-   complete_cmd(instance, cmd);
-   result = FAILED;
-   goto out;
-   }
+   /* Can't call NCR5380_select() and send ABORT because that
+* means releasing the lock. Need a bus reset.
+*/
+   result = FAILED;
+   goto out;
}
 
if (hostdata->connected == cmd) {



[PATCH 4/6] ncr5380: Forget aborted commands

2016-02-22 Thread Finn Thain

The list structures and related logic used in the NCR5380 driver mean that
a command cannot be queued twice (i.e. can't appear on more than one queue
and can't appear on the same queue more than once).

The abort handler must forget the command so that the mid-layer can re-use
it. E.g. the ML may send it back to the LLD via scsi_eh_get_sense().

Fix this and also fix two error paths, so that commands get forgotten iff
completed.

Fixes: 8b00c3d5d40d ("ncr5380: Implement new eh_abort_handler")
Tested-by: Michael Schmitz 
Signed-off-by: Finn Thain 
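
The "cannot be queued twice" constraint follows from the command carrying
a single embedded list node. Here is a minimal userspace sketch of that
idea, assuming a hand-rolled circular list; node, toy_cmd, toy_queue()
and toy_forget() are hypothetical names, and the explicit "already
linked" check is an illustration rather than something the driver does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One node means one set of neighbours, hence membership of one list. */
struct node {
	struct node *next, *prev;
};

static void node_init(struct node *n)
{
	n->next = n;
	n->prev = n;
}

/* True if n is on a list (or, for a list head, if the list is non-empty). */
static bool node_linked(const struct node *n)
{
	return n->next != n;
}

static void node_add(struct node *n, struct node *head)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

static void node_del(struct node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	node_init(n);
}

/* Hypothetical command with its single embedded node. */
struct toy_cmd {
	struct node list;
};

/* Queue cmd on head, refusing if it is still on some queue. */
static bool toy_queue(struct toy_cmd *cmd, struct node *head)
{
	if (node_linked(&cmd->list))
		return false;
	node_add(&cmd->list, head);
	return true;
}

/* Forget cmd: after this the owner may queue it again anywhere. */
static void toy_forget(struct toy_cmd *cmd)
{
	if (node_linked(&cmd->list))
		node_del(&cmd->list);
}
```

In this model a command that is handed back without being forgotten
cannot be accepted again, which mirrors why the abort handler has to drop
every reference before the mid-layer re-uses the command.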

---
 drivers/scsi/NCR5380.c   |   62 +++
 drivers/scsi/atari_NCR5380.c |   62 +++
 2 files changed, 34 insertions(+), 90 deletions(-)

Index: linux/drivers/scsi/NCR5380.c
===
--- linux.orig/drivers/scsi/NCR5380.c   2016-02-23 10:06:58.0 +1100
+++ linux/drivers/scsi/NCR5380.c        2016-02-23 10:07:00.0 +1100
@@ -1796,6 +1796,7 @@ static void NCR5380_information_transfer
do_abort(instance);
cmd->result = DID_ERROR << 16;
complete_cmd(instance, cmd);
+   hostdata->connected = NULL;
return;
 #endif
case PHASE_DATAIN:
@@ -1845,7 +1846,6 @@ static void NCR5380_information_transfer
sink = 1;
do_abort(instance);
cmd->result = DID_ERROR << 16;
-   complete_cmd(instance, cmd);
/* XXX - need to source or sink data here, as appropriate */
} else
cmd->SCp.this_residual -= transfersize - len;
@@ -2294,14 +2294,14 @@ static bool list_del_cmd(struct list_hea
  * [disconnected -> connected ->]...
  * [autosense -> connected ->] done
  *
- * If cmd is unissued then just remove it.
- * If cmd is disconnected, try to select the target.
- * If cmd is connected, try to send an abort message.
- * If cmd is waiting for autosense, give it a chance to complete but check
- * that it isn't left connected.
  * If cmd was not found at all then presumably it has already been completed,
  * in which case return SUCCESS to try to avoid further EH measures.
+ *
  * If the command has not completed yet, we must not fail to find it.
+ * We have no option but to forget the aborted command (even if it still
+ * lacks sense data). The mid-layer may re-issue a command that is in error
+ * recovery (see scsi_send_eh_cmnd), but the logic and data structures in
+ * this driver are such that a command can appear on one queue only.
  *
  * The lock protects driver data structures, but EH handlers also use it
  * to serialize their own execution and prevent their own re-entry.
@@ -2327,6 +2327,7 @@ static int NCR5380_abort(struct scsi_cmn
 "abort: removed %p from issue queue\n", cmd);
cmd->result = DID_ABORT << 16;
cmd->scsi_done(cmd); /* No tag or busy flag to worry about */
+   goto out;
}
 
if (hostdata->selecting == cmd) {
@@ -2344,6 +2345,8 @@ static int NCR5380_abort(struct scsi_cmn
/* Can't call NCR5380_select() and send ABORT because that
 * means releasing the lock. Need a bus reset.
 */
+   set_host_byte(cmd, DID_ERROR);
+   complete_cmd(instance, cmd);
result = FAILED;
goto out;
}
@@ -2351,45 +2354,9 @@ static int NCR5380_abort(struct scsi_cmn
if (hostdata->connected == cmd) {
dsprintk(NDEBUG_ABORT, instance, "abort: cmd %p is connected\n", cmd);
hostdata->connected = NULL;
-   if (do_abort(instance)) {
-   set_host_byte(cmd, DID_ERROR);
-   complete_cmd(instance, cmd);
-   result = FAILED;
-   goto out;
-   }
-   set_host_byte(cmd, DID_ABORT);
 #ifdef REAL_DMA
hostdata->dma_len = 0;
 #endif
-   if (cmd->cmnd[0] == REQUEST_SENSE)
-   complete_cmd(instance, cmd);
-   else {
-   struct NCR5380_cmd *ncmd = scsi_cmd_priv(cmd);
-
-   /* Perform autosense for this command */
-   list_add(&ncmd->list, &hostdata->autosense);
-   }
-   }
-
-   if (list_find_cmd(&hostdata->autosense, cmd)) {
-   dsprintk(NDEBUG_ABORT, instance,
-"abort: found %p on sense queue\n", cmd);
-   spin_unlock_irqrestore(&hostdata->lock,

[PATCH 2/6] ncr5380: Dont release lock for PIO transfer

2016-02-22 Thread Finn Thain

The calls to NCR5380_transfer_pio() for DATA IN and DATA OUT phases will
modify cmd->SCp.this_residual, cmd->SCp.ptr and cmd->SCp.buffer. That
works as long as EH does not intervene, which became possible in
atari_NCR5380.c when I changed the locking to bring it closer to
NCR5380.c.

If error recovery aborts the command, the scsi_cmnd in question and its
buffer will be returned to the mid-layer. So the transfer has to cease,
but it can't be stopped by the initiator because the target controls the
bus phase.

The problem does not arise if the lock is not released. That was fine for
atari_scsi, because it implements DMA. For the other drivers, we have to
release the lock and re-enable interrupts for long PIO data transfers.

The solution is to split the transfer into small chunks. In between chunks
the main loop releases the lock and re-enables interrupts. Thus interrupts
can be serviced and eh_bus_reset_handler can intervene if need be.

This fixes an oops in NCR5380_transfer_pio() that can happen when the EH
abort handler is invoked during DATA IN or DATA OUT phase.

Fixes: 11d2f63b9cf5 ("ncr5380: Change instance->host_lock to hostdata->lock")
Reported-and-tested-by: Michael Schmitz 
Signed-off-by: Finn Thain 
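
The chunk size used in the diff follows from simple arithmetic: at 6
register accesses per byte handshake, a 3 ms budget allows
accesses_per_ms * 3 / 6 = accesses_per_ms / 2 bytes. The sketch below
works that through in userspace C. It is a sketch under stated
assumptions: pio_chunk_size() and pio_chunk_count() are hypothetical
helpers, and every chunk is assumed to transfer completely.

```c
#include <assert.h>

/*
 * Bytes to move in one chunk: the smaller of what is left and what
 * fits in roughly 3 ms at 6 accesses per byte (accesses_per_ms / 2).
 */
static unsigned long pio_chunk_size(unsigned long residual,
				    unsigned long accesses_per_ms)
{
	unsigned long budget = accesses_per_ms / 2;

	return residual < budget ? residual : budget;
}

/*
 * Number of chunks needed for a transfer. In the driver, the gap
 * between two chunks is where the lock is dropped and interrupts
 * are serviced; here the gap is just a loop iteration.
 */
static unsigned long pio_chunk_count(unsigned long residual,
				     unsigned long accesses_per_ms)
{
	unsigned long chunks = 0;

	while (residual) {
		unsigned long n = pio_chunk_size(residual, accesses_per_ms);

		if (n == 0)
			break;	/* guard against a zero budget in this sketch */
		residual -= n;
		chunks++;
	}
	return chunks;
}
```

For example, with 2000 accesses per millisecond the budget is 1000 bytes,
so a 4096-byte transfer is split into four full chunks and one 96-byte
remainder.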

---
 drivers/scsi/NCR5380.c   |   16 +---
 drivers/scsi/atari_NCR5380.c |   16 +---
 2 files changed, 18 insertions(+), 14 deletions(-)

Index: linux/drivers/scsi/NCR5380.c
===
--- linux.orig/drivers/scsi/NCR5380.c   2016-02-23 10:06:56.0 +1100
+++ linux/drivers/scsi/NCR5380.c        2016-02-23 10:06:57.0 +1100
@@ -1759,9 +1759,7 @@ static void NCR5380_information_transfer
unsigned char msgout = NOP;
int sink = 0;
int len;
-#if defined(PSEUDO_DMA) || defined(REAL_DMA_POLL)
int transfersize;
-#endif
unsigned char *data;
unsigned char phase, tmp, extended_msg[10], old_phase = 0xff;
struct scsi_cmnd *cmd;
@@ -1854,13 +1852,17 @@ static void NCR5380_information_transfer
} else
 #endif /* defined(PSEUDO_DMA) || defined(REAL_DMA_POLL) */
{
-   spin_unlock_irq(&hostdata->lock);
-   NCR5380_transfer_pio(instance, &phase,
-(int *)&cmd->SCp.this_residual,
+   /* Break up transfer into 3 ms chunks,
+* presuming 6 accesses per handshake.
+*/
+   transfersize = min((unsigned long)cmd->SCp.this_residual,
+  hostdata->accesses_per_ms / 2);
+   len = transfersize;
+   NCR5380_transfer_pio(instance, &phase, &len,
 (unsigned char **)&cmd->SCp.ptr);
-   spin_lock_irq(&hostdata->lock);
+   cmd->SCp.this_residual -= transfersize - len;
}
-   break;
+   return;
case PHASE_MSGIN:
len = 1;
data = &tmp;
Index: linux/drivers/scsi/atari_NCR5380.c
===
--- linux.orig/drivers/scsi/atari_NCR5380.c 2016-02-23 10:06:56.0 +1100
+++ linux/drivers/scsi/atari_NCR5380.c  2016-02-23 10:06:57.0 +1100
@@ -1838,9 +1838,7 @@ static void NCR5380_information_transfer
unsigned char msgout = NOP;
int sink = 0;
int len;
-#if defined(REAL_DMA)
int transfersize;
-#endif
unsigned char *data;
unsigned char phase, tmp, extended_msg[10], old_phase = 0xff;
struct scsi_cmnd *cmd;
@@ -1983,18 +1981,22 @@ static void NCR5380_information_transfer
} else
 #endif /* defined(REAL_DMA) */
{
-   spin_unlock_irq(&hostdata->lock);
-   NCR5380_transfer_pio(instance, &phase,
-(int *)&cmd->SCp.this_residual,
+   /* Break up transfer into 3 ms chunks,
+* presuming 6 accesses per handshake.
+*/
+   transfersize = min((unsigned long)cmd->SCp.this_residual,
+  hostdata->accesses_per_ms / 2);
+   len = transfersize;
+

Re: NULL pointer dereference: IP: [] sr_runtime_suspend+0xc/0x20 [sr_mod]

2016-02-22 Thread Alexandre Rossi

Hello,

>> > As this is Linux 4.3 and not 4.4, I guess this is a different problem
>> > though. Alexandre, were you able to capture the stack trace? I’d submit
>> > a new bug report with this.
>>
>> Here is a photo. Please ping me if you need to test some debugging patches.
>
> It looks like the problem occurs in blk_post_runtime_resume().  Since
> there have been recent changes to this routine, it's hard to tell
> whether you're using the most up-to-date code.
>
> In particular, the first few lines of blk_post_runtime_resume() in
> block/blk-core.c should look like this:
>
> void blk_post_runtime_resume(struct request_queue *q, int err)
> {
> if (!q->dev)
> return;
>
> The test was introduced by commit 4fd41a8552af ("SCSI: Fix NULL pointer
> dereference in runtime PM"), which was added to the mainline kernel
> between 4.3 and 4.4.  I don't know what the commit ID would be for a
> .stable kernel.

Okay now I've tried with 4.4. The oops does not occur. So this is
fixed for me in 4.4.

If there is interest in backporting to 4.3, 13b438914341 ("SCSI: fix
crashes in sd and sr runtime PM") is not sufficient on its own. Something
else in 4.4, most probably 4fd41a8552af ("SCSI: Fix NULL pointer
dereference in runtime PM"), is also needed.

Thanks a lot,

Alex

Re: [Announce] sg3_utils-1.42 available

2016-02-22 Thread Bart Van Assche


On 02/18/2016 11:52 AM, Douglas Gilbert wrote:
> On 16-02-17 11:59 PM, Douglas Gilbert wrote:
>> sg3_utils is a package of command line utilities for sending
>> SCSI and some ATA commands to devices. This package targets
>> the Linux 4, 3, 2.6 and 2.4 kernel series. It has ports to
>> FreeBSD, Tru64, Solaris, and Windows (cygwin and MinGW).
>>
>> There are two new utilities (sg_read_attr and sg_timestamp)
>> and additions to many others, see the ChangeLog below. This
>> version tracks various changes made by www.t10.org since May
>> 2015 until January 2016.
>
> Missed the links:
> For an overview of sg3_utils and downloads see this page:
>  http://sg.danny.cz/sg/sg3_utils.html
> The sg_ses utility (for enclosure devices) is discussed at:
>  http://sg.danny.cz/sg/sg_ses.html
> A full changelog can be found at:
>  http://sg.danny.cz/sg/p/sg3_utils.ChangeLog


Hi Doug,

Thanks for all the work you have done maintaining sg3_utils and also
for preparing a new release. I have already downloaded v1.42 and
started using it. The detailed changelog is helpful.
However, I think it would be convenient for sg3_utils contributors to
have access to the sg3_utils source code repository so that we can see
all the patches that went in. Is such a repository publicly available,
and if not, do you have any plans to make one available?
Since I have a few patches ready that I would like to contribute to the
sg3_utils package, is there a mailing list that I should CC when sending
these patches to you?


Thanks,

Bart.

Re: why is blk-mq requeue forcibly kicking stopped queues? [was: Re: dm-multipath test scripts]

2016-02-22 Thread Junichi Nomura

On 02/23/16 00:09, Mike Snitzer wrote:
> I should note that I applied this patch for 4.6:
> https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-4.6&id=7db905b3d4294e5db4c2938fb7d0e5ba4bd798d6
> 
> (but it was purely a fallout of code-review, and looking at the nvme's
> use of blk_mq_requeue_request, I didn't consider it to be a critical fix
> or anything)

The patch above contains the following change:

> +static void dm_mq_requeue_request(struct request *rq)
> +{
> + struct request_queue *q = rq->q;
> + unsigned long flags;
> +
> + blk_mq_requeue_request(rq);
> + spin_lock_irqsave(q->queue_lock, flags);
> + if (!blk_queue_stopped(q))
> + blk_mq_kick_requeue_list(q);
> + spin_unlock_irqrestore(q->queue_lock, flags);
> +}

If the call to blk_mq_kick_requeue_list() is made conditional here,
I think start_queue() has to call it too; otherwise requests requeued
while the queue was stopped might stay in q->requeue_list forever.
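
To make the concern concrete, here is a toy userspace model of the
interaction. It is a sketch under stated assumptions: toy_queue and all
toy_* functions are hypothetical and only imitate the control flow of the
quoted dm_mq_requeue_request(); none of this is block layer code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical queue model: counters instead of real request lists. */
struct toy_queue {
	bool stopped;
	int requeue_list;	/* requests parked for requeue */
	int dispatched;		/* requests handed back for dispatch */
};

/* Move everything parked on the requeue list back to dispatch. */
static void toy_kick_requeue_list(struct toy_queue *q)
{
	q->dispatched += q->requeue_list;
	q->requeue_list = 0;
}

/* Imitates the quoted change: kick only if the queue is not stopped. */
static void toy_requeue_request(struct toy_queue *q)
{
	q->requeue_list++;
	if (!q->stopped)
		toy_kick_requeue_list(q);
}

static void toy_stop_queue(struct toy_queue *q)
{
	q->stopped = true;
}

/* Restart without a kick: parked requests are left behind. */
static void toy_start_queue_no_kick(struct toy_queue *q)
{
	q->stopped = false;
}

/* Restart with a kick: parked requests are picked up again. */
static void toy_start_queue(struct toy_queue *q)
{
	q->stopped = false;
	toy_kick_requeue_list(q);
}
```

In this model a request requeued while the queue is stopped stays parked
after toy_start_queue_no_kick() and is only dispatched once the start path
kicks the list as well.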

-- 
Jun'ichi Nomura, NEC Corporation

[LSF/MM ATTEND] Online Logical Head Depop and SMR disks chunked writepages

2016-02-22 Thread Damien Le Moal


Hello,

I would like to attend LSF/MM 2016 to discuss the following topics.

1) Online Logical Head Depop

Some disk drives available on the market already provide a "logical
depop" function which allows a system to decommission a defective
disk head, reformat the disk and continue using this same disk with
a reduced capacity. Such a feature can reduce operating costs
(delayed HDD replacement) but has the drawbacks of data loss (the data
under the remaining valid heads) and disk downtime during reformatting.

Online logical head depop is a proposed new feature that retains the
disk's valid data and eliminates the need for a disk reformat.
The basic idea is to introduce new commands for the host to discover
the ranges of LBAs impacted by a defective head. Using this information,
the host can take actions when a disk head failure event is suspected
or reported:
(a) The impacted LBAs can be depopulated, resulting in the disk
operating as a “thin provisioned” device.
(b) The impacted LBAs can be amputated, resulting in error for all
subsequent accesses to the LBAs under the defective head.
(c) Optionally, a host may decide to reformat (compact) the disk to
restore operation as a fully-provisioned device with a lower capacity.

The goal of the discussion would be to gather the opinion of the
developers for drafting a command standard minimizing the impact of this
feature on the block I/O stack as well as allowing a simple use of this
feature by file systems and device mapper drivers (including logical
volume manager).


2) Write back of dirty pages to SMR block devices:

Dirty pages of a block device inode are currently processed using the
generic_writepages function, which can be executed simultaneously
by multiple contexts (e.g. sync, fsync, msync, sync_file_range, etc).
Because mutual exclusion of the dirty page processing is achieved only at
the page level (page lock & page writeback flag), multiple processes
executing a "sync" of overlapping block ranges over the same zone of
an SMR disk can cause an out-of-LBA-order sequence of write requests
to be sent to the underlying device. On a host-managed SMR disk, where
sequential writes to disk zones are mandatory, this results in errors, and
an application using raw sequential disk write accesses cannot be
guaranteed successful completion of its write or fsync requests.

Using the zone information attached to the SMR block device queue
(introduced by Hannes), calls to the generic_writepages function can
be made mutually exclusive on a per zone basis by locking the zones.
This guarantees sequential request generation for each zone and avoids
write errors without any modification to the generic code implementing
generic_writepages.
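
As a rough illustration of per-zone exclusion, here is a minimal
userspace sketch. Everything in it is an assumption made for
illustration: the zone size, the zone count, and the toy_* helpers are
hypothetical, and a try-lock on a C11 atomic stands in for whatever
sleeping lock a real implementation would use.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define TOY_NR_ZONES		8
#define TOY_ZONE_SECTORS	524288ULL	/* assumed: 256 MiB zones, 512 B sectors */

/* One busy flag per zone; static storage starts out as "not busy". */
static atomic_bool toy_zone_busy[TOY_NR_ZONES];

/* Map a sector (LBA) to the zone that contains it. */
static unsigned int toy_zone_of(unsigned long long sector)
{
	return (unsigned int)(sector / TOY_ZONE_SECTORS);
}

/* Returns true if the caller now owns the zone. */
static bool toy_zone_trylock(unsigned int zone)
{
	return !atomic_exchange(&toy_zone_busy[zone], true);
}

static void toy_zone_unlock(unsigned int zone)
{
	atomic_store(&toy_zone_busy[zone], false);
}

/*
 * Write back the dirty pages of the zone containing 'sector'.
 * Returns false if the zone is out of range or another context is
 * already writing that zone.
 */
static bool toy_writepages(unsigned long long sector)
{
	unsigned int zone = toy_zone_of(sector);

	if (zone >= TOY_NR_ZONES || !toy_zone_trylock(zone))
		return false;
	/* ... issue this zone's dirty pages in ascending LBA order ... */
	toy_zone_unlock(zone);
	return true;
}
```

Two contexts targeting different zones proceed independently, while a
second context targeting a zone that is already being written is turned
away until the first one has finished.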

This is but one possible solution for supporting SMR host-managed
devices without any major rewrite of page cache management and
write-back processing. The audience's opinion of this solution, and
discussion of other potential solutions, would be greatly
appreciated.

Thank you.

Best regards.



Damien Le Moal, Ph.D.
Sr. Manager, System Software Group, HGST Research,
HGST, a Western Digital company
damien.lem...@hgst.com
(+81) 0466-98-3593 (ext. 513593)
1 kirihara-cho, Fujisawa, 
Kanagawa, 252-0888 Japan
www.hgst.com 

Re: why is blk-mq requeue foricbly kicking stopped queues? [was: Re: dm-multipath test scripts]

2016-02-22 Thread Mike Snitzer

On Mon, Feb 22 2016 at  8:34pm -0500,
Junichi Nomura  wrote:

> On 02/23/16 00:09, Mike Snitzer wrote:
> > I should note that I applied this patch for 4.6:
> > https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-4.6&id=7db905b3d4294e5db4c2938fb7d0e5ba4bd798d6
> > 
> > (but it was purely a fallout of code-review, and looking at the nvme's
> > use of blk_mq_requeue_request, I did't consider it to be a critical fix
> > or anything)
> 
> The patch above contains following change:
> 
> > +static void dm_mq_requeue_request(struct request *rq)
> > +{
> > +   struct request_queue *q = rq->q;
> > +   unsigned long flags;
> > +
> > +   blk_mq_requeue_request(rq);
> > +   spin_lock_irqsave(q->queue_lock, flags);
> > +   if (!blk_queue_stopped(q))
> > +           blk_mq_kick_requeue_list(q);
> > +   spin_unlock_irqrestore(q->queue_lock, flags);
> > +}
> 
> If you make it conditional to call blk_mq_kick_requeue_list() here,
> I think we have to call the function from start_queue(), too,
> otherwise requeued requests might stay forever in q->requeue_list.

Yes, you're right.  Fixed up and pushed to rebased linux-dm.git 'dm-4.6'
branch:
https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-4.6&id=818c5f3bef750eb5998b468f84391e4d656b97ed
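
The interaction discussed above can be sketched as a small user-space
model. This is not the dm or blk-mq code: struct model_queue and the
model_* functions are hypothetical, and "kicking" is reduced to draining a
counter. It only illustrates why a kick that is skipped while the queue is
stopped has to be made up for when the queue is started again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a request queue with a requeue list. */
struct model_queue {
	bool stopped;		/* queue stopped, e.g. for suspend */
	unsigned int requeue_len;	/* requests waiting on the requeue list */
	unsigned int dispatched;	/* requests handed back for dispatch */
};

/* Drain the requeue list (models kicking the requeue work). */
static void model_kick_requeue_list(struct model_queue *q)
{
	q->dispatched += q->requeue_len;
	q->requeue_len = 0;
}

/* Requeue one request; kick only if the queue is not stopped. */
static void model_requeue_request(struct model_queue *q)
{
	assert(q != NULL);
	q->requeue_len++;
	if (!q->stopped)
		model_kick_requeue_list(q);
}

static void model_stop_queue(struct model_queue *q)
{
	q->stopped = true;
}

/*
 * Starting the queue must also kick the requeue list; otherwise requests
 * requeued while the queue was stopped would stay on the list forever.
 */
static void model_start_queue(struct model_queue *q)
{
	q->stopped = false;
	model_kick_requeue_list(q);
}
```

With the kick removed from model_start_queue(), two requests requeued
while the queue is stopped would remain in requeue_len after the restart,
which is the stall described in the quoted review comment.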
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [LSF/MM ATTEND] Online Logical Head Depop and SMR disks chunked writepages

2016-02-22 Thread Bart Van Assche


On 02/22/16 18:56, Damien Le Moal wrote:

> 2) Write back of dirty pages to SMR block devices:
>
> Dirty pages of a block device inode are currently processed using the
> generic_writepages function, which can be executed simultaneously
> by multiple contexts (e.g sync, fsync, msync, sync_file_range, etc).
> Mutual exclusion of the dirty page processing being achieved only at
> the page level (page lock & page writeback flag), multiple processes
> executing a "sync" of overlapping block ranges over the same zone of
> an SMR disk can cause an out-of-LBA-order sequence of write requests
> being sent to the underlying device. On a host managed SMR disk, where
> sequential write to disk zones is mandatory, this result in errors and
> the impossibility for an application using raw sequential disk write
> accesses to be guaranteed successful completion of its write or fsync
> requests.
>
> Using the zone information attached to the SMR block device queue
> (introduced by Hannes), calls to the generic_writepages function can
> be made mutually exclusive on a per zone basis by locking the zones.
> This guarantees sequential request generation for each zone and avoid
> write errors without any modification to the generic code implementing
> generic_writepages.
>
> This is but one possible solution for supporting SMR host-managed
> devices without any major rewrite of page cache management and
> write-back processing. The opinion of the audience regarding this
> solution and discussing other potential solutions would be greatly
> appreciated.


Hello Damien,

Is it sufficient to support filesystems like BTRFS on top of SMR drives,
or would you also like filesystems like ext4 to be able to use SMR
drives? In the latter case: the behavior of SMR drives differs so
significantly from that of other block devices that I'm not sure that we
should try to support these directly from infrastructure like the page
cache. If we look e.g. at NAND SSDs then we see that the characteristics
of NAND do not match what filesystems expect (e.g. large erase blocks).
That is why every SSD vendor provides an FTL (Flash Translation Layer),
either inside the SSD or as a separate software driver. An FTL
implements a so-called LFS (log-structured filesystem). From what I know
about SMR, this technology also looks suitable for implementing an
LFS. Has implementing an LFS driver for SMR drives already been
considered? That would make it possible for any filesystem to access an
SMR drive like any other block device. I'm not sure of this, but maybe it
will be possible to share some infrastructure with the LightNVM driver
(directory drivers/lightnvm in the Linux kernel tree), since that driver
implements an FTL.


Bart.

Re: [LSF/MM ATTEND] Online Logical Head Depop and SMR disks chunked writepages

2016-02-22 Thread Damien Le Moal

>On 02/22/16 18:56, Damien Le Moal wrote:
>> 2) Write back of dirty pages to SMR block devices:
>>
>> Dirty pages of a block device inode are currently processed using the
>> generic_writepages function, which can be executed simultaneously
>> by multiple contexts (e.g sync, fsync, msync, sync_file_range, etc).
>> Mutual exclusion of the dirty page processing being achieved only at
>> the page level (page lock & page writeback flag), multiple processes
>> executing a "sync" of overlapping block ranges over the same zone of
>> an SMR disk can cause an out-of-LBA-order sequence of write requests
>> being sent to the underlying device. On a host managed SMR disk, where
>> sequential write to disk zones is mandatory, this result in errors and
>> the impossibility for an application using raw sequential disk write
>> accesses to be guaranteed successful completion of its write or fsync
>> requests.
>>
>> Using the zone information attached to the SMR block device queue
>> (introduced by Hannes), calls to the generic_writepages function can
>> be made mutually exclusive on a per zone basis by locking the zones.
>> This guarantees sequential request generation for each zone and avoid
>> write errors without any modification to the generic code implementing
>> generic_writepages.
>>
>> This is but one possible solution for supporting SMR host-managed
>> devices without any major rewrite of page cache management and
>> write-back processing. The opinion of the audience regarding this
>> solution and discussing other potential solutions would be greatly
>> appreciated.
>
>Hello Damien,
>
>Is it sufficient to support filesystems like BTRFS on top of SMR drives 
>or would you also like to see that filesystems like ext4 can use SMR 
>drives ? In the latter case: the behavior of SMR drives differs so 
>significantly from that of other block devices that I'm not sure that we 
>should try to support these directly from infrastructure like the page 
>cache. If we look e.g. at NAND SSDs then we see that the characteristics 
>of NAND do not match what filesystems expect (e.g. large erase blocks). 
>That is why every SSD vendor provides an FTL (Flash Translation Layer), 
>either inside the SSD or as a separate software driver. An FTL 
>implements a so-called LFS (log-structured filesystem). With what I know 
>about SMR this technology looks also suitable for implementation of a 
>LFS. Has it already been considered to implement an LFS driver for SMR 
>drives ? That would make it possible for any filesystem to access an SMR 
>drive as any other block device. I'm not sure of this but maybe it will 
>be possible to share some infrastructure with the LightNVM driver 
>(directory drivers/lightnvm in the Linux kernel tree). This driver 
>namely implements an FTL.

Hello Bart,

Thank you for your comments.

I totally agree with you that trying to support SMR disks by only modifying
the page cache so that unmodified standard file systems like BTRFS or ext4
remain operational is unrealistic at best, and more likely simply impossible.
For this kind of use case, as you said, an FTL or a device mapper driver is
much more suitable.

The case I am considering for this discussion is for raw block device accesses
by an application (writes from user space to /dev/sdxx). This is a very likely
use case scenario for high capacity SMR disks with applications like distributed
object stores / key value stores.

In this case, write-back of dirty pages in the block device file inode mapping
is handled in fs/block_dev.c using the generic helper function
generic_writepages. This does not guarantee the generation of the sequential
write pattern per zone that host-managed disks require. As I explained,
aligning calls of this function to zone boundaries while locking the zones
under write-back simply solves the problem (implemented and tested). This is of
course only one possible solution. Pushing modifications deeper in the code or
providing a "generic_sequential_writepages" helper function are other potential
solutions that in my opinion are worth discussing, as other types of devices
may also benefit in terms of performance (e.g. regular disk drives prefer
sequential writes, and SSDs as well) and/or it may lighten the overhead on an
underlying FTL or device mapper driver.
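
As a rough illustration of the idea of aligning write-back calls to zone
boundaries while holding a per-zone lock, here is a user-space sketch. It
is not the implementation mentioned above: struct zoned_dev,
zoned_writeback, the fixed NR_ZONES, the uniform zone size and the spinning
atomic_flag (standing in for whatever lock the kernel code would use) are
all assumptions made for the example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define NR_ZONES 4	/* assumed small fixed zone count for the sketch */

struct zoned_dev {
	uint64_t zone_len;		/* zone size in LBAs, assumed uniform */
	atomic_flag zone_lock[NR_ZONES];	/* one lock per zone */
};

/* Callback that writes back [start, end), which lies within one zone. */
typedef void (*chunk_fn)(unsigned int zone, uint64_t start, uint64_t end,
			 void *arg);

/* Small recorder used to show which chunks were issued. */
struct chunk_log {
	unsigned int n;
	uint64_t start[8];
	uint64_t end[8];
};

static void log_chunk(unsigned int zone, uint64_t start, uint64_t end,
		      void *arg)
{
	struct chunk_log *log = arg;

	(void)zone;
	if (log->n < 8) {
		log->start[log->n] = start;
		log->end[log->n] = end;
		log->n++;
	}
}

static void zoned_dev_init(struct zoned_dev *d, uint64_t zone_len)
{
	assert(zone_len > 0);
	d->zone_len = zone_len;
	for (unsigned int i = 0; i < NR_ZONES; i++)
		atomic_flag_clear(&d->zone_lock[i]);
}

/*
 * Write back [start, end) one zone at a time. Each chunk stops at a zone
 * boundary and is processed with that zone's lock held, so two callers
 * never generate requests for the same zone concurrently.
 * Returns the number of chunks issued.
 */
static unsigned int zoned_writeback(struct zoned_dev *d, uint64_t start,
				    uint64_t end, chunk_fn fn, void *arg)
{
	unsigned int chunks = 0;

	while (start < end) {
		uint64_t zone = start / d->zone_len;
		uint64_t zone_end = (zone + 1) * d->zone_len;
		uint64_t chunk_end = end < zone_end ? end : zone_end;

		assert(zone < NR_ZONES);
		while (atomic_flag_test_and_set(&d->zone_lock[zone]))
			;	/* spin until the zone lock is free */
		fn((unsigned int)zone, start, chunk_end, arg);
		atomic_flag_clear(&d->zone_lock[zone]);

		start = chunk_end;
		chunks++;
	}
	return chunks;
}
```

For example, with 100-LBA zones a request to write back LBAs 50 to 250 is
issued as three chunks, 50-100, 100-200 and 200-250, each under its own
zone lock.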

For a file system, an SMR compliant implementation of a file inode mapping
writepages method should be provided by the file system itself as the
sequentiality of the write pattern depends further on the block allocation
mechanism of the file system.

Note that the goal here is not to hide the sequential write constraint of
SMR disks from applications. The page cache itself (the mapping of the block
device inode) remains unchanged. But the proposed modification guarantees that
a well-behaved application writing sequentially to zones through the page cache
will see successful sync operations.

Best regards.

Damien Le Moal, Ph.D.
Sr. Manager, System S
