Re: mdadm's raid1 will not eliminate abnormal disk after 5 seconds under IO pressure

2019-01-31 Thread 李春
Ok, thanks.

Xiao Ni  于2019年1月31日周四 下午2:25写道:
>
> Yes, the failfast is used to fix the problem you described. It can't remove
>
> the active disk until all pending I/O finish without failfast. If there
> is no
>
> pending I/O, it can be removed immediately.
>
> Thanks
>
> Xiao
>
>
> On 01/30/2019 10:14 PM, 李春 wrote:
> > I have read the description of the failfast feature. According to the
> > phenomenon, it may  be not the problem of failfast.
> > Because when there are no io pressure, after stop the disk export on
> > the storage node,  the disk will be automatically eliminate from the
> > md disk.
> > However, if there is continuous IO pressure, the disk will not be
> > automatically removed, and the disk will be eliminated immediately
> > after the IO pressure is stopped.
> >
> > Xiao Ni  于2019年1月30日周三 下午5:15写道:
> >>
> >>
> >> On 01/30/2019 03:25 PM, Jack Wang wrote:
> >>> 李春  于2019年1月30日周三 上午7:08写道:
>  # Description of problem:
>  We loaded a disk from two network of storage node via iscsi, merged
>  into a disk through multipath, and made a raid1 with  local disk by
>  mdadm.
>  However, when the storage machine of iscsi disk rebooted,  raid1 disk
>  does not automatically eject the abnormal disk when there are some IO
>  pressure.
> 
>  # Version-Release number of selected component (if applicable):
>  vermagic: 2.6.32-573.el6.x86_64 SMP mod_unload modversions
>  srcversion: 39AAB97325332236F2FFCA9
> 
>  # How reproducible:
>  always
> 
>  # Steps to Reproduce:
>  1. export a disk from storage node
>  2. load the disk on another node and merge it with multipath
>  3. assemble a local disk and the multipath by madm to a raid1 disk
>  4. reboot
> 
>  # Actual results:
>  * multipath disk not eject from raid1 disk under Fio pressure
>  * multipath disk eject immediately from raid1 disk when stop Fio pressure
> 
>  # Expected results:
>  * multipath disk eject immediately from raid1 disk under Fio pressure
> 
>  # Additional info:
>  We have done the following tests:
>  * In rhel6.7 with kernel of 2.6.32-573.el6.x86_64 test, mdadm's raid1
>  will eliminate the abnormal disk after 5 seconds without IO pressure
>  * In rhel6.7 with kernel of 2.6.32-573.el6.x86_64 test, in the case of
>  IO pressure, mdadm's raid1 will not reject the abnormal disk, until
>  the IO pressure stops, the disk will be removed.
>  * In rhel7.4 with kernel of 3.10.0-693.el7.x86_64 test, mdadm's raid1
>  will eliminate the abnormal disk after 5 seconds without IO pressure
>  * In rhel7.4 with kernel of  3.10.0-693.el7.x86_64 test, mdadm's raid1
>  will eliminate abnormal disk after 5 seconds under IO pressure
> 
>  Thanks for your help.
> >>> Sounds like, you want failfast feature in upstream, not sure if RH
> >>> backport it into their kernel.
> >> Thanks for the reporting and analysis.
> >> rhel6 is in the period that it's recommended to fix bugs only. So it
> >> doesn't backport some features.
> >> I'll have a try to backport this to rhel6.
> >>
> >> Regards
> >> Xiao
> >
> >
>


-- 
李春 Pickup Li
产品研发部  首席架构师

www.woqutech.com
杭州沃趣科技股份有限公司

杭州市滨江区滨安路1190号智汇中心A座1004室  310052
Hangzhou WOQU Technology Co., Ltd.
Room 1004, Building A, D-innovation Center, No. 1190, Bin' an road,
Hangzhou 310052


T:(0571) 87770835
M:(86)18989451982
F:(0571) 86805750
E:pickup...@woqutech.com


Re: [PATCH v2 1/7] scsi: libsas: reset the negotiated_linkrate when phy is down

2019-01-31 Thread John Garry

On 31/01/2019 01:11, Jason Yan wrote:



On 2019/1/30 21:08, John Garry wrote:

On 30/01/2019 08:24, Jason Yan wrote:

If the device is unplugged or disconnected, the negotiated_linkrate
still can be seen from the userspace by sysfs. This makes people
confused and leaks information of the device last used. So let's reset
the negotiated_linkrate after the phy is down.

Signed-off-by: Jason Yan 
CC: John Garry 
CC: Johannes Thumshirn 
CC: Ewan Milne 
CC: Christoph Hellwig 
CC: Tomas Henzl 
CC: Dan Williams 
CC: Hannes Reinecke 
---
 drivers/scsi/libsas/sas_expander.c | 2 ++
 include/scsi/libsas.h  | 3 +++
 2 files changed, 5 insertions(+)

diff --git a/drivers/scsi/libsas/sas_expander.c
b/drivers/scsi/libsas/sas_expander.c
index 17eb4185f29d..7b0e6dcef6e6 100644
--- a/drivers/scsi/libsas/sas_expander.c
+++ b/drivers/scsi/libsas/sas_expander.c
@@ -1904,6 +1904,8 @@ static void sas_unregister_devs_sas_addr(struct
domain_device *parent,
 &parent->port->sas_port_del_list);
 phy->port = NULL;
 }
+if (phy->phy)
+phy->phy->negotiated_linkrate = SAS_LINK_RATE_UNKNOWN;


Does this work for both PHYs which were either directly attached or just
forming part of a wideport?

Please also note that the expander PHY which was previously attached may
also show the incorrect value.


Good catch, all PHYs need to do this, not only the last PHY of the
wideport.


Please also consider this:
- For the expander host PHY, it will also go down. I find however on my 
system that this generates no broadcast event, but the I find that the 
event count has changed for that PHY. So the expander PHY's state would 
also be wrong.  So maybe we need to yet again inject a broadcast.


- we should update PHY sysfs entries for device_type, sas_addr, and 
target protocols







 }

 static int sas_discover_bfs_by_root_level(struct domain_device *root,
diff --git a/include/scsi/libsas.h b/include/scsi/libsas.h
index 3de3b10da19a..420156cea3ee 100644
--- a/include/scsi/libsas.h
+++ b/include/scsi/libsas.h
@@ -448,6 +448,9 @@ static inline void sas_phy_disconnected(struct
asd_sas_phy *phy)
 {
 phy->oob_mode = OOB_NOT_CONNECTED;
 phy->linkrate = SAS_LINK_RATE_UNKNOWN;
+


Then idea behind sas_phy_disconnected() was to set root PHY values only:

/* Before calling a notify event, LLDD should use this function
  * when the link is severed (possibly from its tasklet).
  * The idea is that the Class only reads those, while the LLDD,
  * can R/W these (thus avoiding a race)

libsas does set sas_phy negotiated_linkrate (but only for expander
PHYs), so not the perfect place to set this. I'm nitpicking a bit here.



I don't want to put it here. I asked chenxiang to fix this in LLDD, but
he insist libsas to do this. Would you please discuss with chenxiang and
come to an agreement?


I'm ok with the LLDD.

For sure, adding it here is convenient and would resolve the issue for 
other LLDDs using libsas, but setting it directly in the LLDD seems like 
the right thing to do, since this is what we do for other sas_phy rates.







+if (phy->phy)
+phy->phy->negotiated_linkrate = SAS_LINK_RATE_UNKNOWN;
 }

 static inline unsigned int to_sas_gpio_od(int device, int bit)


Thanks,









.




.






Re: [PATCH v2 6/7] scsi: libsas: reset the phy address if discover failed

2019-01-31 Thread John Garry

On 31/01/2019 02:13, Jason Yan wrote:



On 2019/1/31 1:36, John Garry wrote:

On 30/01/2019 08:24, Jason Yan wrote:

When we failed to discover the device, the phy address is still kept


/s/phy/PHY/


in ex_phy. So when the next time we revalidate this phy the
address and device type is the same, it will be considered as flutter
and will not be discovered again. So the device will not be brought up.

Fix this by reset the phy address to the initial value. Then
in the next revalidation the device will be discovered agian.


Why fail to discover the device? I wonder in which scenario you have
seen this, such that it is worth rediscovery.



The test guys have seen this for several times, especially after LLDD
changed the logic of lldd_dev_found and may return error now.


We're finding that a specific SATA disk stays in STP pending for some 
time and we fail the init in lldd dev found, like you say.


However, I think that we should ensure that this passes, as with your 
change I find we get some looping of dev found and lost until it finally 
recovers.


Regardless, I think that your change on its own is ok, so:
Reviewed-by: John Garry 







Tested-by: Chen Liangfei 
Signed-off-by: Jason Yan 
CC: Xiaofei Tan 
CC: John Garry 
CC: Johannes Thumshirn 
CC: Ewan Milne 
CC: Christoph Hellwig 
CC: Tomas Henzl 
CC: Dan Williams 
CC: Hannes Reinecke 
---
 drivers/scsi/libsas/sas_expander.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/drivers/scsi/libsas/sas_expander.c
b/drivers/scsi/libsas/sas_expander.c
index 6e56ebdc2148..e781941a7088 100644
--- a/drivers/scsi/libsas/sas_expander.c
+++ b/drivers/scsi/libsas/sas_expander.c
@@ -1100,6 +1100,13 @@ static int sas_ex_discover_dev(struct
domain_device *dev, int phy_id)
  i, SAS_ADDR(ex->ex_phy[i].attached_sas_addr));
 }
 }
+} else {
+/* if we failed to discover this device, we have to
+ * reset the expander phy attached address so that we
+ * will not treat the phy as flutter in the next
+ * revalidation
+ */
+memset(ex_phy->attached_sas_addr, 0, SAS_ADDR_SIZE);
 }

 return res;





.




.






RE: [EXT] [PATCH v4 2/3] scsi: ufs: Allow reading descriptor via raw upiu

2019-01-31 Thread Bean Huo (beanhuo)
>
>Signed-off-by: Avri Altman 
>Reviewed-by: Evan Green 
Reviewed-by: Bean Huo 


Re: [PATCH v2 3/7] scsi: libsas: optimize the debug print of the revalidate process

2019-01-31 Thread John Garry

On 31/01/2019 01:31, Jason Yan wrote:



On 2019/1/31 0:41, John Garry wrote:

On 30/01/2019 08:24, Jason Yan wrote:

sas_rediscover() returns error code if discover failed for a expander
phy. And sas_ex_revalidate_domain() only returns the last phy's error
code. So when sas_revalidate_domain() prints the return value of the
discover process, we do not know if the revalidation for every phy is
successful or not. We just know the last bcast phy revalidation
succeeded or not.

No need to return a error code for sas_ex_revalidate_domain() and
sas_rediscover(), and just print the debug log for each bcast phy
directly
in sas_rediscover().


do we want to know about every PHY, or just the PHY where res != 0?



Here I mean every PHY that raises bcast.



This may be better added at the sas_rediscover() callsite. And do you 
feel adding this for every bcast phy is useful, or just those whose 
rediscover error'ed?








I don't see any optimisation here. Maybe an improvement.



Thanks, I will change the wording.




Signed-off-by: Jason Yan 
CC: John Garry 
CC: Johannes Thumshirn 
CC: Ewan Milne 
CC: Christoph Hellwig 
CC: Tomas Henzl 
CC: Dan Williams 
CC: Hannes Reinecke 
---
 drivers/scsi/libsas/sas_discover.c |  7 +++
 drivers/scsi/libsas/sas_expander.c | 11 ++-
 include/scsi/libsas.h  |  2 +-
 3 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/drivers/scsi/libsas/sas_discover.c
b/drivers/scsi/libsas/sas_discover.c
index 726ada9b8c79..ffc571a12916 100644
--- a/drivers/scsi/libsas/sas_discover.c
+++ b/drivers/scsi/libsas/sas_discover.c
@@ -500,7 +500,6 @@ static void sas_discover_domain(struct work_struct
*work)

 static void sas_revalidate_domain(struct work_struct *work)
 {
-int res = 0;
 struct sas_discovery_event *ev = to_sas_discovery_event(work);
 struct asd_sas_port *port = ev->port;
 struct sas_ha_struct *ha = port->ha;
@@ -521,10 +520,10 @@ static void sas_revalidate_domain(struct
work_struct *work)

 if (ddev && (ddev->dev_type == SAS_FANOUT_EXPANDER_DEVICE ||
  ddev->dev_type == SAS_EDGE_EXPANDER_DEVICE))
-res = sas_ex_revalidate_domain(ddev);
+sas_ex_revalidate_domain(ddev);

-pr_debug("done REVALIDATING DOMAIN on port %d, pid:%d, res 0x%x\n",
- port->id, task_pid_nr(current), res);
+pr_debug("done REVALIDATING DOMAIN on port %d, pid:%d\n",
+ port->id, task_pid_nr(current));
  out:
 mutex_unlock(&ha->disco_mutex);

diff --git a/drivers/scsi/libsas/sas_expander.c
b/drivers/scsi/libsas/sas_expander.c
index 7b0e6dcef6e6..5cd720f93f96 100644
--- a/drivers/scsi/libsas/sas_expander.c
+++ b/drivers/scsi/libsas/sas_expander.c
@@ -2062,7 +2062,7 @@ static int sas_rediscover_dev(struct
domain_device *dev, int phy_id, bool last)
  * first phy,for other phys in this port, we add it to the port to
  * forming the wide-port.
  */
-static int sas_rediscover(struct domain_device *dev, const int phy_id)
+static void sas_rediscover(struct domain_device *dev, const int phy_id)
 {
 struct expander_device *ex = &dev->ex_dev;
 struct ex_phy *changed_phy = &ex->ex_phy[phy_id];
@@ -2090,7 +2090,9 @@ static int sas_rediscover(struct domain_device
*dev, const int phy_id)
 res = sas_rediscover_dev(dev, phy_id, last);
 } else
 res = sas_discover_new(dev, phy_id);
-return res;
+
+pr_debug("ex %016llx phy%d discover returned 0x%x\n",


Hmmm.. this is not just discover, but also rediscover



Yes, will fix.


+ SAS_ADDR(dev->sas_addr), phy_id, res);
 }

 /**
@@ -2102,7 +2104,7 @@ static int sas_rediscover(struct domain_device
*dev, const int phy_id)
  * Discover process only interrogates devices in order to discover the
  * domain.
  */
-int sas_ex_revalidate_domain(struct domain_device *port_dev)
+void sas_ex_revalidate_domain(struct domain_device *port_dev)
 {
 int res;
 struct domain_device *dev = NULL;
@@ -2117,11 +2119,10 @@ int sas_ex_revalidate_domain(struct
domain_device *port_dev)
 res = sas_find_bcast_phy(dev, &phy_id, i, true);


this was missed


Yes, the return value of sas_find_bcast_phy() is actually unused, and
inside the function debug info has been printed. So we can directly
make it a void function.


ok, but how about add a comment like:

-if (phy_id == -1)
+if (phy_id == -1) /* no remaining broadcast phy found */






 if (phy_id == -1)
 break;
-res = sas_rediscover(dev, phy_id);
+sas_rediscover(dev, phy_id);
 i = phy_id + 1;
 } while (i < ex->num_phys);
 }
-return res;
 }

 void sas_smp_handler(struct bsg_job *job, struct Scsi_Host *shost,
diff --git a/include/scsi/libsas.h b/include/scsi/libsas.h
index 420156cea3ee..e557bcb0c266 100644
--- a/include/scsi/libsas.h
+++ b/include/scsi/libsas.h
@@ -692,7 +692,7 @@ int  sas_discover_root_expander(struct
domain_device *);

 void sas_init_ex_attr(void);

-int  sas_ex_revalidate_domain(struct domain_device 

RE: [EXT] [PATCH v4 3/3] scsi: ufs-bsg: Allow reading descriptors

2019-01-31 Thread Bean Huo (beanhuo)
>-Original Message-
>From: linux-scsi-ow...@vger.kernel.org [mailto:linux-scsi-
>ow...@vger.kernel.org] On Behalf Of Avri Altman
>Sent: Sunday, January 27, 2019 8:08 AM
>To: James E.J. Bottomley ; Martin K. Petersen
>; linux-scsi@vger.kernel.org; linux-
>ker...@vger.kernel.org; Evan Green 
>Cc: Avi Shchislowski ; Alex Lemberg
>; Avri Altman 
>Subject: [EXT] [PATCH v4 3/3] scsi: ufs-bsg: Allow reading descriptors
>
>Add this functionality, placing the descriptor being read in the actual data 
>buffer
>in the bio.
>
>That is, for both read and write descriptors query upiu, we are using the job's
>request_payload.  This in turn, is mapped back in user land to the applicable
>sg_io_v4 xferp: dout_xferp for write descriptor, and din_xferp for read 
>descriptor.
>
>Signed-off-by: Avri Altman 
Reviewed-by: Bean Huo 


Re: [PATCH v2 4/7] scsi: libsas: split the replacement of sas disks in two steps

2019-01-31 Thread John Garry

On 31/01/2019 02:04, Jason Yan wrote:



On 2019/1/31 1:22, John Garry wrote:

On 30/01/2019 08:24, Jason Yan wrote:

Now if a new device replaced a old device, the sas address will change.


Hmmm... not if it's a SATA disk, which would have some same invented SAS
address.



Yes, it's only for a SAS disk.


We unregister the old device and discover the new device in one
revalidation process. But after we deferred the sas_port_delete(), the
sas port is not deleted when we registering the new port and device.
The sas port cannot be added because the name of the new port is the
same as the old.

Fix this by doing the replacement in two steps. The first revalidation
only delete the old device and trigger a new revalidation. The second
revalidation discover the new device. To keep the event processing
synchronised to the original event,


Did I originally suggest this? It seems to needlessly make the code more
complicated.



Yes, my first version was raise a new bcast event, and you said it's not
synchronised to the original event.  Shall I get back to that approach?


Not sure. This patch seems to fix something closely related to that in 
"scsi: libsas: fix issue of swapping two sas disks", which I will check 
further.


EOM




we wrapped a loop and added a new

parameter to see if we should revalidate again.

Signed-off-by: Jason Yan 
CC: chenxiang 
CC: John Garry 
CC: Johannes Thumshirn 
CC: Ewan Milne 
CC: Christoph Hellwig 
CC: Tomas Henzl 
CC: Dan Williams 
CC: Hannes Reinecke 
---
 drivers/scsi/libsas/sas_discover.c | 20 +++-
 drivers/scsi/libsas/sas_expander.c | 20 ++--
 include/scsi/libsas.h  |  2 +-
 3 files changed, 30 insertions(+), 12 deletions(-)

diff --git a/drivers/scsi/libsas/sas_discover.c
b/drivers/scsi/libsas/sas_discover.c
index ffc571a12916..c825c89fbddd 100644
--- a/drivers/scsi/libsas/sas_discover.c
+++ b/drivers/scsi/libsas/sas_discover.c
@@ -498,12 +498,10 @@ static void sas_discover_domain(struct
work_struct *work)
  task_pid_nr(current), error);
 }

-static void sas_revalidate_domain(struct work_struct *work)
+static void sas_do_revalidate_domain(struct asd_sas_port *port, bool
*retry)
 {
-struct sas_discovery_event *ev = to_sas_discovery_event(work);
-struct asd_sas_port *port = ev->port;
-struct sas_ha_struct *ha = port->ha;
 struct domain_device *ddev = port->port_dev;
+struct sas_ha_struct *ha = port->ha;

 /* prevent revalidation from finding sata links in recovery */
 mutex_lock(&ha->disco_mutex);
@@ -520,7 +518,7 @@ static void sas_revalidate_domain(struct
work_struct *work)

 if (ddev && (ddev->dev_type == SAS_FANOUT_EXPANDER_DEVICE ||
  ddev->dev_type == SAS_EDGE_EXPANDER_DEVICE))
-sas_ex_revalidate_domain(ddev);
+sas_ex_revalidate_domain(ddev, retry);

 pr_debug("done REVALIDATING DOMAIN on port %d, pid:%d\n",
  port->id, task_pid_nr(current));
@@ -532,6 +530,18 @@ static void sas_revalidate_domain(struct
work_struct *work)
 sas_probe_devices(port);
 }

+static void sas_revalidate_domain(struct work_struct *work)
+{
+struct sas_discovery_event *ev = to_sas_discovery_event(work);
+struct asd_sas_port *port = ev->port;
+bool retry;
+
+do {
+retry = false;
+sas_do_revalidate_domain(port, &retry);
+} while (retry);
+}
+
 /* -- Events -- */

 static void sas_chain_work(struct sas_ha_struct *ha, struct sas_work
*sw)
diff --git a/drivers/scsi/libsas/sas_expander.c
b/drivers/scsi/libsas/sas_expander.c
index 5cd720f93f96..cdbf8d8a28bf 100644
--- a/drivers/scsi/libsas/sas_expander.c
+++ b/drivers/scsi/libsas/sas_expander.c
@@ -1994,7 +1994,8 @@ static bool dev_type_flutter(enum
sas_device_type new, enum sas_device_type old)
 return false;
 }

-static int sas_rediscover_dev(struct domain_device *dev, int phy_id,
bool last)
+static int sas_unregister(struct domain_device *dev, int phy_id, bool
last,
+  bool *retry)
 {
 struct expander_device *ex = &dev->ex_dev;
 struct ex_phy *phy = &ex->ex_phy[phy_id];
@@ -2045,7 +2046,12 @@ static int sas_rediscover_dev(struct
domain_device *dev, int phy_id, bool last)
 SAS_ADDR(phy->attached_sas_addr));
 sas_unregister_devs_sas_addr(dev, phy_id, last);

-return sas_discover_new(dev, phy_id);
+/* force the next revalidation find this phy and bring it up */
+phy->phy_change_count = -1;
+ex->ex_change_count = -1;
+*retry = true;


Ohh, sorry to say, but that's a real hack :)



This is the way sas_resume_port() already in use.


Could we just add a flag for the expander PHY to force a discovery
instead of this?



of course we can add a flag instead of this, but I don't think it worth
to do this. We have to change the logic of sas_find_bcast_dev() and
sas_find_bcast_phy() to achieve this. Or we have to add a new function
to find out which PHY's flag is set.


I assume that you need to do this as the expander PHY

Re: [PATCH] scsi: aic94xx: fix module loading

2019-01-31 Thread Emil Velikov
On Thu, 31 Jan 2019 at 00:42, James Bottomley
 wrote:
>
> The aic94xx driver is currently failing to load with errors like
>
> sysfs: cannot create duplicate filename 
> '/devices/pci:00/:00:03.0/:02:00.3/:07:02.0/revision'
>
> Because the PCI code had recently added a file named 'revision' to
> every PCI device.  Fix this by renaming the aic94xx revision file to
> aic_revision.  This is safe to do for us because as far as I can tell,
> there's nothing in userspace relying on the current aic94xx revision
> file so it can be renamed without breaking anything.
>
> Fixes: 702ed3be1b1b (PCI: Create revision file in sysfs)
> Cc: sta...@vger.kernel.org
> Signed-off-by: James Bottomley 
>
Reviewed-by: Emil Velikov 

Thanks for sorting this out James
Emil


Re: [PATCH v2 7/7] scsi: libsas: fix issue of swapping two sas disks

2019-01-31 Thread John Garry

On 31/01/2019 02:55, Jason Yan wrote:



On 2019/1/31 1:53, John Garry wrote:

On 30/01/2019 08:24, Jason Yan wrote:

The work flow of revalidation now is scanning expander phy by the
sequence of the phy and check if the phy have changed. This will leads
to an issue of swapping two sas disks on one expander.

Assume we have two sas disks, connected with expander phy10 and phy11:

phy10: 5000cca04eb1001d  port-0:0:10
phy11: 5000cca04eb043ad  port-0:0:11

Swap these two disks, and imaging the following scenario:

revalidation 1:


What does "revalidation 1" actually mean?


'revalidation 1' means one entry in sas_discover_domain().




  -->phy10: 0 --> delete phy10 domain device
  -->phy11: 5000cca04eb043ad (no change)


so is disk 11 still inserted at this stage?


Maybe, but that's what we read from the hardware.




revalidation done

revalidation 2:


is port-0:0:10 deleted now?



Yes. But we don't care about it.


  -->step 1, check phy10:
  -->phy10: 5000cca04eb043ad   --> add to wide port(port-0:0:11) (phy11
   address is still 5000cca04eb043ad now)


We do not want this to happen and it seems to be the crux of the problem.

As an alternate to your solution, how about check if the PHY is an end 
device. If so, it should not form/join a wideport; that is, apart from 
dual-port disks, which I am not sure about - I think each port still has 
a unique WWN, so should be ok.




So this should not happen. How are you physcially swapping them such
that phy11 address is still 5000cca04eb043ad? I don't see how this would
be true at revalidation 1.



This issue is because we always process the PHYs from 0 to max phy
number. And please be aware of the real physcial address of the PHY and
the address stored in the memory is not always the same.
Actually when you checking phy10, phy11 physcial address is not
5000cca04eb043ad. But the address stored in domain device is still
5000cca04eb043ad. We have not get a chance to to read it because we are
processing phy10 now, right?



I see.


It's very easy to reproduce. I suggest you to do it yourself and look at
the logs.



I can't physically access the backpane, and this is not the sort of 
thing which is easy to fake by hacking the driver.


And the log which you provided internally does not have much - if any - 
libsas logs to help me understand it.




  -->step 2, check phy11:
  -->phy11: 0  --> phy11 address is 0 now, but it's part of wide
   port(port-0:0:11), the domain device will not be deleted.
revalidation done

revalidation 3:
  -->phy10, 5000cca04eb043ad (no change)
  -->phy11: 5000cca04eb1001d --> try to add port-0:0:11 but failed,
   port-0:0:11 already exist, trigger a warning as follows
revalidation done

[14790.189699] sysfs: cannot create duplicate filename
'/devices/pci:74/:74:02.0/host0/port-0:0/expander-0:0/port-0:0:11'


[14790.201081] CPU: 25 PID: 5031 Comm: kworker/u192:3 Not tainted
4.16.0-rc1-191134-g138f084-dirty #228
[14790.210199] Hardware name: Huawei D06/D06, BIOS Hisilicon D06 EC UEFI
Nemo 2.0 RC0 - B303 05/16/2018
[14790.219323] Workqueue: :74:02.0_disco_q sas_revalidate_domain
[14790.225404] Call trace:
[14790.227842]  dump_backtrace+0x0/0x18c
[14790.231492]  show_stack+0x14/0x1c
[14790.234798]  dump_stack+0x88/0xac
[14790.238101]  sysfs_warn_dup+0x64/0x7c
[14790.241751]  sysfs_create_dir_ns+0x90/0xa0
[14790.245835]  kobject_add_internal+0xa0/0x284
[14790.250092]  kobject_add+0xb8/0x11c
[14790.253570]  device_add+0xe8/0x598
[14790.256960]  sas_port_add+0x24/0x50
[14790.260436]  sas_ex_discover_devices+0xb10/0xc30
[14790.265040]  sas_ex_revalidate_domain+0x1d8/0x518
[14790.269731]  sas_revalidate_domain+0x12c/0x154
[14790.274163]  process_one_work+0x128/0x2b0
[14790.278160]  worker_thread+0x14c/0x408
[14790.281897]  kthread+0xfc/0x128
[14790.285026]  ret_from_fork+0x10/0x18
[14790.288598] [ cut here ]

At last, the disk 5000cca04eb1001d is lost.




Re: [PATCH v2 4/7] scsi: libsas: split the replacement of sas disks in two steps

2019-01-31 Thread John Garry

On 31/01/2019 10:29, John Garry wrote:

On 31/01/2019 02:04, Jason Yan wrote:



On 2019/1/31 1:22, John Garry wrote:

On 30/01/2019 08:24, Jason Yan wrote:

Now if a new device replaced a old device, the sas address will change.


Hmmm... not if it's a SATA disk, which would have some same invented SAS
address.



Yes, it's only for a SAS disk.


We unregister the old device and discover the new device in one
revalidation process. But after we deferred the sas_port_delete(), the
sas port is not deleted when we registering the new port and device.
The sas port cannot be added because the name of the new port is the
same as the old.

Fix this by doing the replacement in two steps. The first revalidation
only delete the old device and trigger a new revalidation. The second
revalidation discover the new device. To keep the event processing
synchronised to the original event,


This change seems ok, but please see below regarding generating the 
bcast events.




Did I originally suggest this? It seems to needlessly make the code more
complicated.



Yes, my first version was raise a new bcast event, and you said it's not
synchronised to the original event.  Shall I get back to that approach?


Not sure. This patch seems to fix something closely related to that in
"scsi: libsas: fix issue of swapping two sas disks", which I will check
further.



An idea:

So, before the libsas changes to generate dynamic events, when libsas 
was processing a particular event type - like a broadcast event - extra 
events generated by the LLDD were discarded by libsas.


The revalidation process attempted to do all revalidation for the domain 
is a single pass, which was ok. This really did not change.


However, in this revalidation pass, we also clear all expander and PHY 
events.


Maybe this is not the right thing to do. Maybe we should just clear a 
single PHY event per pass, since we're processing each broadcast event 
one-by-one.


Today you will notice that if we remove a disk for example, many 
broadcast events are generated, but only the first broadcast event 
actually does any revalidation.


EOM






we wrapped a loop and added a new

parameter to see if we should revalidate again.

Signed-off-by: Jason Yan 
CC: chenxiang 
CC: John Garry 
CC: Johannes Thumshirn 
CC: Ewan Milne 
CC: Christoph Hellwig 
CC: Tomas Henzl 
CC: Dan Williams 
CC: Hannes Reinecke 
---
 drivers/scsi/libsas/sas_discover.c | 20 +++-
 drivers/scsi/libsas/sas_expander.c | 20 ++--
 include/scsi/libsas.h  |  2 +-
 3 files changed, 30 insertions(+), 12 deletions(-)

diff --git a/drivers/scsi/libsas/sas_discover.c
b/drivers/scsi/libsas/sas_discover.c
index ffc571a12916..c825c89fbddd 100644
--- a/drivers/scsi/libsas/sas_discover.c
+++ b/drivers/scsi/libsas/sas_discover.c
@@ -498,12 +498,10 @@ static void sas_discover_domain(struct
work_struct *work)
  task_pid_nr(current), error);
 }

-static void sas_revalidate_domain(struct work_struct *work)
+static void sas_do_revalidate_domain(struct asd_sas_port *port, bool
*retry)
 {
-struct sas_discovery_event *ev = to_sas_discovery_event(work);
-struct asd_sas_port *port = ev->port;
-struct sas_ha_struct *ha = port->ha;
 struct domain_device *ddev = port->port_dev;
+struct sas_ha_struct *ha = port->ha;

 /* prevent revalidation from finding sata links in recovery */
 mutex_lock(&ha->disco_mutex);
@@ -520,7 +518,7 @@ static void sas_revalidate_domain(struct
work_struct *work)

 if (ddev && (ddev->dev_type == SAS_FANOUT_EXPANDER_DEVICE ||
  ddev->dev_type == SAS_EDGE_EXPANDER_DEVICE))
-sas_ex_revalidate_domain(ddev);
+sas_ex_revalidate_domain(ddev, retry);

 pr_debug("done REVALIDATING DOMAIN on port %d, pid:%d\n",
  port->id, task_pid_nr(current));
@@ -532,6 +530,18 @@ static void sas_revalidate_domain(struct
work_struct *work)
 sas_probe_devices(port);
 }

+static void sas_revalidate_domain(struct work_struct *work)
+{
+struct sas_discovery_event *ev = to_sas_discovery_event(work);
+struct asd_sas_port *port = ev->port;
+bool retry;
+
+do {
+retry = false;
+sas_do_revalidate_domain(port, &retry);
+} while (retry);
+}
+
 /* -- Events -- */

 static void sas_chain_work(struct sas_ha_struct *ha, struct sas_work
*sw)
diff --git a/drivers/scsi/libsas/sas_expander.c
b/drivers/scsi/libsas/sas_expander.c
index 5cd720f93f96..cdbf8d8a28bf 100644
--- a/drivers/scsi/libsas/sas_expander.c
+++ b/drivers/scsi/libsas/sas_expander.c
@@ -1994,7 +1994,8 @@ static bool dev_type_flutter(enum
sas_device_type new, enum sas_device_type old)
 return false;
 }

-static int sas_rediscover_dev(struct domain_device *dev, int phy_id,
bool last)
+static int sas_unregister(struct domain_device *dev, int phy_id, bool
last,
+  bool *retry)
 {
 struct expander_device *ex = &dev->ex_dev;
 struct ex_phy *phy = &ex->ex_phy[phy_id];
@@ -2045

Re: Question on handling managed IRQs when hotplugging CPUs

2019-01-31 Thread John Garry

On 30/01/2019 12:43, Thomas Gleixner wrote:

On Wed, 30 Jan 2019, John Garry wrote:

On 29/01/2019 17:20, Keith Busch wrote:

On Tue, Jan 29, 2019 at 05:12:40PM +, John Garry wrote:

On 29/01/2019 15:44, Keith Busch wrote:


Hm, we used to freeze the queues with CPUHP_BLK_MQ_PREPARE callback,
which would reap all outstanding commands before the CPU and IRQ are
taken offline. That was removed with commit 4b855ad37194f ("blk-mq:
Create hctx for each present CPU"). It sounds like we should bring
something like that back, but make more fine grain to the per-cpu
context.



Seems reasonable. But we would need it to deal with drivers where they
only
expose a single queue to BLK MQ, but use many queues internally. I think
megaraid sas does this, for example.

I would also be slightly concerned with commands being issued from the
driver unknown to blk mq, like SCSI TMF.


I don't think either of those descriptions sound like good candidates
for using managed IRQ affinities.


I wouldn't say that this behaviour is obvious to the developer. I can't see
anything in Documentation/PCI/MSI-HOWTO.txt

It also seems that this policy to rely on upper layer to flush+freeze queues
would cause issues if managed IRQs are used by drivers in other subsystems.
Networks controllers may have multiple queues and unsoliciated interrupts.


It's doesn't matter which part is managing flush/freeze of queues as long
as something (either common subsystem code, upper layers or the driver
itself) does it.

So for the megaraid SAS example the BLK MQ layer obviously can't do
anything because it only sees a single request queue. But the driver could,
if the the hardware supports it. tell the device to stop queueing
completions on the completion queue which is associated with a particular
CPU (or set of CPUs) during offline and then wait for the on flight stuff
to be finished. If the hardware does not allow that, then managed
interrupts can't work for it.



A rough audit of current SCSI drivers tells that these set 
PCI_IRQ_AFFINITY in some path but don't set Scsi_host.nr_hw_queues at all:

aacraid, be2iscsi, csiostor, megaraid, mpt3sas

I don't know specific driver details, like changing completion queue.

Thanks,
John


Thanks,

tglx

.






Re: [PATCH v2 1/7] Introduce the bidi_supported flag in the host template structure

2019-01-31 Thread Bart Van Assche
On Wed, 2019-01-30 at 22:06 -0500, Martin K. Petersen wrote:
> > It would help if you could explain why you are so strongly opposed
> > against keeping BIDI support in the kernel.
> 
> My preference is to remove it because there is simply no good use case
> for it.
> 
> The fact that this peculiar command was once defined and has since been
> obsoleted doesn't imply we should go out of our ways to support it.
> There's all sorts of other cruft defined by T10 that nobody uses and
> never will.

Hi Martin,

Removing bidi support will break bidi drivers. Does this mean that the osd
driver should be removed before bidi support is removed from the SCSI core?

Thanks,

Bart.


Re: [PATCH v2 4/7] scsi: libsas: split the replacement of sas disks in two steps

2019-01-31 Thread Jason Yan




On 2019/2/1 0:38, John Garry wrote:

On 31/01/2019 10:29, John Garry wrote:

On 31/01/2019 02:04, Jason Yan wrote:



On 2019/1/31 1:22, John Garry wrote:

On 30/01/2019 08:24, Jason Yan wrote:

Now if a new device replaced a old device, the sas address will
change.


Hmmm... not if it's a SATA disk, which would have some same invented
SAS
address.



Yes, it's only for a SAS disk.


We unregister the old device and discover the new device in one
revalidation process. But after we deferred the sas_port_delete(), the
sas port is not deleted when we registering the new port and device.
The sas port cannot be added because the name of the new port is the
same as the old.

Fix this by doing the replacement in two steps. The first revalidation
only delete the old device and trigger a new revalidation. The second
revalidation discover the new device. To keep the event processing
synchronised to the original event,


This change seems ok, but please see below regarding generating the
bcast events.



Did I originally suggest this? It seems to needlessly make the code
more
complicated.



Yes, my first version was raise a new bcast event, and you said it's not
synchronised to the original event.  Shall I get back to that approach?


Not sure. This patch seems to fix something closely related to that in
"scsi: libsas: fix issue of swapping two sas disks", which I will check
further.



An idea:

So, before the libsas changes to generate dynamic events, when libsas
was processing a particular event type - like a broadcast event - extra
events generated by the LLDD were discarded by libsas.

The revalidation process attempted to do all revalidation for the domain
is a single pass, which was ok. This really did not change.

However, in this revalidation pass, we also clear all expander and PHY
events.



Actually we only clean one expander and it's attached PHYs events now.


Maybe this is not the right thing to do. Maybe we should just clear a
single PHY event per pass, since we're processing each broadcast event
one-by-one.



Yes, we can do this. But I don't understand how this will fix the issue?
We have this issue now because we have to probe the sas port and/or 
delete the sas port out side of the disco_mutex. So for a specific PHY, 
we cannot add and delete at the same time inside the disco_mutex.



Today you will notice that if we remove a disk for example, many
broadcast events are generated, but only the first broadcast event
actually does any revalidation.

EOM






Re: [PATCH v2 7/7] scsi: libsas: fix issue of swapping two sas disks

2019-01-31 Thread Jason Yan




On 2019/2/1 0:34, John Garry wrote:

On 31/01/2019 02:55, Jason Yan wrote:



On 2019/1/31 1:53, John Garry wrote:

On 30/01/2019 08:24, Jason Yan wrote:

The work flow of revalidation now is scanning expander phy by the
sequence of the phy and check if the phy have changed. This will leads
to an issue of swapping two sas disks on one expander.

Assume we have two sas disks, connected with expander phy10 and phy11:

phy10: 5000cca04eb1001d  port-0:0:10
phy11: 5000cca04eb043ad  port-0:0:11

Swap these two disks, and imaging the following scenario:

revalidation 1:


What does "revalidation 1" actually mean?


'revalidation 1' means one entry in sas_discover_domain().




  -->phy10: 0 --> delete phy10 domain device
  -->phy11: 5000cca04eb043ad (no change)


so is disk 11 still inserted at this stage?


Maybe, but that's what we read from the hardware.




revalidation done

revalidation 2:


is port-0:0:10 deleted now?



Yes. But we don't care about it.


  -->step 1, check phy10:
  -->phy10: 5000cca04eb043ad   --> add to wide port(port-0:0:11) (phy11
   address is still 5000cca04eb043ad now)


We do not want this to happen and it seems to be the crux of the problem.

As an alternate to your solution, how about check if the PHY is an end
device. If so, it should not form/join a wideport; that is, apart from
dual-port disks, which I am not sure about - I think each port still has
a unique WWN, so should be ok.



If the PHY do not join a wideport, then it have to form a wideport of 
it's own. I'm not sure if we can have two ports with the same address 
and do not break anything?




So this should not happen. How are you physcially swapping them such
that phy11 address is still 5000cca04eb043ad? I don't see how this would
be true at revalidation 1.



This issue is because we always process the PHYs from 0 to max phy
number. And please be aware of the real physcial address of the PHY and
the address stored in the memory is not always the same.
Actually when you checking phy10, phy11 physcial address is not
5000cca04eb043ad. But the address stored in domain device is still
5000cca04eb043ad. We have not get a chance to to read it because we are
processing phy10 now, right?



I see.


It's very easy to reproduce. I suggest you to do it yourself and look at
the logs.



I can't physically access the backpane, and this is not the sort of
thing which is easy to fake by hacking the driver.

And the log which you provided internally does not have much - if any -
libsas logs to help me understand it.



  -->step 2, check phy11:
  -->phy11: 0  --> phy11 address is 0 now, but it's part of wide
   port(port-0:0:11), the domain device will not be deleted.
revalidation done

revalidation 3:
  -->phy10, 5000cca04eb043ad (no change)
  -->phy11: 5000cca04eb1001d --> try to add port-0:0:11 but failed,
   port-0:0:11 already exist, trigger a warning as follows
revalidation done

[14790.189699] sysfs: cannot create duplicate filename
'/devices/pci:74/:74:02.0/host0/port-0:0/expander-0:0/port-0:0:11'



[14790.201081] CPU: 25 PID: 5031 Comm: kworker/u192:3 Not tainted
4.16.0-rc1-191134-g138f084-dirty #228
[14790.210199] Hardware name: Huawei D06/D06, BIOS Hisilicon D06 EC
UEFI
Nemo 2.0 RC0 - B303 05/16/2018
[14790.219323] Workqueue: :74:02.0_disco_q sas_revalidate_domain
[14790.225404] Call trace:
[14790.227842]  dump_backtrace+0x0/0x18c
[14790.231492]  show_stack+0x14/0x1c
[14790.234798]  dump_stack+0x88/0xac
[14790.238101]  sysfs_warn_dup+0x64/0x7c
[14790.241751]  sysfs_create_dir_ns+0x90/0xa0
[14790.245835]  kobject_add_internal+0xa0/0x284
[14790.250092]  kobject_add+0xb8/0x11c
[14790.253570]  device_add+0xe8/0x598
[14790.256960]  sas_port_add+0x24/0x50
[14790.260436]  sas_ex_discover_devices+0xb10/0xc30
[14790.265040]  sas_ex_revalidate_domain+0x1d8/0x518
[14790.269731]  sas_revalidate_domain+0x12c/0x154
[14790.274163]  process_one_work+0x128/0x2b0
[14790.278160]  worker_thread+0x14c/0x408
[14790.281897]  kthread+0xfc/0x128
[14790.285026]  ret_from_fork+0x10/0x18
[14790.288598] [ cut here ]

At last, the disk 5000cca04eb1001d is lost.



.





Re: [PATCH v2 1/7] Introduce the bidi_supported flag in the host template structure

2019-01-31 Thread Martin K. Petersen


Hi Bart,

> Removing bidi support will break bidi drivers. Does this mean that the
> osd driver should be removed before bidi support is removed from the
> SCSI core?

Yes. Christoph posted the patches a while back but I felt it was a bit
too late in the cycle to merge them for 5.0.

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH] sd: Protect against submitting READ(6) or WRITE(6) with 256 logical blocks

2019-01-31 Thread Martin K. Petersen


Bart,

>> While the WARN_ON here looks helpful shouldn't we instead ensure that
>> sd_setup_rw6_cmnd never gets called with a 0 argument instead of bailing
>> out like this?
>
> Before I posted this patch I searched for code that submits read or
> write requests with length 0 but I haven't found any. do_iter_read()
> and do_iter_write() in fs/read_write.c do not submit any block layer
> requests if tot_len == 0. Are you perhaps aware of kernel code that
> can submit zero-length read or write requests?

At some level I would have liked the sanity check in the top rw prep
command. However, given that a zero transfer length is mostly a
hypothetical scenario I tend to concur with sticking the check in rw6
outside of the hot path.

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH v2 1/7] Introduce the bidi_supported flag in the host template structure

2019-01-31 Thread Christoph Hellwig
On Tue, Jan 29, 2019 at 03:03:18PM -0500, Douglas Gilbert wrote:
> So are you arguing that an oops that you had a hand in generating ***
> should not fixed, and should be used as a further reason to remove bidi
> completely from the kernel? Bebugging?

It is not the reason, but the final nail in the coffin.  As you might
have noticed we've been trying to get rid of it for a while.

> If the NVMe stack does bidi (and quad-di) using a different technique,
> couldn't the SCSI stack's bidi implementation (which is pretty ugly but is
> less ugly after this patchset) be changed to do bidi the same way as NVMe?

NVMe does not do BIDI at all, and I'll use all my (little) klout in
the committee to avoid having bidi support in NVMe.


Re: [PATCH v2 1/7] Introduce the bidi_supported flag in the host template structure

2019-01-31 Thread Christoph Hellwig
On Wed, Jan 30, 2019 at 08:28:14AM -0800, Bart Van Assche wrote:
> It would help if you could explain why you are so strongly opposed against
> keeping BIDI support in the kernel.

It creates a barely tested code path all over the SCSI and block layers,
including a userspace attack vector.  At the same time it sees zero
real life usage - the only people touching it are less than a handful
people just testing it.

And yes, removing it also helps to shrink struct request, which is used
by all block drivers, and removes additional code from the SCSI (not NVMe)
as well as the block layer I/O path.


remove exofs, the T10 OSD code and block/scsi bidi support V4

2019-01-31 Thread Christoph Hellwig
The only real user of the T10 OSD protocol, the pNFS object layout
driver never went to the point of having shipping products, and we
removed it 1.5 years ago.  Exofs is just a simple example without
real life users.

The code has been mostly unmaintained for years and is getting in the
way of block / SCSI changes, and does not even work properly currently,
so I think it's finally time to drop it.

Quote from Boaz:

"As I said then. It is used in Universities for studies and experiments.
Every once in a while. I get an email with questions and reports.

But yes feel free to remove the all thing!!

I guess I can put it up on github. In a public tree.

Just that I will need to forward port it myself, til now you guys
been doing this for me ;-)"

Now the last time this caused a bit of a stir, but still no actual users,
not even for SG_IO passthrough commands.  So here we go again, this time
including removing everything in the scsi and block layer supporting it,
and thus shrinking struct request.


[PATCH 2/8] bsg-lib: handle bidi requests without block layer help

2019-01-31 Thread Christoph Hellwig
We can just stash away the second request in struct bsg_job instead
of using the block layer req->next_rq field, allowing for the eventual
removal of the latter.

Signed-off-by: Christoph Hellwig 
---
 block/bsg-lib.c   | 44 +---
 block/bsg.c   | 68 +++
 drivers/scsi/scsi_transport_sas.c |  1 -
 include/linux/bsg-lib.h   |  4 ++
 4 files changed, 56 insertions(+), 61 deletions(-)

diff --git a/block/bsg-lib.c b/block/bsg-lib.c
index 192129856342..005e2b75d775 100644
--- a/block/bsg-lib.c
+++ b/block/bsg-lib.c
@@ -51,11 +51,40 @@ static int bsg_transport_fill_hdr(struct request *rq, 
struct sg_io_v4 *hdr,
fmode_t mode)
 {
struct bsg_job *job = blk_mq_rq_to_pdu(rq);
+   int ret;
 
job->request_len = hdr->request_len;
job->request = memdup_user(uptr64(hdr->request), hdr->request_len);
+   if (IS_ERR(job->request))
+   return PTR_ERR(job->request);
+
+   if (hdr->dout_xfer_len && hdr->din_xfer_len) {
+   job->bidi_rq = blk_get_request(rq->q, REQ_OP_SCSI_IN, 0);
+   if (IS_ERR(job->bidi_rq)) {
+   ret = PTR_ERR(job->bidi_rq);
+   goto out;
+   }
+
+   ret = blk_rq_map_user(rq->q, job->bidi_rq, NULL,
+   uptr64(hdr->din_xferp), hdr->din_xfer_len,
+   GFP_KERNEL);
+   if (ret)
+   goto out_free_bidi_rq;
+
+   job->bidi_bio = job->bidi_rq->bio;
+   } else {
+   job->bidi_rq = NULL;
+   job->bidi_bio = NULL;
+   }
 
-   return PTR_ERR_OR_ZERO(job->request);
+   return 0;
+
+out_free_bidi_rq:
+   if (job->bidi_rq)
+   blk_put_request(job->bidi_rq);
+out:
+   kfree(job->request);
+   return ret;
 }
 
 static int bsg_transport_complete_rq(struct request *rq, struct sg_io_v4 *hdr)
@@ -93,7 +122,7 @@ static int bsg_transport_complete_rq(struct request *rq, 
struct sg_io_v4 *hdr)
/* we assume all request payload was transferred, residual == 0 */
hdr->dout_resid = 0;
 
-   if (rq->next_rq) {
+   if (job->bidi_rq) {
unsigned int rsp_len = job->reply_payload.payload_len;
 
if (WARN_ON(job->reply_payload_rcv_len > rsp_len))
@@ -111,6 +140,11 @@ static void bsg_transport_free_rq(struct request *rq)
 {
struct bsg_job *job = blk_mq_rq_to_pdu(rq);
 
+   if (job->bidi_rq) {
+   blk_rq_unmap_user(job->bidi_bio);
+   blk_put_request(job->bidi_rq);
+   }
+
kfree(job->request);
 }
 
@@ -200,7 +234,6 @@ static int bsg_map_buffer(struct bsg_buffer *buf, struct 
request *req)
  */
 static bool bsg_prepare_job(struct device *dev, struct request *req)
 {
-   struct request *rsp = req->next_rq;
struct bsg_job *job = blk_mq_rq_to_pdu(req);
int ret;
 
@@ -211,8 +244,8 @@ static bool bsg_prepare_job(struct device *dev, struct 
request *req)
if (ret)
goto failjob_rls_job;
}
-   if (rsp && rsp->bio) {
-   ret = bsg_map_buffer(&job->reply_payload, rsp);
+   if (job->bidi_rq) {
+   ret = bsg_map_buffer(&job->reply_payload, job->bidi_rq);
if (ret)
goto failjob_rls_rqst_payload;
}
@@ -369,7 +402,6 @@ struct request_queue *bsg_setup_queue(struct device *dev, 
const char *name,
}
 
q->queuedata = dev;
-   blk_queue_flag_set(QUEUE_FLAG_BIDI, q);
blk_queue_rq_timeout(q, BLK_DEFAULT_SG_TIMEOUT);
 
ret = bsg_register_queue(q, dev, name, &bsg_transport_ops);
diff --git a/block/bsg.c b/block/bsg.c
index a799b0ace55c..f306853c6b08 100644
--- a/block/bsg.c
+++ b/block/bsg.c
@@ -74,6 +74,11 @@ static int bsg_scsi_fill_hdr(struct request *rq, struct 
sg_io_v4 *hdr,
 {
struct scsi_request *sreq = scsi_req(rq);
 
+   if (hdr->dout_xfer_len && hdr->din_xfer_len) {
+   pr_warn_once("BIDI support in bsg has been removed.\n");
+   return -EOPNOTSUPP;
+   }
+
sreq->cmd_len = hdr->request_len;
if (sreq->cmd_len > BLK_MAX_CDB) {
sreq->cmd = kzalloc(sreq->cmd_len, GFP_KERNEL);
@@ -114,14 +119,10 @@ static int bsg_scsi_complete_rq(struct request *rq, 
struct sg_io_v4 *hdr)
hdr->response_len = len;
}
 
-   if (rq->next_rq) {
-   hdr->dout_resid = sreq->resid_len;
-   hdr->din_resid = scsi_req(rq->next_rq)->resid_len;
-   } else if (rq_data_dir(rq) == READ) {
+   if (rq_data_dir(rq) == READ)
hdr->din_resid = sreq->resid_len;
-   } else {
+   else
hdr->dout_resid = sreq->resid_len;
-   }
 
return ret;
 }
@@ -140,8 +141,8 @@ static const struct bsg_ops bsg_scsi_ops = {
 
 static int bsg_sg_io(struct request_

[PATCH 8/8] block: remove bidi support

2019-01-31 Thread Christoph Hellwig
Unused now, and another field in struct request bites the dust.

Signed-off-by: Christoph Hellwig 
---
 block/blk-mq-debugfs.c | 1 -
 block/blk-mq.c | 3 ---
 include/linux/blkdev.h | 6 --
 3 files changed, 10 deletions(-)

diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index 90d68760af08..ac832547160a 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -115,7 +115,6 @@ static int queue_pm_only_show(void *data, struct seq_file 
*m)
 static const char *const blk_queue_flag_name[] = {
QUEUE_FLAG_NAME(STOPPED),
QUEUE_FLAG_NAME(DYING),
-   QUEUE_FLAG_NAME(BIDI),
QUEUE_FLAG_NAME(NOMERGES),
QUEUE_FLAG_NAME(SAME_COMP),
QUEUE_FLAG_NAME(FAIL_IO),
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 502cbf964a3b..820d131a6893 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -339,7 +339,6 @@ static struct request *blk_mq_rq_ctx_init(struct 
blk_mq_alloc_data *data,
 
rq->end_io = NULL;
rq->end_io_data = NULL;
-   rq->next_rq = NULL;
 
data->ctx->rq_dispatched[op_is_sync(op)]++;
refcount_set(&rq->ref, 1);
@@ -549,8 +548,6 @@ inline void __blk_mq_end_request(struct request *rq, 
blk_status_t error)
rq_qos_done(rq->q, rq);
rq->end_io(rq, error);
} else {
-   if (unlikely(blk_bidi_rq(rq)))
-   blk_mq_free_request(rq->next_rq);
blk_mq_free_request(rq);
}
 }
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index fd1450d53f1c..21beb456b97a 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -234,9 +234,6 @@ struct request {
 */
rq_end_io_fn *end_io;
void *end_io_data;
-
-   /* for bidi */
-   struct request *next_rq;
 };
 
 static inline bool blk_op_is_scsi(unsigned int op)
@@ -572,7 +569,6 @@ struct request_queue {
 
 #define QUEUE_FLAG_STOPPED 1   /* queue is stopped */
 #define QUEUE_FLAG_DYING   2   /* queue being torn down */
-#define QUEUE_FLAG_BIDI4   /* queue supports bidi requests 
*/
 #define QUEUE_FLAG_NOMERGES 5  /* disable merge attempts */
 #define QUEUE_FLAG_SAME_COMP   6   /* complete on same CPU-group */
 #define QUEUE_FLAG_FAIL_IO 7   /* fake timeout */
@@ -644,8 +640,6 @@ static inline bool blk_account_rq(struct request *rq)
return (rq->rq_flags & RQF_STARTED) && !blk_rq_is_passthrough(rq);
 }
 
-#define blk_bidi_rq(rq)((rq)->next_rq != NULL)
-
 #define list_entry_rq(ptr) list_entry((ptr), struct request, queuelist)
 
 #define rq_data_dir(rq)(op_is_write(req_op(rq)) ? WRITE : READ)
-- 
2.20.1



[PATCH 1/8] bsg: refactor bsg_ioctl

2019-01-31 Thread Christoph Hellwig
Move all actual functionality into helpers, just leaving the dispatch
in this function.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Benjamin Block 
Tested-by: Benjamin Block 
Tested-by: Avri Altman 
---
 block/bsg.c | 158 
 1 file changed, 72 insertions(+), 86 deletions(-)

diff --git a/block/bsg.c b/block/bsg.c
index 50e5f8f666f2..a799b0ace55c 100644
--- a/block/bsg.c
+++ b/block/bsg.c
@@ -138,32 +138,35 @@ static const struct bsg_ops bsg_scsi_ops = {
.free_rq= bsg_scsi_free_rq,
 };
 
-static struct request *
-bsg_map_hdr(struct request_queue *q, struct sg_io_v4 *hdr, fmode_t mode)
+static int bsg_sg_io(struct request_queue *q, fmode_t mode, void __user *uarg)
 {
struct request *rq, *next_rq = NULL;
+   struct bio *bio, *bidi_bio = NULL;
+   struct sg_io_v4 hdr;
int ret;
 
-   if (!q->bsg_dev.class_dev)
-   return ERR_PTR(-ENXIO);
+   if (copy_from_user(&hdr, uarg, sizeof(hdr)))
+   return -EFAULT;
 
-   if (hdr->guard != 'Q')
-   return ERR_PTR(-EINVAL);
+   if (!q->bsg_dev.class_dev)
+   return -ENXIO;
 
-   ret = q->bsg_dev.ops->check_proto(hdr);
+   if (hdr.guard != 'Q')
+   return -EINVAL;
+   ret = q->bsg_dev.ops->check_proto(&hdr);
if (ret)
-   return ERR_PTR(ret);
+   return ret;
 
-   rq = blk_get_request(q, hdr->dout_xfer_len ?
+   rq = blk_get_request(q, hdr.dout_xfer_len ?
REQ_OP_SCSI_OUT : REQ_OP_SCSI_IN, 0);
if (IS_ERR(rq))
-   return rq;
+   return PTR_ERR(rq);
 
-   ret = q->bsg_dev.ops->fill_hdr(rq, hdr, mode);
+   ret = q->bsg_dev.ops->fill_hdr(rq, &hdr, mode);
if (ret)
goto out;
 
-   rq->timeout = msecs_to_jiffies(hdr->timeout);
+   rq->timeout = msecs_to_jiffies(hdr.timeout);
if (!rq->timeout)
rq->timeout = q->sg_timeout;
if (!rq->timeout)
@@ -171,7 +174,7 @@ bsg_map_hdr(struct request_queue *q, struct sg_io_v4 *hdr, 
fmode_t mode)
if (rq->timeout < BLK_MIN_SG_TIMEOUT)
rq->timeout = BLK_MIN_SG_TIMEOUT;
 
-   if (hdr->dout_xfer_len && hdr->din_xfer_len) {
+   if (hdr.dout_xfer_len && hdr.din_xfer_len) {
if (!test_bit(QUEUE_FLAG_BIDI, &q->queue_flags)) {
ret = -EOPNOTSUPP;
goto out;
@@ -188,42 +191,29 @@ bsg_map_hdr(struct request_queue *q, struct sg_io_v4 
*hdr, fmode_t mode)
}
 
rq->next_rq = next_rq;
-   ret = blk_rq_map_user(q, next_rq, NULL, uptr64(hdr->din_xferp),
-  hdr->din_xfer_len, GFP_KERNEL);
+   ret = blk_rq_map_user(q, next_rq, NULL, uptr64(hdr.din_xferp),
+  hdr.din_xfer_len, GFP_KERNEL);
if (ret)
goto out_free_nextrq;
}
 
-   if (hdr->dout_xfer_len) {
-   ret = blk_rq_map_user(q, rq, NULL, uptr64(hdr->dout_xferp),
-   hdr->dout_xfer_len, GFP_KERNEL);
-   } else if (hdr->din_xfer_len) {
-   ret = blk_rq_map_user(q, rq, NULL, uptr64(hdr->din_xferp),
-   hdr->din_xfer_len, GFP_KERNEL);
+   if (hdr.dout_xfer_len) {
+   ret = blk_rq_map_user(q, rq, NULL, uptr64(hdr.dout_xferp),
+   hdr.dout_xfer_len, GFP_KERNEL);
+   } else if (hdr.din_xfer_len) {
+   ret = blk_rq_map_user(q, rq, NULL, uptr64(hdr.din_xferp),
+   hdr.din_xfer_len, GFP_KERNEL);
}
 
if (ret)
goto out_unmap_nextrq;
-   return rq;
 
-out_unmap_nextrq:
+   bio = rq->bio;
if (rq->next_rq)
-   blk_rq_unmap_user(rq->next_rq->bio);
-out_free_nextrq:
-   if (rq->next_rq)
-   blk_put_request(rq->next_rq);
-out:
-   q->bsg_dev.ops->free_rq(rq);
-   blk_put_request(rq);
-   return ERR_PTR(ret);
-}
-
-static int blk_complete_sgv4_hdr_rq(struct request *rq, struct sg_io_v4 *hdr,
-   struct bio *bio, struct bio *bidi_bio)
-{
-   int ret;
+   bidi_bio = rq->next_rq->bio;
 
-   ret = rq->q->bsg_dev.ops->complete_rq(rq, hdr);
+   blk_execute_rq(q, NULL, rq, !(hdr.flags & BSG_FLAG_Q_AT_TAIL));
+   ret = rq->q->bsg_dev.ops->complete_rq(rq, &hdr);
 
if (rq->next_rq) {
blk_rq_unmap_user(bidi_bio);
@@ -233,6 +223,20 @@ static int blk_complete_sgv4_hdr_rq(struct request *rq, 
struct sg_io_v4 *hdr,
blk_rq_unmap_user(bio);
rq->q->bsg_dev.ops->free_rq(rq);
blk_put_request(rq);
+
+   if (copy_to_user(uarg, &hdr, sizeof(hdr)))
+   return -EFAULT;
+   return ret;
+
+out_unmap_nextrq:
+   if (rq->next_rq)
+   blk_rq_unmap

[PATCH 5/8] scsi: remove bidirectional command support

2019-01-31 Thread Christoph Hellwig
No real need for bidi support once the OSD code is gone.

Signed-off-by: Christoph Hellwig 
---
 drivers/scsi/cxgbi/libcxgbi.c  | 13 +++---
 drivers/scsi/iscsi_tcp.c   |  9 +
 drivers/scsi/libiscsi.c| 64 +++---
 drivers/scsi/libiscsi_tcp.c|  8 ++--
 drivers/scsi/scsi_debug.c  | 51 +---
 drivers/scsi/scsi_error.c  |  3 --
 drivers/scsi/scsi_lib.c| 58 ++-
 drivers/scsi/virtio_scsi.c | 14 ++-
 drivers/target/loopback/tcm_loop.c | 15 ---
 drivers/usb/storage/uas.c  | 11 +
 include/scsi/scsi_cmnd.h   | 19 +
 include/scsi/scsi_eh.h |  1 -
 12 files changed, 35 insertions(+), 231 deletions(-)

diff --git a/drivers/scsi/cxgbi/libcxgbi.c b/drivers/scsi/cxgbi/libcxgbi.c
index 75f876409fb9..4466ae5c9a74 100644
--- a/drivers/scsi/cxgbi/libcxgbi.c
+++ b/drivers/scsi/cxgbi/libcxgbi.c
@@ -1211,7 +1211,7 @@ scmd_get_params(struct scsi_cmnd *sc, struct scatterlist 
**sgl,
unsigned int *sgcnt, unsigned int *dlen,
unsigned int prot)
 {
-   struct scsi_data_buffer *sdb = prot ? scsi_prot(sc) : scsi_out(sc);
+   struct scsi_data_buffer *sdb = prot ? scsi_prot(sc) : &sc->sdb;
 
*sgl = sdb->table.sgl;
*sgcnt = sdb->table.nents;
@@ -1427,8 +1427,7 @@ static void task_release_itt(struct iscsi_task *task, 
itt_t hdr_itt)
log_debug(1 << CXGBI_DBG_DDP,
  "cdev 0x%p, task 0x%p, release tag 0x%x.\n",
  cdev, task, tag);
-   if (sc &&
-   (scsi_bidi_cmnd(sc) || sc->sc_data_direction == DMA_FROM_DEVICE) &&
+   if (sc && sc->sc_data_direction == DMA_FROM_DEVICE &&
cxgbi_ppm_is_ddp_tag(ppm, tag)) {
struct cxgbi_task_data *tdata = iscsi_task_cxgbi_data(task);
struct cxgbi_task_tag_info *ttinfo = &tdata->ttinfo;
@@ -1460,9 +1459,7 @@ static int task_reserve_itt(struct iscsi_task *task, 
itt_t *hdr_itt)
u32 tag = 0;
int err = -EINVAL;
 
-   if (sc &&
-   (scsi_bidi_cmnd(sc) || sc->sc_data_direction == DMA_FROM_DEVICE)
-   ) {
+   if (sc && sc->sc_data_direction == DMA_FROM_DEVICE) {
struct cxgbi_task_data *tdata = iscsi_task_cxgbi_data(task);
struct cxgbi_task_tag_info *ttinfo = &tdata->ttinfo;
 
@@ -1896,7 +1893,7 @@ int cxgbi_conn_alloc_pdu(struct iscsi_task *task, u8 
opcode)
if (SKB_MAX_HEAD(cdev->skb_tx_rsvd) > (512 * MAX_SKB_FRAGS) &&
(opcode == ISCSI_OP_SCSI_DATA_OUT ||
 (opcode == ISCSI_OP_SCSI_CMD &&
- (scsi_bidi_cmnd(sc) || sc->sc_data_direction == DMA_TO_DEVICE
+ sc->sc_data_direction == DMA_TO_DEVICE)))
/* data could goes into skb head */
headroom += min_t(unsigned int,
SKB_MAX_HEAD(cdev->skb_tx_rsvd),
@@ -1971,7 +1968,7 @@ int cxgbi_conn_init_pdu(struct iscsi_task *task, unsigned 
int offset,
return 0;
 
if (task->sc) {
-   struct scsi_data_buffer *sdb = scsi_out(task->sc);
+   struct scsi_data_buffer *sdb = &task->sc->sdb;
struct scatterlist *sg = NULL;
int err;
 
diff --git a/drivers/scsi/iscsi_tcp.c b/drivers/scsi/iscsi_tcp.c
index cae6368ebb98..ed3debce2819 100644
--- a/drivers/scsi/iscsi_tcp.c
+++ b/drivers/scsi/iscsi_tcp.c
@@ -518,7 +518,7 @@ static int iscsi_sw_tcp_pdu_init(struct iscsi_task *task,
if (!task->sc)
iscsi_sw_tcp_send_linear_data_prep(conn, task->data, count);
else {
-   struct scsi_data_buffer *sdb = scsi_out(task->sc);
+   struct scsi_data_buffer *sdb = &task->sc->sdb;
 
err = iscsi_sw_tcp_send_data_prep(conn, sdb->table.sgl,
  sdb->table.nents, offset,
@@ -952,12 +952,6 @@ static umode_t iscsi_sw_tcp_attr_is_visible(int 
param_type, int param)
return 0;
 }
 
-static int iscsi_sw_tcp_slave_alloc(struct scsi_device *sdev)
-{
-   blk_queue_flag_set(QUEUE_FLAG_BIDI, sdev->request_queue);
-   return 0;
-}
-
 static int iscsi_sw_tcp_slave_configure(struct scsi_device *sdev)
 {
struct iscsi_sw_tcp_host *tcp_sw_host = iscsi_host_priv(sdev->host);
@@ -985,7 +979,6 @@ static struct scsi_host_template iscsi_sw_tcp_sht = {
.eh_device_reset_handler= iscsi_eh_device_reset,
.eh_target_reset_handler = iscsi_eh_recover_target,
.dma_boundary   = PAGE_SIZE - 1,
-   .slave_alloc= iscsi_sw_tcp_slave_alloc,
.slave_configure= iscsi_sw_tcp_slave_configure,
.target_alloc   = iscsi_target_alloc,
.proc_name  = "iscsi_tcp",
diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c
index b8d325ce8754..bca3a8636c27 100644
--- a/drivers/scsi/libiscsi.c
+++ b/drivers/scsi/libiscsi.c
@@ -228

[PATCH 7/8] block: remove req->special

2019-01-31 Thread Christoph Hellwig
No users left.

Signed-off-by: Christoph Hellwig 
---
 block/blk-mq.c | 1 -
 include/linux/blkdev.h | 2 --
 2 files changed, 3 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 3ba37b9e15e9..502cbf964a3b 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -331,7 +331,6 @@ static struct request *blk_mq_rq_ctx_init(struct 
blk_mq_alloc_data *data,
 #if defined(CONFIG_BLK_DEV_INTEGRITY)
rq->nr_integrity_segments = 0;
 #endif
-   rq->special = NULL;
/* tag was already set */
rq->extra_len = 0;
WRITE_ONCE(rq->deadline, 0);
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 338604dff7d0..fd1450d53f1c 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -216,8 +216,6 @@ struct request {
unsigned short write_hint;
unsigned short ioprio;
 
-   void *special;  /* opaque pointer available for LLD use */
-
unsigned int extra_len; /* length of alignment and padding */
 
enum mq_rq_state state;
-- 
2.20.1



[PATCH 4/8] scsi: remove the SCSI OSD library

2019-01-31 Thread Christoph Hellwig
Now that all the users are gone the SCSI OSD library can be removed
as well.

Signed-off-by: Christoph Hellwig 
---
 Documentation/scsi/osd.txt   |  192 ---
 MAINTAINERS  |6 -
 drivers/scsi/Kconfig |2 -
 drivers/scsi/Makefile|1 -
 drivers/scsi/osd/Kbuild  |   20 -
 drivers/scsi/osd/Kconfig |   49 -
 drivers/scsi/osd/osd_debug.h |   30 -
 drivers/scsi/osd/osd_initiator.c | 2076 --
 drivers/scsi/osd/osd_uld.c   |  571 
 include/scsi/osd_initiator.h |  511 
 include/scsi/osd_ore.h   |  201 ---
 11 files changed, 3659 deletions(-)
 delete mode 100644 Documentation/scsi/osd.txt
 delete mode 100644 drivers/scsi/osd/Kbuild
 delete mode 100644 drivers/scsi/osd/Kconfig
 delete mode 100644 drivers/scsi/osd/osd_debug.h
 delete mode 100644 drivers/scsi/osd/osd_initiator.c
 delete mode 100644 drivers/scsi/osd/osd_uld.c
 delete mode 100644 include/scsi/osd_initiator.h
 delete mode 100644 include/scsi/osd_ore.h

diff --git a/Documentation/scsi/osd.txt b/Documentation/scsi/osd.txt
deleted file mode 100644
index 2bc2ab06b0c0..
--- a/Documentation/scsi/osd.txt
+++ /dev/null
@@ -1,192 +0,0 @@
-The OSD Standard
-
-OSD (Object-Based Storage Device) is a T10 SCSI command set that is designed
-to provide efficient operation of input/output logical units that manage the
-allocation, placement, and accessing of variable-size data-storage containers,
-called objects. Objects are intended to contain operating system and 
application
-constructs. Each object has associated attributes attached to it, which are
-integral part of the object and provide metadata about the object. The standard
-defines some common obligatory attributes, but user attributes can be added as
-needed.
-
-See: http://www.t10.org/ftp/t10/drafts/osd2/ for the latest draft for OSD 2
-or search the web for "OSD SCSI"
-
-OSD in the Linux Kernel
-===
-osd-initiator:
-  The main component of OSD in Kernel is the osd-initiator library. Its main
-user is intended to be the pNFS-over-objects layout driver, which uses objects
-as its back-end data storage. Other clients are the other osd parts listed 
below.
-
-osd-uld:
-  This is a SCSI ULD that registers for OSD type devices and provides a testing
-platform, both for the in-kernel initiator as well as connected targets. It
-currently has no useful user-mode API, though it could have if need be.
-
-osd target:
-  There are no current plans for an OSD target implementation in kernel. For 
all
-needs, a user-mode target that is based on the scsi tgt target framework is
-available from Ohio Supercomputer Center (OSC) at:
-http://www.open-osd.org/bin/view/Main/OscOsdProject
-There are several other target implementations. See http://open-osd.org for 
more
-links.
-
-Files and Folders
-=
-This is the complete list of files included in this work:
-include/scsi/
-   osd_initiator.h   Main API for the initiator library
-   osd_types.h   Common OSD types
-   osd_sec.h Security Manager API
-   osd_protocol.hWire definitions of the OSD standard protocol
-   osd_attributes.h  Wire definitions of OSD attributes
-
-drivers/scsi/osd/
-   osd_initiator.c   OSD-Initiator library implementation
-   osd_uld.c The OSD scsi ULD
-   osd_ktest.{h,c}   In-kernel test suite (called by osd_uld)
-   osd_debug.h   Some printk macros
-   Makefile  For both in-tree and out-of-tree compilation
-   Kconfig   Enables inclusion of the different pieces
-   osd_test.cUser-mode application to call the kernel tests
-
-The OSD-Initiator Library
-=
-osd_initiator is a low level implementation of an osd initiator encoder.
-But even though, it should be intuitive and easy to use. Perhaps over time an
-higher lever will form that automates some of the more common recipes.
-
-init/fini:
-- osd_dev_init() associates a scsi_device with an osd_dev structure
-  and initializes some global pools. This should be done once per scsi_device
-  (OSD LUN). The osd_dev structure is needed for calling osd_start_request().
-
-- osd_dev_fini() cleans up before a osd_dev/scsi_device destruction.
-
-OSD commands encoding, execution, and decoding of results:
-
-struct osd_request's is used to iteratively encode an OSD command and carry
-its state throughout execution. Each request goes through these stages:
-
-a. osd_start_request() allocates the request.
-
-b. Any of the osd_req_* methods is used to encode a request of the specified
-   type.
-
-c. osd_req_add_{get,set}_attr_* may be called to add get/set attributes to the
-   CDB. "List" or "Page" mode can be used exclusively. The attribute-list API
-   can be called multiple times on the same request. However, only one
-   attribute-page can be read, as mandated by the OSD standard.
-
-d. osd_finalize_requ

[PATCH 6/8] scsi: stop setting up request->special

2019-01-31 Thread Christoph Hellwig
No more need in a blk-mq world where the scsi command and request
are allocated together.

Signed-off-by: Christoph Hellwig 
---
 drivers/scsi/qedf/qedf_io.c | 6 --
 drivers/scsi/qedi/qedi_fw.c | 7 ---
 drivers/scsi/scsi_lib.c | 3 ---
 drivers/scsi/sr.c   | 1 -
 4 files changed, 17 deletions(-)

diff --git a/drivers/scsi/qedf/qedf_io.c b/drivers/scsi/qedf/qedf_io.c
index 6bbc38b1b465..6ca583bdde23 100644
--- a/drivers/scsi/qedf/qedf_io.c
+++ b/drivers/scsi/qedf/qedf_io.c
@@ -1128,12 +1128,6 @@ void qedf_scsi_completion(struct qedf_ctx *qedf, struct 
fcoe_cqe *cqe,
return;
}
 
-   if (!sc_cmd->request->special) {
-   QEDF_WARN(&(qedf->dbg_ctx), "request->special is NULL so "
-   "request not valid, sc_cmd=%p.\n", sc_cmd);
-   return;
-   }
-
if (!sc_cmd->request->q) {
QEDF_WARN(&(qedf->dbg_ctx), "request->q is NULL so request "
   "is not valid, sc_cmd=%p.\n", sc_cmd);
diff --git a/drivers/scsi/qedi/qedi_fw.c b/drivers/scsi/qedi/qedi_fw.c
index 25d763ae5d5a..e2a995a6e8e7 100644
--- a/drivers/scsi/qedi/qedi_fw.c
+++ b/drivers/scsi/qedi/qedi_fw.c
@@ -616,13 +616,6 @@ static void qedi_scsi_completion(struct qedi_ctx *qedi,
goto error;
}
 
-   if (!sc_cmd->request->special) {
-   QEDI_WARN(&qedi->dbg_ctx,
- "request->special is NULL so request not valid, 
sc_cmd=%p.\n",
- sc_cmd);
-   goto error;
-   }
-
if (!sc_cmd->request->q) {
QEDI_WARN(&qedi->dbg_ctx,
  "request->q is NULL so request is not valid, 
sc_cmd=%p.\n",
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 6fd1d8d83f07..7e256b384a99 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1568,10 +1568,7 @@ static blk_status_t scsi_mq_prep_fn(struct request *req)
 
scsi_init_command(sdev, cmd);
 
-   req->special = cmd;
-
cmd->request = req;
-
cmd->tag = req->tag;
cmd->prot_op = SCSI_PROT_NORMAL;
 
diff --git a/drivers/scsi/sr.c b/drivers/scsi/sr.c
index 38ddbbfe5f3c..039c27c2d7b3 100644
--- a/drivers/scsi/sr.c
+++ b/drivers/scsi/sr.c
@@ -394,7 +394,6 @@ static blk_status_t sr_init_command(struct scsi_cmnd *SCpnt)
ret = scsi_init_io(SCpnt);
if (ret != BLK_STS_OK)
goto out;
-   WARN_ON_ONCE(SCpnt != rq->special);
cd = scsi_cd(rq->rq_disk);
 
/* from here on until we're complete, any goto out
-- 
2.20.1