Re: [PATCH v2 7/7] scsi: libsas: fix issue of swapping two sas disks

2019-02-01 Thread John Garry

On 01/02/2019 02:04, Jason Yan wrote:



On 2019/2/1 0:34, John Garry wrote:

On 31/01/2019 02:55, Jason Yan wrote:



On 2019/1/31 1:53, John Garry wrote:

On 30/01/2019 08:24, Jason Yan wrote:

The current revalidation workflow scans the expander PHYs in sequence
and checks whether each PHY has changed. This leads to an issue when
two sas disks on one expander are swapped.

Assume we have two sas disks, connected with expander phy10 and phy11:

phy10: 5000cca04eb1001d  port-0:0:10
phy11: 5000cca04eb043ad  port-0:0:11

Swap these two disks, and imagine the following scenario:

revalidation 1:


What does "revalidation 1" actually mean?


'revalidation 1' means one pass through sas_discover_domain().




  -->phy10: 0 --> delete phy10 domain device
  -->phy11: 5000cca04eb043ad (no change)


so is disk 11 still inserted at this stage?


Maybe, but that's what we read from the hardware.




revalidation done

revalidation 2:


is port-0:0:10 deleted now?



Yes. But we don't care about it.


  -->step 1, check phy10:
  -->phy10: 5000cca04eb043ad --> add to wide port (port-0:0:11)
     (phy11's address is still 5000cca04eb043ad now)


We do not want this to happen and it seems to be the crux of the problem.

As an alternate to your solution, how about check if the PHY is an end
device. If so, it should not form/join a wideport; that is, apart from
dual-port disks, which I am not sure about - I think each port still has
a unique WWN, so should be ok.



If the PHY does not join a wide port, then it has to form a wide port
of its own. I'm not sure whether we can have two ports with the same
address without breaking anything?


I'm not sure, but port-0:0:11 should be deleted in step 2, just after 
this step, below.


Thanks,
John





So this should not happen. How are you physically swapping them such
that phy11's address is still 5000cca04eb043ad? I don't see how this
would be true at revalidation 1.



This issue arises because we always process the PHYs from 0 to the max
PHY number. And please be aware that the real physical address of a
PHY and the address stored in memory are not always the same.
Actually, when you are checking phy10, phy11's physical address is no
longer 5000cca04eb043ad. But the address stored in the domain device
is still 5000cca04eb043ad. We have not had a chance to read it yet
because we are processing phy10 now, right?
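For anyone following the ordering argument, here is a small userspace C
sketch (all names are invented for illustration, this is not libsas
code) of why refreshing one cached PHY address at a time can match a
freshly read address against a neighbour's stale cache:

```c
/* Userspace sketch of the stale-address race described above.
 * phy_state and revalidate_pass are invented names, not libsas API. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct phy_state {
    uint64_t hw_addr;      /* what the expander would report right now */
    uint64_t cached_addr;  /* what was last stored for this PHY        */
};

/* One revalidation pass: refresh each PHY in ascending order and report
 * whether a freshly read address collides with another PHY's *cached*
 * (not yet refreshed) address -- the condition that forms a bogus wide
 * port in the scenario above. */
static int revalidate_pass(struct phy_state *phy, size_t n)
{
    int stale_match = 0;

    for (size_t i = 0; i < n; i++) {
        uint64_t fresh = phy[i].hw_addr;

        for (size_t j = 0; j < n; j++) {
            if (j != i && fresh && fresh == phy[j].cached_addr)
                stale_match = 1;  /* matched a stale cache entry */
        }
        phy[i].cached_addr = fresh;
    }
    return stale_match;
}
```

Running the "revalidation 2" state from the scenario (phy10's new disk
already visible, phy11's cache still holding the old address) reports a
stale match; a second pass over the refreshed caches is clean.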



I see.


It's very easy to reproduce. I suggest you try it yourself and look at
the logs.



I can't physically access the backplane, and this is not the sort of
thing which is easy to fake by hacking the driver.

And the log which you provided internally does not have much - if any -
libsas logs to help me understand it.



  -->step 2, check phy11:
  -->phy11: 0  --> phy11 address is 0 now, but it's part of wide
   port(port-0:0:11), the domain device will not be deleted.
revalidation done

revalidation 3:
  -->phy10, 5000cca04eb043ad (no change)
  -->phy11: 5000cca04eb1001d --> try to add port-0:0:11, but it fails:
   port-0:0:11 already exists, triggering a warning as follows
revalidation done

[14790.189699] sysfs: cannot create duplicate filename
'/devices/pci:74/:74:02.0/host0/port-0:0/expander-0:0/port-0:0:11'




[14790.201081] CPU: 25 PID: 5031 Comm: kworker/u192:3 Not tainted
4.16.0-rc1-191134-g138f084-dirty #228
[14790.210199] Hardware name: Huawei D06/D06, BIOS Hisilicon D06 EC UEFI Nemo 2.0 RC0 - B303 05/16/2018
[14790.219323] Workqueue: :74:02.0_disco_q sas_revalidate_domain
[14790.225404] Call trace:
[14790.227842]  dump_backtrace+0x0/0x18c
[14790.231492]  show_stack+0x14/0x1c
[14790.234798]  dump_stack+0x88/0xac
[14790.238101]  sysfs_warn_dup+0x64/0x7c
[14790.241751]  sysfs_create_dir_ns+0x90/0xa0
[14790.245835]  kobject_add_internal+0xa0/0x284
[14790.250092]  kobject_add+0xb8/0x11c
[14790.253570]  device_add+0xe8/0x598
[14790.256960]  sas_port_add+0x24/0x50
[14790.260436]  sas_ex_discover_devices+0xb10/0xc30
[14790.265040]  sas_ex_revalidate_domain+0x1d8/0x518
[14790.269731]  sas_revalidate_domain+0x12c/0x154
[14790.274163]  process_one_work+0x128/0x2b0
[14790.278160]  worker_thread+0x14c/0x408
[14790.281897]  kthread+0xfc/0x128
[14790.285026]  ret_from_fork+0x10/0x18
[14790.288598] [ cut here ]

In the end, the disk 5000cca04eb1001d is lost.









Re: [PATCH v2 4/7] scsi: libsas: split the replacement of sas disks in two steps

2019-02-01 Thread John Garry

On 01/02/2019 01:58, Jason Yan wrote:



On 2019/2/1 0:38, John Garry wrote:

On 31/01/2019 10:29, John Garry wrote:

On 31/01/2019 02:04, Jason Yan wrote:



On 2019/1/31 1:22, John Garry wrote:

On 30/01/2019 08:24, Jason Yan wrote:

Now if a new device replaces an old device, the sas address will
change.


Hmmm... not if it's a SATA disk, which would have the same invented
SAS address.



Yes, it's only for a SAS disk.


We unregister the old device and discover the new device in one
revalidation process. But after we deferred the sas_port_delete(), the
sas port is not deleted when we register the new port and device. The
sas port cannot be added because the name of the new port is the same
as the old one.

Fix this by doing the replacement in two steps. The first revalidation
only deletes the old device and triggers a new revalidation. The
second revalidation discovers the new device. To keep the event
processing synchronised to the original event,


This change seems ok, but please see below regarding generating the
bcast events.



Did I originally suggest this? It seems to needlessly make the code
more complicated.



Yes, my first version raised a new bcast event, and you said it was
not synchronised to the original event. Shall I go back to that
approach?


Not sure. This patch seems to fix something closely related to that in
"scsi: libsas: fix issue of swapping two sas disks", which I will check
further.



An idea:

So, before the libsas changes to generate dynamic events, when libsas
was processing a particular event type - like a broadcast event - extra
events generated by the LLDD were discarded by libsas.

The revalidation process attempted to do all revalidation for the
domain in a single pass, which was ok. This really did not change.

However, in this revalidation pass, we also clear all expander and PHY
events.



Actually we only clear one expander and its attached PHYs' events now.


ok, fine, it's just for one expander; but we still fully clear that 
one expander's events.


However we would have to be careful here to ensure that we don't have a 
situation where we still have PHY events pending but no broadcast events 
to trigger the revalidation and subsequent processing.





Maybe this is not the right thing to do. Maybe we should just clear a
single PHY event per pass, since we're processing each broadcast event
one-by-one.



Yes, we can do this. But I don't understand how this will fix the issue?


It would solve the problem of having to fix up the expander's events 
to -1, which I mentioned was not so nice.


As for fixing the main problem, I was not against the idea of the other 
change in sas_rediscover_dev() to not call sas_discover_new() when the 
SAS address has changed.



We have this issue now because we have to probe the sas port and/or
delete the sas port outside of the disco_mutex. So for a specific PHY,
we cannot add and delete at the same time inside the disco_mutex.


Today you will notice that if we remove a disk for example, many
broadcast events are generated, but only the first broadcast event
actually does any revalidation.
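The one-PHY-event-per-broadcast idea discussed above can be modelled in
a few lines of userspace C (all names invented); the second test case
also shows the hazard raised earlier in the thread, where PHY events
could be left pending with no broadcast remaining to drive them:

```c
/* Illustrative-only model; ex_state and handle_one_broadcast are
 * invented names, not libsas API. */
#include <assert.h>

struct ex_state {
    int pending_phy_events;   /* PHY events the LLDD has queued   */
    int pending_broadcasts;   /* broadcast events not yet handled */
};

/* Handle one broadcast: consume it and process at most one PHY event.
 * Returns the number of PHY events still pending. */
static int handle_one_broadcast(struct ex_state *ex)
{
    if (ex->pending_broadcasts > 0) {
        ex->pending_broadcasts--;
        if (ex->pending_phy_events > 0)
            ex->pending_phy_events--;
    }
    return ex->pending_phy_events;
}
```

With equal counts every PHY event gets drained; with fewer broadcasts
than PHY events, something is stranded, which is exactly the situation
one would have to guard against.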

EOM











Re: [PATCH] qla2xxx: Add new FC-NVMe enable BIT to enable FC-NVMe feature

2019-02-01 Thread Ewan D. Milne
On Wed, 2019-01-30 at 09:50 -0800, Himanshu Madhani wrote:
> From: Giridhar Malavali 
> 
> This patch adds new BIT detection to enable FC-NVMe feature in
> the driver.
> 
> Signed-off-by: Giridhar Malavali 
> Signed-off-by: Himanshu Madhani 
> ---
> Hi Martin,
> 
> This patch adds an additional bit to enable FC-NVMe in the driver. 
> 
> Please apply this patch to 5.1/scsi-queue at your earliest convenience.
> 
> Thanks,
> Himanshu
> ---
>  drivers/scsi/qla2xxx/qla_def.h | 3 +++
>  drivers/scsi/qla2xxx/qla_mbx.c | 4 +++-
>  2 files changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/scsi/qla2xxx/qla_def.h b/drivers/scsi/qla2xxx/qla_def.h
> index 05b5c6fa011d..199713f29cbd 100644
> --- a/drivers/scsi/qla2xxx/qla_def.h
> +++ b/drivers/scsi/qla2xxx/qla_def.h
> @@ -3958,6 +3958,9 @@ struct qla_hw_data {
>   uint16_tfw_subminor_version;
>   uint16_tfw_attributes;
>   uint16_tfw_attributes_h;
> +#define FW_ATTR_H_NVME   BIT_10
> +#define FW_ATTR_H_NVME_UPDATED  BIT_14
> +
>   uint16_tfw_attributes_ext[2];
>   uint32_tfw_memory_size;
>   uint32_tfw_transfer_size;
> diff --git a/drivers/scsi/qla2xxx/qla_mbx.c b/drivers/scsi/qla2xxx/qla_mbx.c
> index 3181235c3a32..f4adf6baee69 100644
> --- a/drivers/scsi/qla2xxx/qla_mbx.c
> +++ b/drivers/scsi/qla2xxx/qla_mbx.c
> @@ -1109,7 +1109,9 @@ qla2x00_get_fw_version(scsi_qla_host_t *vha)
>* FW supports nvme and driver load parameter requested nvme.
>* BIT 26 of fw_attributes indicates NVMe support.
>*/
> - if ((ha->fw_attributes_h & 0x400) && ql2xnvmeenable) {
> + if ((ha->fw_attributes_h &
> + (FW_ATTR_H_NVME | FW_ATTR_H_NVME_UPDATED)) &&
> + ql2xnvmeenable) {
>   vha->flags.nvme_enabled = 1;
>   ql_log(ql_log_info, vha, 0xd302,
>   "%s: FC-NVMe is Enabled (0x%x)\n",

Reviewed-by: Ewan D. Milne 
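A side note on the bit arithmetic: the quoted comment says "BIT 26 of
fw_attributes indicates NVMe support" while the code tests
fw_attributes_h against 0x400. Assuming, as the patch implies, that
fw_attributes_h carries the upper 16 bits of the 32-bit attribute word,
bit 26 overall is bit 10 of the high half, so BIT_10 and the old 0x400
literal agree. A plain-C sanity check (macros reproduced locally, not
taken from qla2xxx headers):

```c
/* Sketch verifying the old literal against the new macros; the
 * nvme_capable helper is invented for illustration. */
#include <assert.h>
#include <stdint.h>

#define FW_ATTR_H_NVME          (1u << 10)   /* BIT_10 */
#define FW_ATTR_H_NVME_UPDATED  (1u << 14)   /* BIT_14 */

/* fw_attributes_h is the upper 16 bits of the 32-bit firmware
 * attributes, so overall bit 26 maps to bit (26 - 16) = 10 here. */
static int nvme_capable(uint16_t fw_attributes_h)
{
    return (fw_attributes_h &
            (FW_ATTR_H_NVME | FW_ATTR_H_NVME_UPDATED)) != 0;
}
```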



[PATCH] Revert "scsi: libfc: Add WARN_ON() when deleting rports"

2019-02-01 Thread Ross Lagerwall
This reverts commit bbc0f8bd88abefb0f27998f40a073634a3a2db89.

It added a warning whose intent was to check whether the rport was still
linked into the peer list. It doesn't work as intended and gives false
positive warnings for two reasons:

1) If the rport is never linked into the peer list it will not be
considered empty since the list_head is never initialized.

2) If the rport is deleted from the peer list using list_del_rcu(), then
the list_head is in an undefined state and it is not considered empty.
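The second reason can be demonstrated with a minimal userspace mimic of
the kernel's list primitives (simplified, without the RCU barriers and
with a locally defined poison value; this is not the kernel header):

```c
/* Minimal userspace mimic of kernel list primitives, just to show why
 * list_empty() stays false after list_del_rcu(). */
#include <assert.h>
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

#define LIST_POISON2 ((struct list_head *)0x122)

static void INIT_LIST_HEAD(struct list_head *h) { h->next = h->prev = h; }

static int list_empty(const struct list_head *h) { return h->next == h; }

static void list_add(struct list_head *n, struct list_head *head)
{
    n->next = head->next;
    n->prev = head;
    head->next->prev = n;
    head->next = n;
}

/* list_del_rcu() unlinks the entry but deliberately leaves ->next
 * pointing into the old list (for concurrent readers) and poisons
 * ->prev -- so list_empty() on the deleted entry remains false. */
static void list_del_rcu(struct list_head *e)
{
    e->next->prev = e->prev;
    e->prev->next = e->next;
    e->prev = LIST_POISON2;
}
```

(The first reason, a never-initialised list_head, cannot be asserted
portably since its contents are simply undefined memory.)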

Signed-off-by: Ross Lagerwall 
---
 drivers/scsi/libfc/fc_rport.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/scsi/libfc/fc_rport.c b/drivers/scsi/libfc/fc_rport.c
index 9192a1d9dec6..dfba4921b265 100644
--- a/drivers/scsi/libfc/fc_rport.c
+++ b/drivers/scsi/libfc/fc_rport.c
@@ -184,7 +184,6 @@ void fc_rport_destroy(struct kref *kref)
struct fc_rport_priv *rdata;
 
rdata = container_of(kref, struct fc_rport_priv, kref);
-   WARN_ON(!list_empty(&rdata->peers));
kfree_rcu(rdata, rcu);
 }
 EXPORT_SYMBOL(fc_rport_destroy);
-- 
2.17.2



Re: remove exofs, the T10 OSD code and block/scsi bidi support V4

2019-02-01 Thread Jens Axboe
On 2/1/19 12:55 AM, Christoph Hellwig wrote:
> The only real user of the T10 OSD protocol, the pNFS object layout
> driver never went to the point of having shipping products, and we
> removed it 1.5 years ago.  Exofs is just a simple example without
> real life users.
> 
> The code has been mostly unmaintained for years and is getting in the
> way of block / SCSI changes, and does not even work properly currently,
> so I think it's finally time to drop it.
> 
> Quote from Boaz:
> 
> "As I said then. It is used in Universities for studies and experiments.
> Every once in a while. I get an email with questions and reports.
> 
> But yes feel free to remove the all thing!!
> 
> I guess I can put it up on github. In a public tree.
> 
> Just that I will need to forward port it myself, til now you guys
> been doing this for me ;-)"
> 
> Now the last time this caused a bit of a stir, but still no actual users,
> not even for SG_IO passthrough commands.  So here we go again, this time
> including removing everything in the scsi and block layer supporting it,
> and thus shrinking struct request.

I'm fine with killing it. You can add my:

Reviewed-by: Jens Axboe 

to the series.

-- 
Jens Axboe



Re: [PATCH 1/4] block: disk_events: introduce event flags

2019-02-01 Thread Martin Wilck
Hannes, all,

On Mon, 2019-01-28 at 14:54 +0100, Martin Wilck wrote:
> On Sat, 2019-01-26 at 11:09 +0100, Hannes Reinecke wrote:
> > On 1/18/19 10:32 PM, Martin Wilck wrote:
> > > Currently, an empty disk->events field tells the block layer not
> > > to
> > > forward
> > > media change events to user space. This was done in commit
> > > 7c88a168da80 ("block:
> > > don't propagate unlisted DISK_EVENTs to userland") in order to
> > > avoid events
> > > from "fringe" drivers to be forwarded to user space. By doing so,
> > > the block
> > > layer lost the information which events were supported by a
> > > particular
> > > block device, and most importantly, whether or not a given device
> > > supports
> > > media change events at all.
> > > 
> > > Prepare for not interpreting the "events" field this way in the
> > > future any
> > > more. This is done by adding two flag bits that can be set to
> > > have
> > > the
> > > device treated like one that has the "events" field set to a non-
> > > zero value
> > > before. This applies only to the sd and sr drivers, which are
> > > changed to
> > > set the new flags.
> > > 
> > > The new flags are DISK_EVENT_FLAG_POLL to enforce polling of the
> > > device for
> > > synchronous events, and DISK_EVENT_FLAG_UEVENT to tell the
> > > blocklayer to
> > > generate udev events from kernel events. They can easily be fit
> > > in
> > > the int
> > > reserved for event bits.
> > > 
> > > This patch doesn't change behavior.
> > > 
> > > Signed-off-by: Martin Wilck 
> > > ---
> > >   block/genhd.c | 22 --
> > >   drivers/scsi/sd.c |  3 ++-
> > >   drivers/scsi/sr.c |  3 ++-
> > >   include/linux/genhd.h |  7 +++
> > >   4 files changed, 27 insertions(+), 8 deletions(-)
> > > 
> > > diff --git a/block/genhd.c b/block/genhd.c
> > > index 1dd8fd6..bcd16f6 100644
> > > --- a/block/genhd.c
> > > +++ b/block/genhd.c
> > > @@ -1631,7 +1631,8 @@ static unsigned long
> > > disk_events_poll_jiffies(struct gendisk *disk)
> > >*/
> > >   if (ev->poll_msecs >= 0)
> > >   intv_msecs = ev->poll_msecs;
> > > - else if (disk->events & ~disk->async_events)
> > > + else if (disk->events & DISK_EVENT_FLAG_POLL
> > > +  && disk->events & ~disk->async_events)
> > >   intv_msecs = disk_events_dfl_poll_msecs;
> > >   
> > >   return msecs_to_jiffies(intv_msecs);
> > Hmm. That is an ... odd condition.
> > Clearly it's pointless to have the event bit set in the ->events
> > mask
> > if 
> > it's already part of the ->async_events mask.
> 
> The "events" bit has to be set in that case. "async_events" is
> defined as a subset of "events", see genhd.h. You can trivially
> verify that this is currently true, as NO driver currently sets any
> bit in the "async_events" field. I was wondering whether
> "async_events" couldn't be ditched completely, but I didn't want to
> make that aggressive a change in this patch set.
> 
> > But shouldn't we better _prevent_ this from happening, ie refuse to
> > set
> > DISK_EVENT_FLAG_POLL in events if it's already in ->async_events?
> > Then the above check would be simplified.
> 
> Asynchronous events need not be polled for, therefore setting the
> POLL flag in async_events makes no sense. My intention was to use
> these "flag" bits in the "events" field only. Perhaps I should have
> expressed that more clearly?
> 
> Anyway, unless I'm really blind, the condition above is actually the
> same as before, just that I now require the POLL flag to be set as
> well, which is the main point of the patch.
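For readers of the thread, the condition being debated can be restated
as a tiny userspace sketch; the bit positions below are placeholders
chosen for illustration, not the values used in the patch:

```c
/* Sketch of the new polling decision; gendisk_sketch and should_poll
 * are invented names, and the flag bit positions are assumptions. */
#include <assert.h>

#define DISK_EVENT_MEDIA_CHANGE  (1u << 0)
#define DISK_EVENT_FLAG_POLL     (1u << 30)  /* placeholder position */

struct gendisk_sketch {
    unsigned int events;        /* supported events + flag bits      */
    unsigned int async_events;  /* subset delivered without polling  */
};

/* Poll only when the driver opted in via the POLL flag *and* some
 * supported event is not delivered asynchronously -- the same shape
 * as the patched disk_events_poll_jiffies() check.  The flag bits
 * live in "events" only, as explained above. */
static int should_poll(const struct gendisk_sketch *d)
{
    return (d->events & DISK_EVENT_FLAG_POLL) &&
           (d->events & ~d->async_events);
}
```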

Does this response make sense? If yes, can we get the set merged?
If no, what do I need to change? Should I add a patch that does away
with the unused async_events field?

Thanks
Martin


-- 
Dr. Martin Wilck , Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)




Re: Question on handling managed IRQs when hotplugging CPUs

2019-02-01 Thread Hannes Reinecke

On 1/31/19 6:48 PM, John Garry wrote:

On 30/01/2019 12:43, Thomas Gleixner wrote:

On Wed, 30 Jan 2019, John Garry wrote:

On 29/01/2019 17:20, Keith Busch wrote:

On Tue, Jan 29, 2019 at 05:12:40PM +, John Garry wrote:

On 29/01/2019 15:44, Keith Busch wrote:


Hm, we used to freeze the queues with the CPUHP_BLK_MQ_PREPARE
callback, which would reap all outstanding commands before the CPU and
IRQ are taken offline. That was removed with commit 4b855ad37194f
("blk-mq: Create hctx for each present CPU"). It sounds like we should
bring something like that back, but made more fine-grained for the
per-CPU context.



Seems reasonable. But we would need it to deal with drivers which only
expose a single queue to blk-mq but use many queues internally. I
think megaraid sas does this, for example.

I would also be slightly concerned with commands being issued from the
driver unknown to blk-mq, like SCSI TMFs.


I don't think either of those descriptions sound like good candidates
for using managed IRQ affinities.


I wouldn't say that this behaviour is obvious to the developer. I
can't see anything in Documentation/PCI/MSI-HOWTO.txt.

It also seems that this policy of relying on the upper layer to
flush+freeze queues would cause issues if managed IRQs are used by
drivers in other subsystems. Network controllers may have multiple
queues and unsolicited interrupts.


It doesn't matter which part is managing flush/freeze of queues as
long as something (either common subsystem code, upper layers or the
driver itself) does it.

So for the megaraid SAS example the blk-mq layer obviously can't do
anything because it only sees a single request queue. But the driver
could, if the hardware supports it, tell the device to stop queueing
completions on the completion queue which is associated with a
particular CPU (or set of CPUs) during offline and then wait for the
in-flight stuff to be finished. If the hardware does not allow that,
then managed interrupts can't work for it.
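That stop-then-drain sequence can be modelled in a few lines of
userspace C (all names invented; a real driver would poll hardware
state rather than a counter):

```c
/* Toy model of the quiesce sequence: stop the completion queue mapped
 * to the outgoing CPU, then drain in-flight commands before teardown. */
#include <assert.h>

struct cqueue {
    int accepting;  /* device still queueing completions here? */
    int inflight;   /* commands not yet completed              */
};

static void device_stop_queue(struct cqueue *q) { q->accepting = 0; }

/* One drain step: retire one in-flight command.  Returns 1 once the
 * queue is stopped and empty, i.e. safe to offline the CPU. */
static int drain_step(struct cqueue *q)
{
    if (q->inflight > 0)
        q->inflight--;
    return !q->accepting && q->inflight == 0;
}
```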



A rough audit of current SCSI drivers shows that these set 
PCI_IRQ_AFFINITY in some path but don't set Scsi_Host.nr_hw_queues at all:

aacraid, be2iscsi, csiostor, megaraid, mpt3sas

Megaraid and mpt3sas don't have that functionality (or at least not 
that I'm aware of).

And in general I'm not sure if the above approach is feasible.

Thing is, if we have _managed_ CPU hotplug (ie if the hardware provides 
some means of quiescing the CPU before hotplug) then the whole thing is 
trivial: disable the SQ and wait for all outstanding commands to complete.

Then trivially all requests are completed and the issue is resolved.
Even with todays infrastructure.

And I'm not sure if we can handle surprise CPU hotplug at all, given all 
the possible race conditions.

But then I might be wrong.

Cheers,

Hannes
--
Dr. Hannes Reinecke            Teamlead Storage & Networking
h...@suse.de                   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


Stall with RAID10 + XFS + mpt3sas

2019-02-01 Thread Sebastian Andrzej Siewior
Hi,

I have an LSI SAS3008 [0] attached to a few disks. I've set up an md
raid10 on them and created an XFS file system on it. While the raid
was still rebuilding I rsynced approx 2 TiB of data. This went
smoothly. The raid was still rebuilding when I started doing some I/O,
and after approximately 5 minutes it stopped doing I/O. Repeatedly.
The backtrace shows:
| md10_resync D0  1797  2 0x8000
| Call Trace:
|  ? __schedule+0x3f5/0x880
|  schedule+0x32/0x80
|  raise_barrier+0xc3/0x190 [raid10]
|  ? remove_wait_queue+0x60/0x60
|  raid10_sync_request+0x989/0x1d50 [raid10]
|  ? is_mddev_idle+0x44/0x12a [md_mod]
|  ? cpumask_next+0x16/0x20
|  ? is_mddev_idle+0xcc/0x12a [md_mod]
|  md_do_sync+0xc44/0x1030 [md_mod]
|  ? remove_wait_queue+0x60/0x60
|  ? __switch_to_asm+0x34/0x70
|  ? md_thread+0x125/0x170 [md_mod]
|  md_thread+0x125/0x170 [md_mod]
|  kthread+0xf8/0x130
|  ? md_rdev_init+0xc0/0xc0 [md_mod]
|  ? kthread_create_worker_on_cpu+0x70/0x70
|  ret_from_fork+0x35/0x40
| xfsaild/md10D0  1841  2 0x8000
| Call Trace:
|  ? __schedule+0x3f5/0x880
|  schedule+0x32/0x80
|  wait_barrier+0x146/0x1a0 [raid10]
|  ? remove_wait_queue+0x60/0x60
|  raid10_write_request+0x74/0x8e0 [raid10]
|  ? mempool_alloc+0x69/0x190
|  ? md_write_start+0xd0/0x210 [md_mod]
|  raid10_make_request+0xbf/0x140 [raid10]
|  md_handle_request+0x116/0x190 [md_mod]
|  md_make_request+0x72/0x170 [md_mod]
|  generic_make_request+0x1e7/0x410
|  ? submit_bio+0x6c/0x140
|  ? xfs_inode_buf_verify+0x84/0x150 [xfs]
|  submit_bio+0x6c/0x140
|  ? bio_add_page+0x48/0x60
|  _xfs_buf_ioapply+0x324/0x4d0 [xfs]
|  ? __kernel_fpu_end+0x30/0x80
|  ? xfs_buf_delwri_submit_buffers+0x17e/0x2c0 [xfs]
|  ? __xfs_buf_submit+0xe2/0x240 [xfs]
|  __xfs_buf_submit+0xe2/0x240 [xfs]
|  xfs_buf_delwri_submit_buffers+0x17e/0x2c0 [xfs]
|  ? xfsaild+0x2dc/0x830 [xfs]
|  ? xfsaild+0x2dc/0x830 [xfs]
|  xfsaild+0x2dc/0x830 [xfs]
|  ? __switch_to_asm+0x34/0x70
|  ? kthread+0xf8/0x130
|  kthread+0xf8/0x130
|  ? xfs_trans_ail_cursor_first+0x80/0x80 [xfs]
|  ? kthread_create_worker_on_cpu+0x70/0x70
|  ret_from_fork+0x35/0x40
| kworker/36:3D0  2057  2 0x8000
| Workqueue: md submit_flushes [md_mod]
| Call Trace:
|  ? __schedule+0x3f5/0x880
|  schedule+0x32/0x80
|  wait_barrier+0x146/0x1a0 [raid10]
|  ? remove_wait_queue+0x60/0x60
|  raid10_write_request+0x74/0x8e0 [raid10]
|  ? mempool_alloc+0x69/0x190
|  ? md_write_start+0xd0/0x210 [md_mod]
|  ? try_to_wake_up+0x54/0x4a0
|  raid10_make_request+0xbf/0x140 [raid10]
|  md_handle_request+0x116/0x190 [md_mod]
|  md_make_request+0x72/0x170 [md_mod]
|  generic_make_request+0x1e7/0x410
|  ? raid10_write_request+0x660/0x8e0 [raid10]
|  raid10_write_request+0x660/0x8e0 [raid10]
|  ? mempool_alloc+0x69/0x190
|  ? md_write_start+0xd0/0x210 [md_mod]
|  ? __switch_to_asm+0x40/0x70
|  ? __switch_to_asm+0x34/0x70
|  ? __switch_to_asm+0x40/0x70
|  raid10_make_request+0xbf/0x140 [raid10]
|  md_handle_request+0x116/0x190 [md_mod]
|  ? __switch_to_asm+0x40/0x70
|  submit_flushes+0x21/0x40 [md_mod]
|  process_one_work+0x191/0x370
|  worker_thread+0x4f/0x3b0
|  kthread+0xf8/0x130
|  ? rescuer_thread+0x340/0x340
|  ? kthread_create_worker_on_cpu+0x70/0x70
|  ret_from_fork+0x35/0x40
| borgD0  4097   4096 0x
| Call Trace:
|  ? __schedule+0x3f5/0x880
|  ? xlog_bdstrat+0x30/0x60 [xfs]
|  schedule+0x32/0x80
|  __xfs_log_force_lsn+0x155/0x270 [xfs]
|  ? wake_up_q+0x70/0x70
|  ? xfs_file_fsync+0x100/0x230 [xfs]
|  xfs_log_force_lsn+0x91/0x120 [xfs]
|  xfs_file_fsync+0x100/0x230 [xfs]
|  do_fsync+0x38/0x60
|  __x64_sys_fsync+0x10/0x20
|  do_syscall_64+0x55/0x110
|  entry_SYSCALL_64_after_hwframe+0x44/0xa9
| RIP: 0033:0x7fb0bf0b3010
| Code: Bad RIP value.

This was the latest v4.19 from two weeks ago. This looks to me like
mpt3sas didn't wake someone up after an I/O completed and everything
stopped. Since this machine runs in production I didn't have much time
to debug this. I replaced XFS with EXT4 and the problem disappeared. I
even restarted the md raid rebuild to have the same testing scenario.
Nothing.
While it looks like an XFS problem I believe that XFS manages to
submit enough requests to confuse mpt3sas while EXT4 doesn't.
Is this a known problem?

[0] PCI ID 1000:0097

Sebastian


mpt3sas and T10DIF

2019-02-01 Thread Sebastian Andrzej Siewior
Hi,

I tried to use T10 DIF on SAS disks behind an LSI SAS3008 controller.
So I enabled type 1 via
sg_format --format --fmtpinfo=2 /dev/sdX

on each disk and waited a few hours. After the operation completed the
disks reported that T10 DIF was enabled. I was able to read/write
from/to the disks. This was on a v4.14 kernel.

Then I upgraded to v4.19 and things started to fall apart. I was able to
read/write the first LBA but everything after that failed:

|[  .343777] sd 0:0:0:0: [sda] tag#2788 FAILED Result: hostbyte=DID_ABORT 
driverbyte=DRIVER_SENSE
|[  .345709] sd 0:0:0:0: [sda] tag#2788 Sense Key : Illegal Request [current]
|[  .347481] sd 0:0:0:0: [sda] tag#2788 Add. Sense: Logical block reference tag 
check failed
|[  .349244] sd 0:0:0:0: [sda] tag#2788 CDB: Read(10) 28 20 00 00 00 04 00 00 
08 00
|[  .352766] mpt3sas_cm0: log_info(0x3112043b): originator(PL), code(0x12), 
sub_code(0x043b)
|[  .354548] sd 0:0:0:0: [sda] tag#2789 FAILED Result: hostbyte=DID_OK 
driverbyte=DRIVER_SENSE
|[  .356328] sd 0:0:0:0: [sda] tag#2789 Sense Key : Aborted Command [current]
|[  .358105] sd 0:0:0:0: [sda] tag#2789 Add. Sense: Logical block reference tag 
check failed
|[  .359868] sd 0:0:0:0: [sda] tag#2789 CDB: Write(10) 2a 20 00 00 00 01 00 00 
01 00
|
|[  .508308] sd 0:0:0:0: [sda] tag#1180 FAILED Result: hostbyte=DID_ABORT 
driverbyte=DRIVER_SENSE
|[  .508323] sd 0:0:0:0: [sda] tag#1180 Sense Key : Illegal Request [current]
|[  .508326] sd 0:0:0:0: [sda] tag#1180 Add. Sense: Logical block reference tag 
check failed
|[  .508328] sd 0:0:0:0: [sda] tag#1180 CDB: Read(10) 28 20 00 00 00 02 00 00 
02 00
|[  .508337] mpt3sas_cm0: log_info(0x3112043b): originator(PL), code(0x12), 
sub_code(0x043b)
|[  .534943] sd 0:0:0:0: [sda] tag#547 FAILED Result: hostbyte=DID_ABORT 
driverbyte=DRIVER_SENSE
|[  .534949] sd 0:0:0:0: [sda] tag#547 Sense Key : Illegal Request [current]
|[  .534954] sd 0:0:0:0: [sda] tag#547 Add. Sense: Logical block reference tag 
check failed
|[  .534958] sd 0:0:0:0: [sda] tag#547 CDB: Read(10) 28 20 00 00 00 01 00 00 01 
00
|[  .534983] mpt3sas_cm0: log_info(0x3112043b): originator(PL), code(0x12), 
sub_code(0x043b)
|[  .535876] sd 0:0:0:0: [sda] tag#547 FAILED Result: hostbyte=DID_ABORT 
driverbyte=DRIVER_SENSE
|[  .535879] sd 0:0:0:0: [sda] tag#547 Sense Key : Illegal Request [current]
|[  .535882] sd 0:0:0:0: [sda] tag#547 Add. Sense: Logical block reference tag 
check failed
|[  .535884] sd 0:0:0:0: [sda] tag#547 CDB: Read(10) 28 20 00 00 00 03 00 00 01 
00
|[  .535899] mpt3sas_cm0: log_info(0x3112043b): originator(PL), code(0x12), 
sub_code(0x043b)

I went back to v4.14 and the errors disappeared. On v4.19 I was able
to tell mpt3sas to disable T10 DIF [0] and then I could read from the
disk again.
Does anyone have an idea what happened between v4.14 and v4.19 that
would explain this?
Since this machine runs in production I went back and disabled T10 DIF.

[0] mpt3sas.prot_mask=2

Sebastian
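For background on the "Logical block reference tag check failed" sense
data above: under Type 1 protection each logical block carries an
8-byte protection tuple, and the target checks that the reference tag
equals the low 32 bits of the block's LBA, incrementing per block. A
simplified illustration (the struct mirrors the field layout of the
kernel's t10_pi_tuple; ref_tags_valid is an invented helper, and the
real tags are big-endian on the wire, which is ignored here):

```c
/* Sketch of the Type 1 reference-tag rule; not kernel code. */
#include <assert.h>
#include <stdint.h>

struct t10_pi_tuple {
    uint16_t guard_tag;  /* CRC16 of the data block        */
    uint16_t app_tag;    /* application defined            */
    uint32_t ref_tag;    /* low 32 bits of LBA for Type 1  */
};

/* A transfer starting at `lba` spanning `nblocks` must carry
 * incrementing reference tags; any mismatch is the error the drive
 * reported above. */
static int ref_tags_valid(const struct t10_pi_tuple *pi, uint64_t lba,
                          unsigned int nblocks)
{
    for (unsigned int i = 0; i < nblocks; i++)
        if (pi[i].ref_tag != (uint32_t)(lba + i))
            return 0;
    return 1;
}
```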


Re: mpt3sas and T10DIF

2019-02-01 Thread John Garry

On 01/02/2019 16:20, Sebastian Andrzej Siewior wrote:

Hi,

I tried to use T10DIF on a SAS disk(s) behind a LSI SAS3008 controller.
So I enabled type 1 via
sg_format --format --fmtpinfo=2 /dev/sdX

on each disk, waited a few hours. After the operation completed the
disks reported that T10DIF is enabled. I was able to read/write from/to
the disk(s). This was on a v4.14 kernel.

Then I upgraded to v4.19 and things started to fall apart. I was able to
read/write the first LBA but everything after that failed:

[log snipped]

I went back to v4.14 and the errors disappeared. In v4.19 I was able to
tell mpt3sas to disable T10DIF [0] and then I could read from the disk
again.
Does anyone have an idea what happened between v4.14 and v4.19 that
would explain this?


Possibly missing this (not sure which 4.19 you mean):

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/include/linux/t10-pi.h?h=v4.19.13&id=690699b271858d45587c868c5166cb6d495a953f


Since this runs productive I went back and disabled T10DIF.

[0] mpt3sas.prot_mask=2

Sebastian







Re: mpt3sas and T10DIF

2019-02-01 Thread Douglas Miller
I saw "ref tag" errors on NVMe devices, starting with 4.18 kernels. The 
fix was:


https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git/commit/?h=for-linus&id=7809167da5c86fd6bf309b33dee7a797e263342f



On 02/01/2019 10:20 AM, Sebastian Andrzej Siewior wrote:

Hi,

I tried to use T10DIF on a SAS disk(s) behind a LSI SAS3008 controller.
So I enabled type 1 via
sg_format --format --fmtpinfo=2 /dev/sdX

on each disk, waited a few hours. After the operation completed the
disks reported that T10DIF is enabled. I was able to read/write from/to
the disk(s). This was on a v4.14 kernel.

Then I upgraded to v4.19 and things started to fall apart. I was able to
read/write the first LBA but everything after that failed:

[log snipped]

I went back to v4.14 and the errors disappeared. In v4.19 I was able to
tell mpt3sas to disable T10DIF [0] and then I could read from the disk
again.
Does anyone have an idea what happened between v4.14 and v4.19 that
would explain this?
Since this machine runs in production, I went back and disabled T10DIF.

[0] mpt3sas.prot_mask=2
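As a reference for the [0] footnote: prot_mask is the mpt3sas module parameter selecting which T10 DIF types the HBA advertises to the SCSI layer. A sketch of the usual ways to set and verify it (the modprobe.d path and the default value of 7 are my understanding of the driver, not something stated in the mail):

```shell
# Built-in driver: set it on the kernel command line (as in [0] above):
#     mpt3sas.prot_mask=2
# prot_mask is a bitmask of DIF types the HBA advertises (1=Type1, 2=Type2,
# 4=Type3); mpt3sas defaults to 7, and 0 turns protection off entirely.

# Modular driver: persist the option via modprobe configuration:
printf 'options mpt3sas prot_mask=2\n' > /etc/modprobe.d/mpt3sas.conf

# Confirm what the running driver actually picked up:
cat /sys/module/mpt3sas/parameters/prot_mask
```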

Sebastian





Re: Recent removal of bsg read/write support

2019-02-01 Thread Douglas Gilbert

Updated reply, see below.

On 2018-09-03 4:34 a.m., Dror Levin wrote:

On Sun, Sep 2, 2018 at 8:55 PM Linus Torvalds
 wrote:


On Sun, Sep 2, 2018 at 4:44 AM Richard Weinberger
 wrote:


CC'ing relevant people. Otherwise your mail might get lost.


Indeed.


Sorry for that.


On Sun, Sep 2, 2018 at 1:37 PM Dror Levin  wrote:


We have an internal tool that uses the bsg read/write interface to
issue SCSI commands as part of a test suite for a storage device.

After recently reading on LWN that this interface is to be removed we
tried porting our code to use sg instead. However, that raises new
issues - mainly getting ENOMEM over iSCSI for unknown reasons.


Is there any chance that you can make more data available?


Sure, I can try.

We use writev() to send up to SG_MAX_QUEUE tasks at a time. Occasionally not
all tasks are written, at which point we wait for tasks to return before
sending more, but then writev() fails with ENOMEM and we see this in the syslog:

Sep  1 20:58:14 gdc-qa-io-017 kernel: sd 441:0:0:5: [sg73]
sg_common_write: start_req err=-12

Failing tasks are reads of 128KiB.


This is the block layer running out of resources. The sg driver is a
relatively thin shim, and when it gets a "no can do" from the layers
below it, the driver has little option but to return said errno.


I'd rather fix the sg interface (which while also broken garbage, we
can't get rid of) than re-surrect the bsg interface.

That said, the removed bsg code looks a hell of a lot prettier than
the nasty sg interface code does, although it also lacks absolutely
_any_ kind of security checking.


For us the bsg interface also has several advantages over sg:
1. The device name is its HCTL which is nicer than an arbitrary integer.


Not much the sg driver can do about that. The minor number the sg driver
uses and HCT are all arbitrary integers (with the L coming from the
storage device), but I agree the HCTL is more widely used. The
ioctl(, SG_GET_SCSI_ID) fills a structure which includes HCTL. In
my sg v4 driver rewrite the L (LUN) has been tweaked to additionally
send back the 8 byte T10 LUN representation.

The lsscsi utility will show the relationship between HCTL and sg driver
device name with 'lsscsi -g'. It uses sysfs datamining.


2. write() supports writing more than one sg_io_v4 struct so we don't have
to resort to writev().


In my sg v4 rewrite the sg_io_v4 interface can only be sent through
ioctl(SG_IO) [for sync usage] and ioctl(SG_IOSUBMIT) [for async usage].
So it can't be sent through write(2). SG_IOSUBMIT is new and uses the
_IOWR macro which encodes the expected length into the SG_IOSUBMIT value
and that is the size of sg_io_v4. So you can't send an arbitrary number of
sg_io_v4 objects through that ioctl directly. If need be, that can be
cured with another level of indirection (e.g. with a new flag, the data-out
buffer can be interpreted as an array of sg_io_v4 objects).


3. Queue size is the device's queue depth and not SG_MAX_QUEUE which is 16.


That limit is gone in the sg v4 driver rewrite.


Because of this we would like to continue using the bsg interface,
even if some changes are required to meet security concerns.


I wonder if we could at least try to unify the bsg/sg code - possibly
by making sg use the prettier bsg code (but definitely have to add all
the security measures).

And dammit, the SCSI people need to get their heads out of their
arses. This whole "stream random commands over read/write" needs to go
the f*ck away.

Could we perhaps extend the SG_IO interface to have an async mode?
Instead of "read/write", have "SG_IOSUBMIT" and "SG_IORECEIVE" and
have the SG_IO ioctl just be a shorthand of "both".


Done.


Just my two cents - having an interface other than read/write won't allow
users to treat this fd as a regular file with epoll() and read(). This is
a major bonus for this interface - an sg/bsg device can be used just like
a socket or pipe in any reactor (we use boost asio for example).


Well poll() certainly works (see sg3_utils beta rev 809 testing/sgs_dd.c and
testing/sgh_dd.c) and I can't see why epoll() won't work. These calls work
against the file descriptor and the sg driver keeps the same context around
sg device file descriptors as it has always done. [And that is the major
design flaw in the bsg driver: it doesn't keep proper file descriptor context.]

It is the security folks who don't like the sg inspired (there in lk 1.0.0
from 1992) write(2)/read(2) asynchronous interface. Also, ideally we need
two streams: one for metadata (e.g. commands and responses (status and sense
data)) and another for user data. Protection information could be a third
stream, between the other two. Jamming that all into one stream is a bit ugly.

References:
  sg v3 driver rewrite, description and downloads:
 http://sg.danny.cz/sg/sg_v40.html
  sg3_utils version 1.45 beta, rev 809, link at the top of this page:
 http://sg.danny.cz/sg

Doug Gilbert




Re: [PATCH v2 6/9] scsi: ufs: qcom: Expose the reset controller for PHY

2019-02-01 Thread Stephen Boyd
Quoting Evan Green (2019-01-23 14:11:34)
> Expose a reset controller that the phy can use to perform its
> initialization in a single callback.
> 
> Also, change the use of the phy functions from ufs-qcom such that
> phy_poweron actually fires up the phy, and phy_poweroff actually
> powers it down.

This looks like two patches. One introduces a reset controller and the
other changes the phy functions somehow. Can this be split up and then
the commit text for the second half to change the phy functions can
explain a little more of the theory behind how it's OK to change the
code flow that way?

> 
> Signed-off-by: Evan Green 
> 
> ---
> Note: This change depends on the remaining changes in this series,
> since UFS PHY reset now needs to be done by the PHY driver.

What does it depend on? Is there a bisection hole introduced here? And by
bisection hole I mean more than the problem that an older DT is used
with this new code.

> 
> Changes in v2:
> - Remove include of reset.h (Stephen)

Usually we include headers in both the C file and any header file that
uses structs from them. It wouldn't hurt to include
reset-controller.h in the ufs-qcom.c file. For one thing, it would help
a grep for reset controller drivers narrow its results to files ending
in .c, instead of having to figure out which C file includes the .h file
where the reset-controller header is included.

> - Fix error print of phy_power_on (Stephen)
> - Comment for reset controller warnings on id != 0 (Stephen)
> - Add static to ufs_qcom_reset_ops (Stephen).
> 
>  drivers/scsi/ufs/Kconfig|   1 +
>  drivers/scsi/ufs/ufs-qcom.c | 111 ++--
>  drivers/scsi/ufs/ufs-qcom.h |   4 ++
>  3 files changed, 72 insertions(+), 44 deletions(-)
> 
> diff --git a/drivers/scsi/ufs/ufs-qcom.c b/drivers/scsi/ufs/ufs-qcom.c
> index 3aeadb14aae1e..277ed6ad71c9b 100644
> --- a/drivers/scsi/ufs/ufs-qcom.c
> +++ b/drivers/scsi/ufs/ufs-qcom.c
> @@ -255,11 +260,6 @@ static int ufs_qcom_power_up_sequence(struct ufs_hba 
> *hba)
> if (is_rate_B)
> phy_set_mode(phy, PHY_MODE_UFS_HS_B);
>  
> -   /* Assert PHY reset and apply PHY calibration values */
> -   ufs_qcom_assert_reset(hba);
> -   /* provide 1ms delay to let the reset pulse propagate */
> -   usleep_range(1000, 1100);
> -
> /* phy initialization - calibrate the phy */
> ret = phy_init(phy);
> if (ret) {
> @@ -268,15 +268,6 @@ static int ufs_qcom_power_up_sequence(struct ufs_hba 
> *hba)
> goto out;
> }
>  
> -   /* De-assert PHY reset and start serdes */
> -   ufs_qcom_deassert_reset(hba);
> -
> -   /*
> -* after reset deassertion, phy will need all ref clocks,
> -* voltage, current to settle down before starting serdes.
> -*/
> -   usleep_range(1000, 1100);
> -
> /* power on phy - start serdes and phy's power and clocks */
> ret = phy_power_on(phy);
> if (ret) {
> @@ -290,7 +281,6 @@ static int ufs_qcom_power_up_sequence(struct ufs_hba *hba)
> return 0;
>  
>  out_disable_phy:
> -   ufs_qcom_assert_reset(hba);
> phy_exit(phy);
>  out:
> return ret;
> @@ -554,21 +544,10 @@ static int ufs_qcom_suspend(struct ufs_hba *hba, enum 
> ufs_pm_op pm_op)
> ufs_qcom_disable_lane_clks(host);
> phy_power_off(phy);
>  
> -   /* Assert PHY soft reset */
> -   ufs_qcom_assert_reset(hba);
> -   goto out;
> -   }
> -
> -   /*
> -* If UniPro link is not active, PHY ref_clk, main PHY analog power
> -* rail and low noise analog power rail for PLL can be switched off.
> -*/
> -   if (!ufs_qcom_is_link_active(hba)) {
> +   } else if (!ufs_qcom_is_link_active(hba)) {
> ufs_qcom_disable_lane_clks(host);
> -   phy_power_off(phy);
> }
>  
> -out:
> return ret;
>  }
>  
> @@ -578,21 +557,26 @@ static int ufs_qcom_resume(struct ufs_hba *hba, enum 
> ufs_pm_op pm_op)
> struct phy *phy = host->generic_phy;
> int err;
>  
> -   err = phy_power_on(phy);
> -   if (err) {
> -   dev_err(hba->dev, "%s: failed enabling regs, err = %d\n",
> -   __func__, err);
> -   goto out;
> -   }
> +   if (ufs_qcom_is_link_off(hba)) {
> +   err = phy_power_on(phy);
> +   if (err) {
> +   dev_err(hba->dev, "%s: failed PHY power on: %d\n",
> +   __func__, err);
> +   return err;
> +   }
>  
> -   err = ufs_qcom_enable_lane_clks(host);
> -   if (err)
> -   goto out;
> +   err = ufs_qcom_enable_lane_clks(host);
> +   if (err)
> +   return err;
>  
> -   hba->is_sys_suspended = false;
> +   } else if (!ufs_qcom_is_link_active(hba)) {
> +   err = ufs_qcom

Re: [PATCH] scsi: aic94xx: fix module loading

2019-02-01 Thread James Bottomley
On Wed, 2019-01-30 at 16:42 -0800, James Bottomley wrote:
> The aic94xx driver is currently failing to load with errors like
> 
> sysfs: cannot create duplicate filename 
> '/devices/pci:00/:00:03.0/:02:00.3/:07:02.0/revision'
> 
> Because the PCI code had recently added a file named 'revision' to
> every PCI device.  Fix this by renaming the aic94xx revision file to
> aic_revision.  This is safe to do for us because as far as I can tell,
> there's nothing in userspace relying on the current aic94xx revision
> file so it can be renamed without breaking anything.
> 
> Fixes: 702ed3be1b1b (PCI: Create revision file in sysfs)
> Cc: sta...@vger.kernel.org
> Signed-off-by: James Bottomley 

I finally managed to test this on actual hardware and it works for me. 
I don't have any aic94xx cards, just a couple of IBM donated machines
that have aic94xx built in on an internal PCI express bus daughter
card, so I had to find them and persuade them to boot modern kernels
before I could actually test anything, hence the delay.

James



Re: mpt3sas and T10DIF

2019-02-01 Thread Sebastian Andrzej Siewior
On 2019-02-01 16:35:32 [+], John Garry wrote:
> Possibly missing this (not sure which 4.19 you mean):

awesome. Thanks to both of you.
That means that I could try it out again in the near future.

Sebastian


[Bug 202425] 3w-9xxx: 3ware 9650SE-2LP RAID controller not working on AMD Ryzen system

2019-02-01 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=202425

Bjorn Helgaas (bhelg...@google.com) changed:

           What            |Removed                     |Added
----------------------------------------------------------------------------
 CC                        |                            |bhelg...@google.com
 Component                 |Other                       |PCI
 Assignee                  |scsi_drivers-other@kernel-b |drivers_...@kernel-bugs.osd
                           |ugs.osdl.org                |l.org
 Product                   |SCSI Drivers                |Drivers
 Regression                |No                          |Yes

--- Comment #4 from Bjorn Helgaas (bhelg...@google.com) ---
Robert, would you mind attaching the complete dmesg and "sudo lspci -vvv"
output, please?  Sorry for the inconvenience, and thanks very much for doing
all the work of a bisection.  I marked this as a regression and am trying to
move it to the Drivers/PCI category (bugzilla isn't completely cooperating).

-- 
You are receiving this mail because:
You are watching the assignee of the bug.


Re: Question on handling managed IRQs when hotplugging CPUs

2019-02-01 Thread Thomas Gleixner
On Fri, 1 Feb 2019, Hannes Reinecke wrote:
> Thing is, if we have _managed_ CPU hotplug (ie if the hardware provides some
> means of quiescing the CPU before hotplug) then the whole thing is trivial;
> disable SQ and wait for all outstanding commands to complete.
> Then trivially all requests are completed and the issue is resolved.
> Even with todays infrastructure.
> 
> And I'm not sure if we can handle surprise CPU hotplug at all, given all the
> possible race conditions.
> But then I might be wrong.

The kernel would completely fall apart when a CPU would vanish by surprise,
i.e. uncontrolled by the kernel. Then the SCSI driver exploding would be
the least of our problems.

Thanks,

tglx


Re: [PATCH] qla2xxx: Add new FC-NVMe enable BIT to enable FC-NVMe feature

2019-02-01 Thread Martin K. Petersen


Himanshu,

> This patch adds new BIT detection to enable FC-NVMe feature in
> the driver.
>
> Signed-off-by: Giridhar Malavali 
 ^

Fixed Giridhar's address and applied to 5.1/scsi-queue. Thanks!

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH -next] scsi: csiostor: Remove set but not used variable 'pln'

2019-02-01 Thread Martin K. Petersen


YueHaibing,

> Fixes gcc '-Wunused-but-set-variable' warning:
>
> drivers/scsi/csiostor/csio_attr.c: In function 'csio_fcoe_free_vnp':
> drivers/scsi/csiostor/csio_attr.c:500:21: warning:
>  variable 'pln' set but not used [-Wunused-but-set-variable]

Applied to 5.1/scsi-queue. Thanks!

-- 
Martin K. Petersen  Oracle Linux Engineering


[GIT PULL] SCSI fixes for 5.0-rc4

2019-02-01 Thread James Bottomley
Five minor bug fixes.  The libfc one is a tiny memory leak, the zfcp
one is an incorrect user visible parameter and the rest are on error
legs or obscure features.

The patch is available here:

git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi.git scsi-fixes

The short changelog is:

Dan Carpenter (2):
  scsi: 53c700: pass correct "dev" to dma_alloc_attrs()
  scsi: bnx2fc: Fix error handling in probe()

Douglas Gilbert (1):
  scsi: scsi_debug: fix write_same with virtual_gb problem

Ming Lu (1):
  scsi: libfc: free skb when receiving invalid flogi resp

Steffen Maier (1):
  scsi: zfcp: fix sysfs block queue limit output for max_segment_size

And the diffstat

 drivers/s390/scsi/zfcp_aux.c|  1 -
 drivers/s390/scsi/zfcp_scsi.c   |  2 ++
 drivers/scsi/53c700.c   |  2 +-
 drivers/scsi/bnx2fc/bnx2fc_io.c |  4 ++--
 drivers/scsi/libfc/fc_lport.c   |  6 +++---
 drivers/scsi/scsi_debug.c   | 41 +
 6 files changed, 29 insertions(+), 27 deletions(-)

With full diff below.

James

---

diff --git a/drivers/s390/scsi/zfcp_aux.c b/drivers/s390/scsi/zfcp_aux.c
index 9cf30d124b9e..e390f8c6d5f3 100644
--- a/drivers/s390/scsi/zfcp_aux.c
+++ b/drivers/s390/scsi/zfcp_aux.c
@@ -403,7 +403,6 @@ struct zfcp_adapter *zfcp_adapter_enqueue(struct ccw_device 
*ccw_device)
goto failed;
 
/* report size limit per scatter-gather segment */
-   adapter->dma_parms.max_segment_size = ZFCP_QDIO_SBALE_LEN;
adapter->ccw_device->dev.dma_parms = &adapter->dma_parms;
 
adapter->stat_read_buf_num = FSF_STATUS_READS_RECOM;
diff --git a/drivers/s390/scsi/zfcp_scsi.c b/drivers/s390/scsi/zfcp_scsi.c
index 00acc7144bbc..f4f6a07c5222 100644
--- a/drivers/s390/scsi/zfcp_scsi.c
+++ b/drivers/s390/scsi/zfcp_scsi.c
@@ -428,6 +428,8 @@ static struct scsi_host_template zfcp_scsi_host_template = {
.max_sectors = (((QDIO_MAX_ELEMENTS_PER_BUFFER - 1)
 * ZFCP_QDIO_MAX_SBALS_PER_REQ) - 2) * 8,
   /* GCD, adjusted later */
+   /* report size limit per scatter-gather segment */
+   .max_segment_size= ZFCP_QDIO_SBALE_LEN,
.dma_boundary= ZFCP_QDIO_SBALE_LEN - 1,
.shost_attrs = zfcp_sysfs_shost_attrs,
.sdev_attrs  = zfcp_sysfs_sdev_attrs,
diff --git a/drivers/scsi/53c700.c b/drivers/scsi/53c700.c
index 128d658d472a..16957d7ac414 100644
--- a/drivers/scsi/53c700.c
+++ b/drivers/scsi/53c700.c
@@ -295,7 +295,7 @@ NCR_700_detect(struct scsi_host_template *tpnt,
if(tpnt->sdev_attrs == NULL)
tpnt->sdev_attrs = NCR_700_dev_attrs;
 
-   memory = dma_alloc_attrs(hostdata->dev, TOTAL_MEM_SIZE, &pScript,
+   memory = dma_alloc_attrs(dev, TOTAL_MEM_SIZE, &pScript,
 GFP_KERNEL, DMA_ATTR_NON_CONSISTENT);
if(memory == NULL) {
printk(KERN_ERR "53c700: Failed to allocate memory for driver, 
detaching\n");
diff --git a/drivers/scsi/bnx2fc/bnx2fc_io.c b/drivers/scsi/bnx2fc/bnx2fc_io.c
index 350257c13a5b..bc9f2a2365f4 100644
--- a/drivers/scsi/bnx2fc/bnx2fc_io.c
+++ b/drivers/scsi/bnx2fc/bnx2fc_io.c
@@ -240,6 +240,7 @@ struct bnx2fc_cmd_mgr *bnx2fc_cmd_mgr_alloc(struct 
bnx2fc_hba *hba)
return NULL;
}
 
+   cmgr->hba = hba;
cmgr->free_list = kcalloc(arr_sz, sizeof(*cmgr->free_list),
  GFP_KERNEL);
if (!cmgr->free_list) {
@@ -256,7 +257,6 @@ struct bnx2fc_cmd_mgr *bnx2fc_cmd_mgr_alloc(struct 
bnx2fc_hba *hba)
goto mem_err;
}
 
-   cmgr->hba = hba;
cmgr->cmds = (struct bnx2fc_cmd **)(cmgr + 1);
 
for (i = 0; i < arr_sz; i++)  {
@@ -295,7 +295,7 @@ struct bnx2fc_cmd_mgr *bnx2fc_cmd_mgr_alloc(struct 
bnx2fc_hba *hba)
 
/* Allocate pool of io_bdts - one for each bnx2fc_cmd */
mem_size = num_ios * sizeof(struct io_bdt *);
-   cmgr->io_bdt_pool = kmalloc(mem_size, GFP_KERNEL);
+   cmgr->io_bdt_pool = kzalloc(mem_size, GFP_KERNEL);
if (!cmgr->io_bdt_pool) {
printk(KERN_ERR PFX "failed to alloc io_bdt_pool\n");
goto mem_err;
diff --git a/drivers/scsi/libfc/fc_lport.c b/drivers/scsi/libfc/fc_lport.c
index be83590ed955..ff943f477d6f 100644
--- a/drivers/scsi/libfc/fc_lport.c
+++ b/drivers/scsi/libfc/fc_lport.c
@@ -1726,14 +1726,14 @@ void fc_lport_flogi_resp(struct fc_seq *sp, struct 
fc_frame *fp,
fc_frame_payload_op(fp) != ELS_LS_ACC) {
FC_LPORT_DBG(lport, "FLOGI not accepted or bad response\n");
fc_lport_error(lport, fp);
-   goto err;
+   goto out;
}
 
flp = fc_frame_payload_get(fp, sizeof(*flp));
if (!flp) {
FC_LPORT_DBG(lport, "FLOGI bad response\n");
fc_lport_error(lport, fp);
-   goto err;
+

Re: [PATCH] scsi: aic94xx: fix module loading

2019-02-01 Thread Martin K. Petersen


James,

> The aic94xx driver is currently failing to load with errors like
>
> sysfs: cannot create duplicate filename 
> '/devices/pci:00/:00:03.0/:02:00.3/:07:02.0/revision'
>
> Because the PCI code had recently added a file named 'revision' to
> every PCI device.  Fix this by renaming the aic94xx revision file to
> aic_revision.  This is safe to do for us because as far as I can tell,
> there's nothing in userspace relying on the current aic94xx revision
> file so it can be renamed without breaking anything.

Applied to 5.0/scsi-fixes, thanks!

-- 
Martin K. Petersen  Oracle Linux Engineering


[PATCH] tcmu: wait for nl reply only if there are listeners

2019-02-01 Thread Mike Christie
genlmsg_multicast_allns used to return -ESRCH even if the message was
successfully sent to a listener. With commit:

commit cb9f7a9a5c96a773bbc9c70660dc600cfff82f82
Author: Nicolas Dichtel 
Date:   Tue Feb 6 14:48:32 2018 +0100

netlink: ensure to loop over all netns in genlmsg_multicast_allns()

it now will return success if the message was sent to a listener. With
that patch, tcmu can now immediately fail if -ESRCH is returned because
we know there will be no reply.

Signed-off-by: Mike Christie 

---
 drivers/target/target_core_user.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/target/target_core_user.c 
b/drivers/target/target_core_user.c
index 5831e0e..dccc13c 100644
--- a/drivers/target/target_core_user.c
+++ b/drivers/target/target_core_user.c
@@ -1794,9 +1794,6 @@ static int tcmu_netlink_event_send(struct tcmu_dev *udev,
 
ret = genlmsg_multicast_allns(&tcmu_genl_family, skb, 0,
  TCMU_MCGRP_CONFIG, GFP_KERNEL);
-   /* We don't care if no one is listening */
-   if (ret == -ESRCH)
-   ret = 0;
if (!ret)
ret = tcmu_wait_genl_cmd_reply(udev);
return ret;
-- 
1.8.3.1