On Thu, 6 Apr 2017, 1:49am, Hannes Reinecke wrote:
> On 04/06/2017 08:27 AM, Arun Easi wrote:
> > Hi Hannes,
> >
> > Thanks for taking a crack at the issue. My comments below..
> >
> > On Tue, 4 Apr 2017, 5:07am, Hannes Reinecke wrote:
> >
> >> Most legacy HBAs have a tagset per HBA, not per qu
> From: Guilherme G. Piccoli [mailto:gpicc...@linux.vnet.ibm.com]
> Sent: Thursday, April 06, 2017 3:12 PM
> To: dl-esc-Aacraid Linux Driver
> Cc: gpicc...@linux.vnet.ibm.com; linux-scsi@vger.kernel.org; Raghava Aditya
> Renukunta
> Subject: [PATCH] scsi: aacraid: fix PCI error recovery path
>
>
On Thu, 6 Apr 2017, 4:19am, Colin King wrote:
> From: Colin Ian King
>
> There are several local or function parameter pointers that are
> being assigned NULL after a kfree, where the assignments have no effect
> and hence can be removed.
>
> Fixes various cppcheck warnings:
>
> "Assignment of func
During a PCI error recovery, if aac_check_health() is not aware that
a PCI error happened and we have an offline PCI channel, it might
trigger errors (such as a NULL pointer dereference) and prevent the
error recovery process from completing.
This patch makes the health check procedure aware of PCI ch
> On Apr 6, 2017, at 11:39 AM, Christoph Hellwig wrote:
>
> Add a nbd-specific field instead.
>
> Signed-off-by: Christoph Hellwig
This is fine with me, you can add,
Reviewed-by: Josef Bacik
Thanks,
Josef
On Thu, Apr 06, 2017 at 05:39:19PM +0200, Christoph Hellwig wrote:
> Currently the request structure has an errors field that is used in
> various different ways. The oldest drivers use it as an error count,
> blk-mq and the generic timeout code assume that it holds a Linux
> errno for block compl
Martin,
I'm rather surprised nobody else has previously reported this as well,
especially as NetApp hadn't received any reports. The only probable
explanation I could think of is that EL 7 is still based on a 3.10
kernel so is too old to be affected, and that is likely to be what
most NetApp cust
When calling min_not_zero, both arguments should have the same type.
Otherwise the compiler will raise a warning:
CC drivers/scsi/sd.o
In file included from ./include/linux/list.h:8:0,
from ./include/linux/module.h:9,
from drivers/scsi/sd.c:35:
drivers/scsi
On Thu, Apr 06, 2017 at 05:39:26PM +0200, Christoph Hellwig wrote:
> Remove passing req->errors (which at that point is always 0) to
> blk_mq_complete_requestq, and rely on the virtio status code for the
blk_mq_complete_request ^
> serial number passthrough request.
>
> Signed-off-by: Christoph H
On Thu, Apr 06, 2017 at 05:39:25PM +0200, Christoph Hellwig wrote:
> Signed-off-by: Christoph Hellwig
> ---
Fair enough,
Reviewed-by: Johannes Thumshirn
--
Johannes Thumshirn Storage
jthumsh...@suse.de +49 911 74053 689
SU
On Thu, Apr 06, 2017 at 05:39:22PM +0200, Christoph Hellwig wrote:
> nvme_complete_async_event expects the little endian status code
> including the phase bit, and a new completion handler I plan to
> introduce will do so as well.
>
> Change the status variable into the little endian format with t
On Thu, Apr 06, 2017 at 05:39:23PM +0200, Christoph Hellwig wrote:
> We want our own clearly defined error field for NVMe passthrough commands,
> and the request errors field is going away in its current form.
>
> Just store the status and result field in the nvme_request field from
> hardirq comp
On Thu, Apr 06, 2017 at 05:39:21PM +0200, Christoph Hellwig wrote:
> The function only returns -EIO if rq->errors is non-zero, which is not
> very useful and lets a large number of callers ignore the return value.
>
> Just let the callers figure out their error themselves.
>
> Signed-off-by: Chri
I really would have liked a stable-tag and -send for patches 1 and 2
(both Medium Access Timeout fixes). I think I or at least Steffen also
asked for that.
We saw several real-life occurrences with this bug and I think that
would have qualified for stable just fine.
David Buckley writes:
David,
> As I mentioned previously, I'm fairly certain that the issue I'm
> seeing is due to the fact that while NetApp LUNs are presented as 512B
> logical/4K physical disks for compatibility, they actually don't
> support requests smaller than 4K (which makes sense as Net
Nicholas Mc Guire writes:
> The redundant init_completion() here seems to be a cut&paste error as
> struct scsi_qla_host only has 4 completion elements to initialize,
> thus the duplicate init_completion(disable_acb_comp) is simply
> removed.
Applied to 4.12/scsi-queue.
--
Martin K. Petersen
Hannes Reinecke writes:
Hannes,
> this is a resend of a small patchset for cleaning up SCSI EH. Primary
> goal is to make asynchronous aborts mandatory; there hasn't been a
> single report so far where asynchronous abort won't work, so the
> 'no_async_abort' flag has never been used and will be
Martin Wilck writes:
Martin,
> I noticed that the following commits
>
> eb94588dabec scsi: hpsa: fix volume offline state
> 2ef288498087 scsi: hpsa: do not timeout reset operations
> 87b9e6aa87d9 scsi: hpsa: limit outstanding rescans
> 85b29008d8af scsi: hpsa: update check for logical volume sta
Christoph Hellwig writes:
> Ok, the version below simply skips the function split entirely:
Applied to 4.12/scsi-queue.
--
Martin K. Petersen Oracle Linux Engineering
Bart Van Assche writes:
>> We previously made sure that the reported disk capacity was less than
>> 0x blocks when the kernel was not compiled with large sector_t
>> support (CONFIG_LBDAF). However, this check assumed that the capacity
>> was reported in units of 512 bytes.
>>
>> Add a s
Mauricio,
> The commit 08024885a2a3 ("ses: Add power_status to SES device slot")
> introduced the 'power_status' attribute to enclosure components and
> the associated callbacks.
Applied to 4.12/scsi-queue, thanks!
--
Martin K. Petersen Oracle Linux Engineering
Bart Van Assche writes:
> Now that all scsi_device_get() callers check the return value of this
> function, make checking that return value mandatory.
Applied to 4.12/scsi-queue.
--
Martin K. Petersen Oracle Linux Engineering
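One way a return-value check can be made mandatory in C is a compiler attribute, which is roughly what the kernel's __must_check annotation expands to with GCC. The following is only a userspace sketch under that assumption: MUST_CHECK, struct dev, dev_get() and use_dev() are hypothetical names for illustration, not the scsi_device_get() implementation.

```c
#include <errno.h>
#include <stddef.h>

/* Assumption: GCC/Clang attribute, similar in spirit to the kernel's __must_check. */
#define MUST_CHECK __attribute__((warn_unused_result))

/* Hypothetical device with a reference count and a "going away" flag. */
struct dev {
	int refs;
	int dying;
};

/* Take a reference; fails if the device is missing or being removed. */
static MUST_CHECK int dev_get(struct dev *d)
{
	if (d == NULL || d->dying)
		return -ENXIO;
	d->refs++;
	return 0;
}

/* A caller that honours the contract: it bails out when the get fails. */
static int use_dev(struct dev *d)
{
	int ret = dev_get(d);

	if (ret)
		return ret;
	/* ... work with the device here ... */
	d->refs--;	/* drop the reference again */
	return 0;
}
```

With the attribute in place, a bare `dev_get(d);` statement whose result is discarded produces a compiler warning, which is the kind of diagnostic the kbuild robot reports elsewhere in this thread.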
Bart Van Assche writes:
Bart,
> scsi_device_get() can fail. Hence check its return value.
Applied to 4.12/scsi-queue.
--
Martin K. Petersen Oracle Linux Engineering
On Thu, Apr 06, 2017 at 08:33:38AM +0300, Sagi Grimberg wrote:
>
> >>Note that the nvme completion queues are still on the host memory, so
> >>this means we have lost the ordering between data and completions as
> >>they go to different pcie targets.
> >
> >Hmm, in this simple up/down case with a
On 05/04/17 11:33 PM, Sagi Grimberg wrote:
>
>>> Note that the nvme completion queues are still on the host memory, so
>>> this means we have lost the ordering between data and completions as
>>> they go to different pcie targets.
>>
>> Hmm, in this simple up/down case with a switch, I think it
Hey Sagi,
On 05/04/17 11:47 PM, Sagi Grimberg wrote:
> Because the user can get it wrong, and it's our job to do what we can in
> order to prevent the user from screwing itself.
Well, "screwing" themselves seems a bit strong. It wouldn't be much
different from a lot of other tunables in the system
Signed-off-by: Christoph Hellwig
---
block/blk-core.c | 14 +-
block/blk-exec.c | 3 +--
block/blk-mq.c | 10 +++---
block/blk-timeout.c | 1 -
include/linux/blkdev.h | 2 --
include/trace/events/block.h | 17 +++-
The driver never sets req->errors
---
drivers/block/paride/pd.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/drivers/block/paride/pd.c b/drivers/block/paride/pd.c
index 82c6d02193ae..3b0ab214fe74 100644
--- a/drivers/block/paride/pd.c
+++ b/drivers/block/paride/pd.c
@@ -7
Merge blk_mq_ipi_complete_request and blk_mq_stat_add into their only
caller.
Signed-off-by: Christoph Hellwig
---
block/blk-mq.c | 21 ++---
1 file changed, 6 insertions(+), 15 deletions(-)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 393f350ebb90..a6e14a3c87ce 100644
---
Signed-off-by: Christoph Hellwig
---
drivers/block/swim3.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/block/swim3.c b/drivers/block/swim3.c
index 61b3ffa4f458..ba4809c9bdba 100644
--- a/drivers/block/swim3.c
+++ b/drivers/block/swim3.c
@@ -343,8 +343,8 @@ stat
Signed-off-by: Christoph Hellwig
---
include/trace/events/block.h | 44 ++--
kernel/trace/blktrace.c | 9 -
2 files changed, 10 insertions(+), 43 deletions(-)
diff --git a/include/trace/events/block.h b/include/trace/events/block.h
index a88e
Signed-off-by: Christoph Hellwig
---
drivers/block/ataflop.c | 12 +++-
1 file changed, 7 insertions(+), 5 deletions(-)
diff --git a/drivers/block/ataflop.c b/drivers/block/ataflop.c
index 2104b1b4ccda..fa69ecd52cb5 100644
--- a/drivers/block/ataflop.c
+++ b/drivers/block/ataflop.c
@@ -6
Now that we always have a ->complete callback we can remove the direct
call to blk_mq_end_request, as well as the error argument to
blk_mq_complete_request.
Signed-off-by: Christoph Hellwig
---
block/blk-mq.c| 14 +++---
drivers/block/loop.c | 4 ++--
dr
xen-blkfront is the last user using rq->errors for passing back errors to
blk-mq, and I'd like to get rid of that. In the longer run the driver
should be moving more of the completion processing into .complete, but
this is the minimal change to move forward for now.
Signed-off-by: Christoph Hellwi
Instead of using req->errors, which will go away.
Signed-off-by: Christoph Hellwig
---
drivers/block/mtip32xx/mtip32xx.c | 16 +---
drivers/block/mtip32xx/mtip32xx.h | 1 +
2 files changed, 10 insertions(+), 7 deletions(-)
diff --git a/drivers/block/mtip32xx/mtip32xx.c
b/drivers/b
Add a nbd-specific field instead.
Signed-off-by: Christoph Hellwig
---
drivers/block/nbd.c | 28 ++--
1 file changed, 14 insertions(+), 14 deletions(-)
diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index 03ae72985c79..4f045fab9659 100644
--- a/drivers/block/nbd.
Signed-off-by: Christoph Hellwig
---
drivers/block/floppy.c | 6 --
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/block/floppy.c b/drivers/block/floppy.c
index ce102ec47ef2..60d4c7653178 100644
--- a/drivers/block/floppy.c
+++ b/drivers/block/floppy.c
@@ -2805,8 +2805,
This is for the legacy floppy and ataflop drivers that currently abuse
->errors for this purpose. It's stashed away in a union to not grow
the struct size, the other fields are either used by modern drivers
for different purposes or the I/O scheduler before queueing the I/O
to drivers.
Signed-off-b
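The "stash it in a union" idea can be illustrated with a small standalone sketch. The struct layouts and field names below are hypothetical and much smaller than the real struct request; the only point shown is that adding a member to an existing union that is already at least as large does not grow the structure.

```c
#include <stddef.h>

/* Hypothetical request layout before the change. */
struct req_without {
	unsigned long flags;
	union {
		void *sched_priv;		/* used by the I/O scheduler */
		unsigned long driver_tag;	/* used by modern drivers */
	};
};

/* Same layout with a legacy error counter sharing the union. */
struct req_with {
	unsigned long flags;
	union {
		void *sched_priv;
		unsigned long driver_tag;
		int error_count;		/* legacy floppy-style drivers only */
	};
};

/* The extra member must not change the size of the structure. */
_Static_assert(sizeof(struct req_with) == sizeof(struct req_without),
	       "union member must not grow the struct");

/* Legacy-driver style use: count another failed attempt. */
static int bump_errors(struct req_with *rq)
{
	return ++rq->error_count;
}
```

The trade-off is that the members alias each other, so this only works when the users of the different fields never overlap in time, as the commit message argues.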
dm never uses rq->errors, so there is no need to pass an error argument
to blk_mq_complete_request.
Signed-off-by: Christoph Hellwig
---
drivers/md/dm-rq.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c
index 6886bf160fb2..e1cea42ec2a6
We'll get all proper errors reported through ->end_io and ->errors will
go away soon.
Signed-off-by: Christoph Hellwig
---
drivers/md/dm-mpath.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
index 7f223dbed49f..4aa1b865e66e 1006
In truth I've just audited which blk-mq drivers don't currently have a
complete callback, but I think this change is at least borderline useful.
Signed-off-by: Christoph Hellwig
---
drivers/block/loop.c | 30 ++
drivers/block/loop.h | 1 +
2 files changed, 15 insert
This passes on the scsi_cmnd result field to users of passthrough
requests. Currently we abuse req->errors for this purpose, but that
field will go away in its current form.
Note that the old IDE code abuses the errors field in very creative
ways and stores all kinds of different values in it. I
Currently error is always 0 for non-passthrough requests when reaching the
scsi_noretry_cmd check in scsi_io_completion, which effectively disables
all fastfail logic. Fix this by having a single call to
__scsi_error_from_host_byte at the beginning of the function and always
having a valid error v
Signed-off-by: Christoph Hellwig
---
drivers/block/null_blk.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/block/null_blk.c b/drivers/block/null_blk.c
index f93906ff31e8..24ca85a70fd8 100644
--- a/drivers/block/null_blk.c
+++ b/drivers/block/null_blk.c
@@ -281,7 +28
Currently it's used by the lightnvm passthrough ioctl, but we'd like to make
it private in preparation for block layer specific error codes. Lightnvm already
returns the real NVMe status anyway, so I think we can just limit it to
returning -EIO for any status set.
This will need a careful audit from
nvme_complete_async_event expects the little endian status code
including the phase bit, and a new completion handler I plan to
introduce will do so as well.
Change the status variable into the little endian format with the
phase bit used in the NVMe CQE to fix / enable this.
Signed-off-by: Chris
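For readers unfamiliar with the layout being discussed: as I understand the NVMe completion queue entry, the 16-bit status word is stored little endian, with the phase bit in bit 0 and the status field in the bits above it. The sketch below decodes such a word from raw bytes in userspace; the helper names are hypothetical and this is not the driver code from the patch.

```c
#include <stdint.h>

/* Decode a 16-bit little-endian value from raw bytes, independent of host byte order. */
static uint16_t le16_decode(const uint8_t b[2])
{
	return (uint16_t)(b[0] | (b[1] << 8));
}

/* Bit 0 of the decoded status word is the phase bit. */
static unsigned int cqe_phase(uint16_t status_word)
{
	return status_word & 1u;
}

/* The status field proper sits above the phase bit. */
static uint16_t cqe_status_field(uint16_t status_word)
{
	return (uint16_t)(status_word >> 1);
}
```

Keeping the value in its on-the-wire form until the last moment means a single conversion point, which is presumably why the patch passes the little endian value including the phase bit around.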
The function only returns -EIO if rq->errors is non-zero, which is not
very useful and lets a large number of callers ignore the return value.
Just let the callers figure out their error themselves.
Signed-off-by: Christoph Hellwig
---
block/blk-exec.c | 8 +---
block/scsi_i
Remove passing req->errors (which at that point is always 0) to
blk_mq_complete_request, and rely on the virtio status code for the
serial number passthrough request.
Signed-off-by: Christoph Hellwig
---
drivers/block/virtio_blk.c | 10 +++---
1 file changed, 3 insertions(+), 7 deletions(-)
Signed-off-by: Christoph Hellwig
---
drivers/block/virtio_blk.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index eaf99022bdc6..dbc4e80680b1 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk
We want our own clearly defined error field for NVMe passthrough commands,
and the request errors field is going away in its current form.
Just store the status and result field in the nvme_request field from
hardirq completion context (using a new helper) and then generate a
Linux errno for the b
This driver was added in 2008, but as far as I can tell we never had a
single platform that actually registered resources for the platform driver.
It's also been unmaintained for a long time and apparently has an ATA mode
that can be driven using the IDE/libata subsystem.
Signed-off-by: Christo
Currently the request structure has an errors field that is used in
various different ways. The oldest drivers use it as an error count,
blk-mq and the generic timeout code assume that it holds a Linux
errno for block completions, and various drivers use it for internal
status values, often overwr
On Thu, Apr 06, 2017 at 09:58:58AM +0200, Christoph Hellwig wrote:
> Ok, the version below simply skips the function split entirely:
>
> ---
> From 7c9ca58f1d8cf53b42f14a51e02d0f3d0f12ab45 Mon Sep 17 00:00:00 2001
> From: Christoph Hellwig
> Date: Thu, 12 Jan 2017 11:17:29 +0100
> Subject: csios
When a command has timed out we always should be sending an
abort; with the previous code a failed abort might signal
SCSI EH to start, and all other timed out commands will
never be aborted, even though they might belong to a
different ITL nexus.
Cc: Benjamin Block
Signed-off-by: Hannes Reinecke
There haven't been any reports for HBAs where asynchronous abort
would not work, so we should make it mandatory and remove
the fallback.
Signed-off-by: Hannes Reinecke
Reviewed-by: Johannes Thumshirn
Reviewed-by: Bart Van Assche
Reviewed-by: Christoph Hellwig
---
Documentation/scsi/scsi_eh.txt
Hi all,
this is a resend of a small patchset for cleaning up SCSI EH.
Primary goal is to make asynchronous aborts mandatory; there hasn't
been a single report so far where asynchronous abort won't work, so
the 'no_async_abort' flag has never been used and will be removed
with this patchset.
Additi
If a failed command is retried and fails again we need
to enter SCSI EH, otherwise we will never be able to
recover the command.
To detect this situation we must not clear scmd->eh_eflags
when EH finishes but rather make it persistent throughout
the lifetime of the command.
Signed-off-by: Hannes R
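The reasoning above can be modelled in a few lines of standalone C. This is a deliberately simplified sketch and not the SCSI midlayer code: EH_RETRIED, struct cmd, on_failure() and eh_finish() are hypothetical names, and the real eh_eflags carries other bits.

```c
#include <stdbool.h>

#define EH_RETRIED 0x1u		/* hypothetical flag bit: command was already retried */

enum disposition { DISP_RETRY, DISP_ESCALATE };

struct cmd {
	unsigned int eh_eflags;
};

/* First failure: mark and retry. A failure with the mark set escalates to EH. */
static enum disposition on_failure(struct cmd *c)
{
	if (c->eh_eflags & EH_RETRIED)
		return DISP_ESCALATE;
	c->eh_eflags |= EH_RETRIED;
	return DISP_RETRY;
}

/* End of error handling; 'persistent' selects whether the flags survive. */
static void eh_finish(struct cmd *c, bool persistent)
{
	if (!persistent)
		c->eh_eflags = 0;	/* old behaviour in this model: history is lost */
}
```

In this model, clearing the flags when error handling finishes makes every repeated failure look like a first failure, so the command is retried forever; keeping them lets the second failure be recognised.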
The current medium access timeout counter will be increased for
each command, so if there are enough failed commands we'll hit
the medium access timeout for even a single device failure and
the following kernel message is displayed:
sd H:C:T:L: [sdXY] Medium access timeout failure. Offlining disk!
The block layer always calls the timeout function from a workqueue
context, so there is no need to have yet another workqueue for
running command aborts.
Signed-off-by: Hannes Reinecke
---
drivers/scsi/scsi.c | 2 --
drivers/scsi/scsi_error.c | 83 +++--
scsi_eh_scmd_add() currently only will fail if no
error handler thread is started (which will never be the
case) or if the state machine encounters an illegal transition.
But if we're encountering an invalid state transition
chances are we cannot fix things up with the error handler.
So better add a
If sd_eh_action() decides to take the device offline there is
no point in returning FAILED, as taking the device offline
is the ultimate step in SCSI EH anyway.
So further escalation via SCSI EH is not likely to make a
difference and we can as well return SUCCESS.
Cc: Benjamin Block
Signed-off-by
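A minimal sketch of the decision described above, assuming a hypothetical per-disk timeout counter and limit. The enum, struct and function names are illustrative stand-ins and do not reproduce sd_eh_action().

```c
/* Hypothetical stand-ins for the midlayer's SUCCESS / FAILED dispositions. */
enum eh_result { EH_SUCCESS, EH_FAILED };

struct disk {
	unsigned int timeouts;		/* medium access timeouts seen so far */
	unsigned int max_timeouts;	/* limit before the disk is offlined */
	int offline;
};

/* Decide what to report back to the error handler for this disk. */
static enum eh_result eh_action(struct disk *d, enum eh_result eh_disp)
{
	if (d->timeouts >= d->max_timeouts) {
		d->offline = 1;
		/* Offlining is the last step; there is nothing left to escalate to. */
		return EH_SUCCESS;
	}
	return eh_disp;		/* otherwise pass the disposition through unchanged */
}
```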
From: Christoph Hellwig
We now first try to call ->eh_abort_handler from a work queue, but libsas
was always failing that for no good reason. Allow async aborts.
Reviewed-by: Johannes Thumshirn
Reviewed-by: Hannes Reinecke
Signed-off-by: Christoph Hellwig
---
drivers/scsi/libsas/sas_scsi_ho
Hi Martin,
I noticed that the following commits
eb94588dabec scsi: hpsa: fix volume offline state
2ef288498087 scsi: hpsa: do not timeout reset operations
87b9e6aa87d9 scsi: hpsa: limit outstanding rescans
85b29008d8af scsi: hpsa: update check for logical volume status
are included in mkp/4.11/s
On Thu, Apr 06, 2017 at 08:38:10AM +0200, Wouter Verhelst wrote:
> On Wed, Apr 05, 2017 at 01:30:31PM +0200, Michal Hocko wrote:
> > On Wed 05-04-17 09:46:59, Vlastimil Babka wrote:
> > > We now have memalloc_noreclaim_{save,restore} helpers for robust setting
> > > and
> > > clearing of PF_MEMALL
From: Colin Ian King
There are several local or function parameter pointers that are
being assigned NULL after a kfree, where the assignments have no effect
and hence can be removed.
Fixes various cppcheck warnings:
"Assignment of function parameter has no effect outside the
function. Did you forget d
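The cppcheck warning is easy to reproduce outside the kernel. In the userspace sketch below (free() standing in for kfree(), hypothetical function names), assigning NULL to a by-value pointer parameter only changes the callee's local copy, whereas a pointer-to-pointer parameter lets the callee clear the caller's variable.

```c
#include <stdlib.h>

/* The parameter is a copy: clearing it after free() would be a dead store. */
static void release_by_value(char *p)
{
	free(p);
	/* p = NULL;   no effect outside this function, so it can be dropped */
}

/* With a pointer to the caller's pointer, the clearing is visible to the caller. */
static void release_and_clear(char **pp)
{
	free(*pp);
	*pp = NULL;
}
```

The patch removes assignments of the first kind; where the caller really needs the pointer cleared, the second form (or clearing at the call site) is required instead.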
On Wed, Apr 05, 2017 at 07:18:08PM +0200, Christoph Hellwig wrote:
> ->retries counts the number of times a command is resubmitted, and should
> be cleared the first time we see the command. We currently don't do
> that for non-PCIe commands, which is easily fixed by moving the setup
> to common c
On 04/06/2017 08:27 AM, Arun Easi wrote:
> Hi Hannes,
>
> Thanks for taking a crack at the issue. My comments below..
>
> On Tue, 4 Apr 2017, 5:07am, Hannes Reinecke wrote:
>
>> Most legacy HBAs have a tagset per HBA, not per queue. To map
>> these devices onto block-mq this patch implements a n
On Thu, Apr 06, 2017 at 04:35:56PM +0800, 廖亨权 wrote:
> Hi, Guys,
> I want to ask if there is any plan to port the NVMe driver to
> VxWorks OS? Thank you so much.
---end quoted text---
The Linux NVMe team has no plans for a VxWorks NVMe driver at the moment.
On Wed, Apr 05, 2017 at 09:52:50AM -0700, Bart Van Assche wrote:
> Now that all scsi_device_get() callers check the return value of this
> function, make checking that return value mandatory.
>
> Signed-off-by: Bart Van Assche
> Cc: Hannes Reinecke
> Cc: Johannes Thumshirn
> ---
Looks good,
Re
On Thu, Apr 06, 2017 at 12:30:43AM +, Bart Van Assche wrote:
> On Thu, 2017-04-06 at 08:27 +0800, kbuild test robot wrote:
> > All warnings (new ones prefixed by >>):
> >
> >drivers//scsi/osd/osd_uld.c: In function 'osd_probe':
> > > > drivers//scsi/osd/osd_uld.c:467:2: warning: ignoring r
Some compilers don't like BLK_DEF_MAX_SECTORS being an enum (int) when
expanding min_not_zero. Cast it to sector_t so it matches the type of
the other operand, logical_to_sectors().
Signed-off-by: Fam Zheng
---
drivers/scsi/sd.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a
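The type mismatch can be shown in a standalone sketch. MIN_NOT_ZERO below is a simplified userspace model of min_not_zero(), and DEF_MAX_SECTORS is a hypothetical stand-in for BLK_DEF_MAX_SECTORS; unlike the kernel macro, which only triggers a warning, this model uses GCC/Clang builtins to reject mismatched operand types at compile time.

```c
#include <stdint.h>

typedef uint64_t sector_t;		/* assumption: 64-bit sector_t */

/* Hypothetical stand-in for BLK_DEF_MAX_SECTORS: an enum constant, so its type is int. */
enum { DEF_MAX_SECTORS = 2560 };

/*
 * Simplified model of min_not_zero(): the smaller of x and y, ignoring a
 * zero operand. Both operands must have the same type.
 */
#define MIN_NOT_ZERO(x, y) __extension__ ({				\
	__typeof__(x) x_ = (x);						\
	__typeof__(y) y_ = (y);						\
	_Static_assert(__builtin_types_compatible_p(__typeof__(x),	\
						    __typeof__(y)),	\
		       "MIN_NOT_ZERO operands must have the same type");	\
	x_ == 0 ? y_ : (y_ == 0 ? x_ : (x_ < y_ ? x_ : y_)); })

/* Clamp a device-reported limit against the default; 0 means "nothing reported". */
static sector_t clamp_max_sectors(sector_t dev_max)
{
	/* Without the cast the second operand is an int and the assertion fires. */
	return MIN_NOT_ZERO(dev_max, (sector_t)DEF_MAX_SECTORS);
}
```

Casting the constant at the call site, as the patch does, keeps the macro strict for every other user instead of loosening the check globally.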
Ok, the version below simply skips the function split entirely:
---
From 7c9ca58f1d8cf53b42f14a51e02d0f3d0f12ab45 Mon Sep 17 00:00:00 2001
From: Christoph Hellwig
Date: Thu, 12 Jan 2017 11:17:29 +0100
Subject: csiostor: switch to pci_alloc_irq_vectors
And get automatic MSI-X affinity for free.
On Thu 06-04-17 09:33:44, Adrian Hunter wrote:
> On 05/04/17 14:39, Vlastimil Babka wrote:
> > On 04/05/2017 01:36 PM, Richard Weinberger wrote:
> >> Michal,
> >>
> >> Am 05.04.2017 um 13:31 schrieb Michal Hocko:
> >>> On Wed 05-04-17 09:47:00, Vlastimil Babka wrote:
> Nandsim has own function