mpletely. Given the 'namespaces_rwsem' is always held
> to retrieve ns for starting/stopping request queue, this lock can prevent
> namespaces from being freed.
This looks good to me.
Reviewed-by: Keith Busch
On Tue, Feb 05, 2019 at 04:10:47PM +0100, Hannes Reinecke wrote:
> On 2/5/19 3:52 PM, Keith Busch wrote:
> > Whichever layer dispatched the IO to a CPU specific context should
> > be the one to wait for its completion. That should be blk-mq for most
> > block drivers.
> >
On Tue, Feb 05, 2019 at 03:09:28PM +, John Garry wrote:
> On 05/02/2019 14:52, Keith Busch wrote:
> > On Tue, Feb 05, 2019 at 05:24:11AM -0800, John Garry wrote:
> > > On 04/02/2019 07:12, Hannes Reinecke wrote:
> > >
> > > Hi Hannes,
> > >
>
On Tue, Feb 05, 2019 at 05:24:11AM -0800, John Garry wrote:
> On 04/02/2019 07:12, Hannes Reinecke wrote:
>
> Hi Hannes,
>
> >
> > So, as the user then has to wait for the system to declare 'ready for
> > CPU remove', why can't we just disable the SQ and wait for all I/O to
> > complete?
> > We c
d->state = 0" in this
path instead of clear_bit(), but last time there were objections to
it looking inconsistent with the completion+timeout paths that use it
atomically. Anyway,
Reviewed-by: Keith Busch
On Thu, Nov 29, 2018 at 06:11:59PM +0100, Christoph Hellwig wrote:
> > diff --git a/block/blk-mq.c b/block/blk-mq.c
> > index a82830f39933..d0ef540711c7 100644
> > --- a/block/blk-mq.c
> > +++ b/block/blk-mq.c
> > @@ -647,7 +647,7 @@ EXPORT_SYMBOL(blk_mq_complete_request);
> >
> > int blk_mq_req
use
> it's done consistently and it doesn't clash with any other flags.
>
> Fixes: f1342709d18a ("scsi: Do not rely on blk-mq for double completions")
> Signed-off-by: Dan Carpenter
Nice catch, thanks for the fix.
Reviewed-by: Keith Busch
On Wed, Nov 28, 2018 at 03:31:46PM -0700, Keith Busch wrote:
> Waiting for a freeze isn't really the criterion we need anyway: we don't
> care if there are entered requests in REQ_MQ_IDLE. We just want to wait
> for dispatched ones to return, and we currently don't have a
On Wed, Nov 28, 2018 at 09:26:55AM -0700, Keith Busch wrote:
> ---
> diff --git a/drivers/nvme/target/loop.c b/drivers/nvme/target/loop.c
> index 9908082b32c4..116398b240e5 100644
> --- a/drivers/nvme/target/loop.c
> +++ b/drivers/nvme/target/loop.c
> @@ -428,10 +42
On Wed, Nov 28, 2018 at 08:58:00AM -0700, Jens Axboe wrote:
> On 11/28/18 8:49 AM, Keith Busch wrote:
> > On Wed, Nov 28, 2018 at 11:08:48AM +0100, Christoph Hellwig wrote:
> >> On Wed, Nov 28, 2018 at 06:07:01PM +0800, Ming Lei wrote:
> >>>> Is this t
On Wed, Nov 28, 2018 at 11:08:48AM +0100, Christoph Hellwig wrote:
> On Wed, Nov 28, 2018 at 06:07:01PM +0800, Ming Lei wrote:
> > > Is this the nvme target on top of null_blk?
> >
> > Yes.
>
> And it goes away if you revert just the last patch?
It looks like a problem existed before that last p
There are no more users relying on blk-mq request states to prevent
double completions, so replace the relatively expensive cmpxchg operation
with WRITE_ONCE.
Signed-off-by: Keith Busch
---
block/blk-mq.c | 4 +---
include/linux/blk-mq.h | 14 --
2 files changed, 1
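To illustrate the trade-off described in the commit message above, here is a
minimal userspace sketch using C11 atomics. It is not the blk-mq code: the
struct, the state names and both helper functions are hypothetical, and a
relaxed atomic store is only a rough stand-in for the kernel's WRITE_ONCE.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical request states, for illustration only. */
enum demo_rq_state { DEMO_RQ_IDLE, DEMO_RQ_IN_FLIGHT, DEMO_RQ_COMPLETE };

struct demo_request {
	_Atomic int state;
};

/*
 * Compare-and-swap style: only one caller can move the request from
 * IN_FLIGHT to COMPLETE. Returns true for the caller that won.
 */
static bool demo_mark_complete_cmpxchg(struct demo_request *rq)
{
	int expected = DEMO_RQ_IN_FLIGHT;

	assert(rq != NULL);
	return atomic_compare_exchange_strong(&rq->state, &expected,
					      DEMO_RQ_COMPLETE);
}

/*
 * Plain-store style: an unconditional relaxed store. It is cheaper, but it
 * cannot report whether someone else already completed the request, so the
 * callers have to rule out double completion on their own.
 */
static void demo_mark_complete_store(struct demo_request *rq)
{
	assert(rq != NULL);
	atomic_store_explicit(&rq->state, DEMO_RQ_COMPLETE,
			      memory_order_relaxed);
}
```

The plain store is only safe once nothing depends on the return value of the
compare-and-swap variant, which is the precondition the commit message states.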
A driver may have internal state to cleanup if we're pretending a request
didn't complete. Return 'false' if the command wasn't actually completed
due to the timeout error injection, and true otherwise.
Signed-off-by: Keith Busch
---
block/blk-mq.c | 5 +++--
i
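The return-value contract described above can be sketched in plain C as
follows. This is an assumption-laden illustration, not the patch itself:
demo_cmd, demo_complete_request() and demo_driver_done() are made-up names,
and the inject_timeout argument stands in for the fault-injection decision.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_cmd {
	bool driver_done;	/* driver-private "already completed" marker */
	bool completed;		/* set when the completion really happens */
};

/*
 * Returns false when the completion is swallowed by the fake timeout
 * injection, true when the request was actually completed.
 */
static bool demo_complete_request(struct demo_cmd *cmd, bool inject_timeout)
{
	assert(cmd != NULL);
	if (inject_timeout)
		return false;
	cmd->completed = true;
	return true;
}

/*
 * Driver-side caller: it marks the command done first, and undoes that
 * marker when the completion was only pretended away, so that a later
 * timeout handler still sees the command as outstanding.
 */
static void demo_driver_done(struct demo_cmd *cmd, bool inject_timeout)
{
	cmd->driver_done = true;
	if (!demo_complete_request(cmd, inject_timeout))
		cmd->driver_done = false;
}
```

Without the boolean return, the driver in this sketch would have no way to
know that its private marker needs to be rolled back.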
fake
timeout injection.
Keith Busch (3):
blk-mq: Return true if request was completed
scsi: Do not rely on blk-mq for double completions
blk-mq: Simplify request completion state
block/blk-mq.c| 9 -
drivers/scsi/scsi_error.c | 22 +++---
drivers/scsi/sc
n use blk-mq's.
Signed-off-by: Keith Busch
---
drivers/scsi/scsi_error.c | 22 +++---
drivers/scsi/scsi_lib.c | 13 -
include/scsi/scsi_cmnd.h | 4
3 files changed, 27 insertions(+), 12 deletions(-)
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/sc
On Mon, Nov 26, 2018 at 08:32:45AM -0700, Jens Axboe wrote:
> On 11/21/18 6:12 AM, Christoph Hellwig wrote:
> > On Mon, Nov 19, 2018 at 08:19:00AM -0700, Keith Busch wrote:
> >> On Mon, Nov 19, 2018 at 12:58:15AM -0800, Christoph Hellwig wrote:
> >>>> index
On Mon, Nov 19, 2018 at 12:58:15AM -0800, Christoph Hellwig wrote:
> > index 5d83a162d03b..c1d5e4e36125 100644
> > --- a/drivers/scsi/scsi_lib.c
> > +++ b/drivers/scsi/scsi_lib.c
> > @@ -1635,8 +1635,11 @@ static blk_status_t scsi_mq_prep_fn(struct request
> > *req)
> >
> > static void scsi_mq_
A driver may have internal state to cleanup if we're pretending a request
didn't complete. Return 'false' if the command wasn't actually completed
due to the timeout error injection, and true otherwise.
Signed-off-by: Keith Busch
---
block/blk-mq.c | 5 +++--
i
n use blk-mq's.
Signed-off-by: Keith Busch
---
drivers/scsi/scsi_error.c | 22 +++---
drivers/scsi/scsi_lib.c | 6 +-
include/scsi/scsi_cmnd.h | 5 -
3 files changed, 20 insertions(+), 13 deletions(-)
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_err
since v2:
This one really plugs any gaps with fake timeout injection and
additional code comments included.
Keith Busch (3):
blk-mq: Return true if request was completed
scsi: Do not rely on blk-mq for double completions
blk-mq: Simplify request completion state
block/blk
There are no more users relying on blk-mq request states to prevent
double completions, so replace the relatively expensive cmpxchg operation
with WRITE_ONCE.
Signed-off-by: Keith Busch
---
block/blk-mq.c | 4 +---
include/linux/blk-mq.h | 14 --
2 files changed, 1
Sorry everyone, this was the previous version. Please ignore, I'm
sending out the updated one now.
's.
Signed-off-by: Keith Busch
---
drivers/scsi/scsi_error.c | 17 +++--
drivers/scsi/scsi_lib.c | 6 +-
include/scsi/scsi_cmnd.h | 5 -
3 files changed, 12 insertions(+), 16 deletions(-)
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index c736d61
A driver may have internal state to cleanup if we're pretending a request
timeout occurred. Return 'false' if the command wasn't actually completed
due to the error injection, and true otherwise.
Signed-off-by: Keith Busch
---
block/blk-mq.c | 5 +++--
include/linu
There are no more users relying on blk-mq request states to prevent
double completions, so replace the relatively expensive cmpxchg operation
with WRITE_ONCE.
Signed-off-by: Keith Busch
---
block/blk-mq.c | 4 +---
include/linux/blk-mq.h | 14 --
2 files changed, 1
On Wed, Nov 14, 2018 at 11:00:18AM -0700, Keith Busch wrote:
> On Wed, Nov 14, 2018 at 09:51:36AM -0800, Bart Van Assche wrote:
> > Regarding this patch: I think this patch introduces a subtle but severe bug
> > in the SCSI core, namely that if an abort is processed concurrently w
On Wed, Nov 14, 2018 at 09:51:36AM -0800, Bart Van Assche wrote:
> Regarding this patch: I think this patch introduces a subtle but severe bug
> in the SCSI core, namely that if an abort is processed concurrently with
> request completion with "fake timeout" enabled that the abort is ignored.
That
There are no more users relying on blk-mq request states to prevent
double completions, so replace the relatively expensive cmpxchg operation
with WRITE_ONCE.
Signed-off-by: Keith Busch
---
block/blk-mq.c | 4 +---
include/linux/blk-mq.h | 14 --
2 files changed, 1
's.
Signed-off-by: Keith Busch
---
drivers/scsi/scsi_error.c | 17 +++--
drivers/scsi/scsi_lib.c | 6 +-
include/scsi/scsi_cmnd.h | 5 -
3 files changed, 12 insertions(+), 16 deletions(-)
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index c736d61
A driver may have internal state to cleanup if we're pretending a request
timeout occurred. Return 'false' if the command wasn't actually completed
due to the error injection, and true otherwise.
Signed-off-by: Keith Busch
---
block/blk-mq.c | 5 +++--
include/linu
On Tue, Nov 13, 2018 at 12:20:46PM -0700, Jens Axboe wrote:
> On 11/13/18 11:57 AM, Keith Busch wrote:
> > static void scsi_mq_done(struct scsi_cmnd *cmd)
> > {
> > + if (test_and_set_bit(__SCMD_COMPLETE, &cmd->flags))
> > + return;
>
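The test-and-set guard in the quoted hunk can be tried out in userspace with
C11 atomics. This is only a sketch under assumptions: demo_cmd, the flag mask
and demo_test_and_set() are hypothetical stand-ins, and atomic_fetch_or() is
used here as an analogue of the kernel's test_and_set_bit().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_CMD_COMPLETE (1u << 0)	/* hypothetical flag bit */

struct demo_cmd {
	atomic_uint flags;
	int done_calls;		/* counts how often the real completion ran */
};

/* Sets the bits in mask and reports whether they were already set. */
static bool demo_test_and_set(atomic_uint *flags, unsigned int mask)
{
	return (atomic_fetch_or(flags, mask) & mask) != 0;
}

/* Only the first caller gets past the guard; later callers return early. */
static void demo_done(struct demo_cmd *cmd)
{
	assert(cmd != NULL);
	if (demo_test_and_set(&cmd->flags, DEMO_CMD_COMPLETE))
		return;
	cmd->done_calls++;
}
```

Calling demo_done() twice on the same command runs the completion body once,
which is the double-completion protection the hunk is after.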
's.
Signed-off-by: Keith Busch
---
drivers/scsi/scsi_error.c | 17 +++--
drivers/scsi/scsi_lib.c | 11 +++
include/scsi/scsi_cmnd.h | 5 -
3 files changed, 18 insertions(+), 15 deletions(-)
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index dd338a8
There are no more users relying on blk-mq request states to prevent
double completions, so replace the relatively expensive cmpxchg operation
with WRITE_ONCE.
Signed-off-by: Keith Busch
---
block/blk-mq.c | 4 +---
include/linux/blk-mq.h | 14 --
2 files changed, 1
On Tue, Oct 30, 2018 at 11:33:51AM -0600, Jens Axboe wrote:
> On 10/30/18 11:22 AM, Keith Busch wrote:
> > On Tue, Oct 30, 2018 at 11:09:04AM -0600, Jens Axboe wrote:
> >> Pretty trivial, below. This also keeps the queue mapping calculations
> >> more clean, as we don
On Tue, Oct 30, 2018 at 11:09:04AM -0600, Jens Axboe wrote:
> Pretty trivial, below. This also keeps the queue mapping calculations
> more clean, as we don't have to do one after we're done allocating
> IRQs.
Yep, this addresses my concern. It's less efficient than PCI since PCI
can usually jump str
On Tue, Oct 30, 2018 at 09:18:05AM -0600, Jens Axboe wrote:
> On 10/30/18 9:08 AM, Keith Busch wrote:
> > On Tue, Oct 30, 2018 at 08:53:37AM -0600, Jens Axboe wrote:
> >> The sum of the set can't exceed the nvecs passed in, the nvecs passed in
> >> should be
On Tue, Oct 30, 2018 at 08:53:37AM -0600, Jens Axboe wrote:
> The sum of the set can't exceed the nvecs passed in, the nvecs passed in
> should be less than or equal to nvecs. Granted this isn't enforced,
> and perhaps that should be the case.
That should at least initially be true for a prope
On Tue, Oct 30, 2018 at 08:36:35AM -0600, Jens Axboe wrote:
> On 10/30/18 8:26 AM, Keith Busch wrote:
> > On Mon, Oct 29, 2018 at 10:37:35AM -0600, Jens Axboe wrote:
> >> diff --git a/kernel/irq/affinity.c b/kernel/irq/affinity.c
> >> index f4f29b9d90ee..2046a0f0f0f1 1
On Mon, Oct 29, 2018 at 10:37:35AM -0600, Jens Axboe wrote:
> diff --git a/kernel/irq/affinity.c b/kernel/irq/affinity.c
> index f4f29b9d90ee..2046a0f0f0f1 100644
> --- a/kernel/irq/affinity.c
> +++ b/kernel/irq/affinity.c
> @@ -180,6 +180,7 @@ irq_create_affinity_masks(int nvecs, const struct
> i
On Wed, Sep 05, 2018 at 09:38:16AM +0200, Lukas Wunner wrote:
> On Wed, Sep 05, 2018 at 11:45:45AM +0530, Sreekanth Reddy wrote:
> > On Tue, Sep 4, 2018 at 3:12 PM, Lukas Wunner wrote:
> > > Many scsi drivers call pci_channel_offline() to detect inaccessibility
> > > of the device due to a PCI err
On Wed, Jul 25, 2018 at 03:52:17PM +, Bart Van Assche wrote:
> On Mon, 2018-07-23 at 08:37 -0600, Keith Busch wrote:
> > diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
> > index 8932ae81a15a..2715cdaa669c 100644
> > --- a/drivers/scsi/scsi_error.c
On Wed, May 23, 2018 at 02:19:40PM +0200, Christoph Hellwig wrote:
> -static void hctx_unlock(struct blk_mq_hw_ctx *hctx, int srcu_idx)
> - __releases(hctx->srcu)
> -{
> - if (!(hctx->flags & BLK_MQ_F_BLOCKING))
> - rcu_read_unlock();
> - else
> - srcu_read_unloc
:
Signed-off-by: Keith Busch
---
v1 -> v2:
Update blk-mq API directly instead of chaining a default parameter to
a new API, and update all drivers accordingly.
block/blk-mq-pci.c| 6 --
drivers/nvme/host/pci.c | 2 +-
drivers/scsi/qla2xxx/qla_o
For the storage track, I would like to propose a topic for differentiated
blk-mq hardware contexts. Today, blk-mq considers all hardware contexts
equal, and selects them based on the software's CPU context. There are
use cases that benefit from having hardware context selection criteria
beyond whic
On Wed, Jan 10, 2018 at 03:14:40PM -0700, Sathya Prakash Veerichetty wrote:
> In the case of RAID controllers, all of those drives and RAID volumes
> are exposed to the OS as generic SCSI devices
So even when used as a RAID member, there will be a device handle at
/dev/sdX for each NVMe device the
On Tue, Jan 09, 2018 at 03:50:44PM -0500, Douglas Gilbert wrote:
> Have you tried to do any serious work with and
> say compared it with FreeBSD and Microsoft's approach? No prize for
> guessing which one is worst (and least extensible). Looks like the
> Linux pass-through was at the end of a ToDo
On Tue, Aug 08, 2017 at 12:33:40PM +0530, Sreekanth Reddy wrote:
> On Tue, Aug 8, 2017 at 9:34 AM, Keith Busch wrote:
> >
> > It looks like they can make existing nvme tooling work with little
> > effort if they have the driver implement NVME_IOCTL_ADMIN_COMMAND, and
On Mon, Aug 07, 2017 at 08:45:25AM -0700, James Bottomley wrote:
> On Mon, 2017-08-07 at 20:01 +0530, Kashyap Desai wrote:
> >
> > We have to attempt this use case and see how it behaves. I have not
> > tried this, so not sure if things are really bad or just some tuning
> > may be helpful. I will
On Tue, Jul 11, 2017 at 01:55:02AM -0700, Suganath Prabu S wrote:
> +/**
> + * _base_check_pcie_native_sgl - This function is called for PCIe end
> devices to
> + * determine if the driver needs to build a native SGL. If so, that native
> + * SGL is built in the special contiguous buffers allocat
On Tue, Jul 11, 2017 at 01:55:02AM -0700, Suganath Prabu S wrote:
> diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.h
> b/drivers/scsi/mpt3sas/mpt3sas_base.h
> index 60fa7b6..cebdd8e 100644
> --- a/drivers/scsi/mpt3sas/mpt3sas_base.h
> +++ b/drivers/scsi/mpt3sas/mpt3sas_base.h
> @@ -54,6 +54,7 @@
>
On Wed, Apr 05, 2017 at 04:18:55PM +0200, Christoph Hellwig wrote:
> The way NVMe uses this field is entirely different from the older
> SCSI/BLOCK_PC usage, so move it into struct nvme_request.
>
> Also reduce the size of the field to an unsigned char so that we leave space
> for additional smaller
On Thu, Feb 16, 2017 at 01:21:29PM -0500, Mike Snitzer wrote:
> Then undeprecate them. Decisions like marking a path checker deprecated
> were _not_ made with NVMe in mind. They must predate NVMe.
>
> multipath-tools has tables that specify all the defaults for a given
> target backend. NVMe wi
On Thu, Feb 16, 2017 at 05:37:41PM +, Bart Van Assche wrote:
> On Thu, 2017-02-16 at 12:38 -0500, Keith Busch wrote:
> > Maybe I'm not seeing the bigger picture. Is there some part to multipath
> > that the kernel is not in a better position to handle?
>
> Does
On Thu, Feb 16, 2017 at 10:13:37AM -0500, Mike Snitzer wrote:
> On Thu, Feb 16 2017 at 9:26am -0500,
> Christoph Hellwig wrote:
>
> > just a little new code in the block layer, and a move of the path
> > selectors from dm to the block layer. I would not call this
> > fragmentation.
>
> I'm fine
On Thu, Feb 16, 2017 at 12:05:19PM -0500, Keith Busch wrote:
> On Thu, Feb 16, 2017 at 04:12:23PM +0100, Hannes Reinecke wrote:
> > The device handler needs to check if a given queue belongs to
> > a scsi device; only then does it make sense to attach a device
> > handler.
On Thu, Feb 16, 2017 at 04:12:23PM +0100, Hannes Reinecke wrote:
> The device handler needs to check if a given queue belongs to
> a scsi device; only then does it make sense to attach a device
> handler.
>
> Signed-off-by: Hannes Reinecke
The thing I don't like is that this still has dm-mpath d
e values of BLK_MQ_RQ_QUEUE_[ERROR|BUSY]
not being zero without this. Looks good.
Reviewed-by: Keith Busch
> ---
> drivers/nvme/host/core.c | 4 ++--
> drivers/nvme/host/pci.c| 8
> drivers/nvme/host/rdma.c | 2 +-
> drivers/nvme/target/loop.c | 6 +++---
> 4
On Fri, Oct 28, 2016 at 11:51:35AM -0700, Bart Van Assche wrote:
> I think it is wrong that kicking the requeue list starts stopped queues
> because this makes it impossible to stop request processing without setting
> an additional flag next to BLK_MQ_S_STOPPED. Can you have a look at the
> attach
On Wed, Oct 26, 2016 at 03:56:04PM -0700, Bart Van Assche wrote:
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index 7bb73ba..b662416 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -205,7 +205,7 @@ void nvme_requeue_req(struct request *req)
>
On Wed, Oct 19, 2016 at 04:51:18PM -0700, Bart Van Assche wrote:
>
> I assume that line 498 in blk-mq.c corresponds to BUG_ON(blk_queued_rq(rq))?
> Anyway, it seems to me like this is a bug in the NVMe code and also that
> this bug is completely unrelated to my patch series. In nvme_complete_rq()
Hi Bart,
I'm running linux 4.9-rc1 + linux-block/for-linus, and alternating tests
with and without this series.
Without this, I'm not seeing any problems in a link-down test while
running fio after ~30 runs.
With this series, I only see the test pass infrequently. Most of the
time I observe one
On Tue, Aug 23, 2016 at 03:14:23PM -0600, Jens Axboe wrote:
> On 08/23/2016 03:11 PM, Jens Axboe wrote:
> >My workload looks similar to yours, in that it's high depth and with a
> >lot of jobs to keep most CPUs loaded. My bash script is different than
> >yours, I'll try that and see if it helps her
On Fri, Apr 08, 2016 at 01:40:06PM -0400, Matthew Wilcox wrote:
> - Inability to use all queues supported by a device. Intel's P3700
>supports 31 queues, but block-mq insists on assigning an even multiple
>of CPUs to each queue. So if you have 48 CPUs, it will use 24 queues.
>If you
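The arithmetic behind the 48 CPU example can be reproduced with a short
helper. This follows one reading of the claim above, namely that the number of
queues in use is the largest count, not exceeding what the device supports,
that divides the CPU count evenly. demo_usable_queues() is a hypothetical name
and this is not the actual blk-mq mapping code.

```c
#include <assert.h>

/*
 * Largest queue count <= hw_queues that splits ncpus into equal groups.
 * Falls back to a single queue when nothing larger divides evenly.
 */
static unsigned int demo_usable_queues(unsigned int ncpus,
				       unsigned int hw_queues)
{
	unsigned int q;

	assert(ncpus > 0 && hw_queues > 0);
	for (q = hw_queues < ncpus ? hw_queues : ncpus; q > 1; q--)
		if (ncpus % q == 0)
			return q;
	return 1;
}
```

With 48 CPUs and 31 supported queues, none of 31 down to 25 divides 48, so the
helper settles on 24, leaving two CPUs per queue and seven queues unused.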
On Mon, Feb 08, 2016 at 04:19:13PM +0100, Hannes Reinecke wrote:
> Ok, so what about having a 'wwid' attribute which provides combined
> information (like scsi has)?
That looks like the sensible thing to do. Thanks for the pointer.
Going forward, I will solicit more feedback from scsi developers
so
On Mon, Feb 08, 2016 at 11:13:50AM +0100, Christoph Hellwig wrote:
> On Mon, Feb 08, 2016 at 12:01:16PM +0200, Sagi Grimberg wrote:
> >
> >> Do we have defined sysfs attributes for NVMe devices nowadays?
> >
> > /sys/block/nvme0n1/uuid
>
> That's only supported for NVMe 1.1 and higher devices, and
On Fri, 2 Oct 2015, Johannes Thumshirn wrote:
Lee Duncan writes:
Simplify ida index allocation and removal by
using the ida_simple_* helper functions.
Looks good to me. Just one comment:
static void nvme_release_instance(struct nvme_dev *dev)
{
spin_lock(&dev_list_lock);
- i
On Tue, 11 Aug 2015, Christoph Hellwig wrote:
This series adds support for a simplified Persistent Reservation API
to the block layer. The intent is that both in-kernel and userspace
consumers can use the API instead of having to hand craft SCSI or NVMe
command through the various pass through i
On Tue, 4 Aug 2015, Christoph Hellwig wrote:
NVMe support currently isn't included as I don't have a multihost
NVMe setup to test on, but if I can find a volunteer to test it I'm
happy to write the code for it.
Looks pretty good so far. I'd be happy to try it out with NVMe
subsystems.
--
T
On Thu, 10 Jul 2014, Bjorn Helgaas wrote:
[+cc LKML, Greg KH for driver core async shutdown question]
On Tue, Jun 24, 2014 at 10:48:57AM -0600, Keith Busch wrote:
To provide context why I want to do this asynchronously, NVM-Express has
one PCI device per controller, of which there could be
On Tue, 24 Jun 2014, Elliott, Robert (Server Storage) wrote:
1. That will cover the .shutdown function used by mptfc.c, mptspi.c,
and mptscsih.c, but mptsas.c uses mptsas_shutdown rather than
mptscsih_shutdown. It doesn't call pci_disable_msi either.
Missed that; thanks.
2. mptscsih_suspend
, mptfusion was compile tested only; I didn't observe any
adverse effects from running the pci portion.
Signed-off-by: Keith Busch
Cc: Nagalakshmi Nandigama
Cc: Sreekanth Reddy
Cc: Bjorn Helgaas
---
drivers/message/fusion/mptscsih.c |3 +++
drivers/pci/pci-driver.c |2 --
On Fri, 20 Sep 2013, Martin K. Petersen wrote:
"Keith" == Keith Busch writes:
Keith> The ref tag should be the device's physical LBA rather than the
Keith> 512 byte bio sector.
The bip sector is just a seed value set by the application. It is not
correct to scale it ba
The ref tag should be the device's physical LBA rather than the 512 byte
bio sector.
Signed-off-by: Keith Busch
Cc: Martin K. Petersen
Cc: James E.J. Bottomley
Cc: Ric Wheeler
---
I CC'ed James and Ric as I think you guys expressed some interest in
seeing if this was a legit concer
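For reference, the unit conversion this patch description implies, from a 512
byte sector number to a device logical block address, is a single shift. The
sketch below shows only that arithmetic; demo_sector_to_lba() and its
lba_shift parameter are assumptions for illustration, and whether the ref tag
should be scaled at all is exactly what the reply quoted earlier disputes.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Convert a 512 byte sector number into a logical block address.
 * lba_shift is log2 of the logical block size: 9 for 512 bytes,
 * 12 for 4096 bytes.
 */
static uint64_t demo_sector_to_lba(uint64_t sector, unsigned int lba_shift)
{
	assert(lba_shift >= 9 && lba_shift < 64);
	return sector >> (lba_shift - 9);
}
```

On a device with 4096 byte blocks, sector 80 maps to LBA 10, while on a 512
byte device the sector number and the LBA are the same.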
74 matches