Hi all,
If an NVMe drive is locked with ATA Security, most commands sent to the drive
will fail. This includes commands sent by the kernel upon discovery to probe
for partitions. The failures happen in such a way that trying to do anything
with the drive (e.g. sending an unlock command; unloadi
Hey Al,
What happens if we feed it a 3-element iovec array, one page in each?
AFAICS, bio_add_pc_page() is called for each of those pages, even if
the previous calls have failed - break is only out of the inner loop.
Sure, failure due to exceeded request size means that everything after
that on
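A hypothetical reduction of the control-flow issue Al describes (not the actual bio_add_pc_page() code): a bare `break` only exits the innermost loop, so with a 3-element iovec array the remaining entries are still attempted after the first failure.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for bio_add_pc_page(): fails once the
 * request size budget is exhausted. */
static bool add_page(int *budget)
{
    if (*budget <= 0)
        return false;
    (*budget)--;
    return true;
}

/* Buggy shape: break only leaves the inner loop, so pages of the
 * remaining iovec entries are still attempted after a failure. */
static int map_iov_buggy(int niov, int pages_per_iov, int budget)
{
    int attempted = 0;
    for (int i = 0; i < niov; i++) {
        for (int p = 0; p < pages_per_iov; p++) {
            attempted++;
            if (!add_page(&budget))
                break;          /* exits inner loop only */
        }
    }
    return attempted;
}

/* Fixed shape: propagate the failure out of both loops. */
static int map_iov_fixed(int niov, int pages_per_iov, int budget)
{
    int attempted = 0;
    for (int i = 0; i < niov; i++) {
        for (int p = 0; p < pages_per_iov; p++) {
            attempted++;
            if (!add_page(&budget))
                goto out;       /* exits both loops */
        }
    }
out:
    return attempted;
}
```

With 3 iovecs of one page each and room for only one page, the buggy variant keeps trying each subsequent entry, exactly the behavior described above.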
On 14/09/16 07:18, Christoph Hellwig wrote:
Use the new helper to automatically select the right interrupt type, as
well as to use the automatic interrupt affinity assignment.
Patch title and the change description are a little short IMO to
describe what is going on here (need the blk-mq side
Looks fine,
Reviewed-by: Sagi Grimberg
Reviewed-by: Sagi Grimberg
Acked-by: Sagi Grimberg
Hi Rakesh,
this looks reasonable, but we'll need to also adapt the non-PCI
driver to the new state machine. I can give this a spin.
At that point we probably want to move nvme_reset into common
code somehow.
Hi Guys, sorry for barging in late, I've been way too busy with
internal stuff lately.
/*
+* Avoid configuration and syncing commands if controller is already
+* being removed and queues have been killed.
+*/
+ if (ctrl->state == NVME_CTRL_DELETING || ctrl->state == NVME_CTRL_DEAD)
+ return;
+
Hey Rakesh, Christoph,
Given that
Allowing multiple resets can result in multiple controller removals as
well, if different conditions inside nvme_reset_work fail, which
might deadlock on device_release_driver.
This patch makes sure that work queue item (reset_work) is added only
if controller state != NVME_CTRL_RESETTING and
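The guard described above can be sketched as a compare-and-swap on the controller state (an illustrative userspace model with made-up names, not the real nvme_change_ctrl_state() code): only the caller that wins the LIVE to RESETTING transition queues the reset work, so concurrent resets collapse to one.

```c
#include <assert.h>
#include <stdatomic.h>

enum ctrl_state { NVME_CTRL_LIVE, NVME_CTRL_RESETTING };

/* Only the first caller sees LIVE and gets to queue reset_work;
 * everyone else observes RESETTING and backs off. */
static int nvme_try_reset(_Atomic int *state)
{
    int expected = NVME_CTRL_LIVE;

    return atomic_compare_exchange_strong(state, &expected,
                                          NVME_CTRL_RESETTING);
}
```

A second reset attempt while the first is pending simply fails the exchange, which is what prevents the duplicate removals and the deadlock described in the commit message.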
Looks good to me,
Reviewed-by: Sagi Grimberg
Looks good,
Reviewed-by: Sagi Grimberg
On 10/4/2015 12:00 PM, Geliang Tang wrote:
Just fix a typo in the code comment.
Signed-off-by: Geliang Tang
Acked-by: Sagi Grimberg
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo inf
On 10/7/2015 3:29 PM, Arnd Bergmann wrote:
The INIT_UDATA() macro requires a pointer or unsigned long argument for
both input and output buffer, and all callers had a cast from when
the code was merged until a recent restructuring, so now we get
core/uverbs_cmd.c: In function 'ib_uverbs_create_c
Looks good,
Reviewed-by: Sagi Grimberg
On 06/25/2018 10:08 AM, Johannes Thumshirn wrote:
On Tue, Jun 19, 2018 at 05:09:27PM +0300, Sagi Grimberg wrote:
We are going to need it for traffic based keep alive to update
that we saw a completion to extend the kato.
But I suggest you simply keep a ctrl reference in struct nvme_request
Looks good Johannes,
Reviewed-by: Sagi Grimberg
Reviewed-by: Sagi Grimberg
This looks fine to me, but we need Keith to ack on this.
Hi,
Hi Oza,
we are configuring interrupt coalescing for NVMe, but right now it uses a
module param,
so the same interrupt coalescing settings get applied for all the NVMe devices
connected to different RCs.
Ideally it should be with sysctl.
If at all, I would place this in nvme-cli (via ioctl) instea
Hey Keith,
Thanks for the fix. It looks like we still have a problem, though.
Commands submitted with the "shutdown_lock" held need to be able to make
forward progress without relying on a completion, but this one could
block indefinitely.
Can you explain to me why the shutdown_lock is needed
After Sagi's "nvme-rdma: fix concurrent reset and reconnect", the rdma ctrl state
is changed to the RECONNECTING state
after some clearing and shutdown work, then some initializing procedure runs, no
matter whether it is the reset work path or the error recovery path.
The fc reset work also does the same thing.
So if we defi
see V4 discussion.. :)
I didn't see any v4.
I think that if it's not coming for 4.16, it should be easy enough to
take it in.
FWIW,
Reviewed-by: Sagi Grimberg
Currently, the ctrl->state will be changed to NVME_CTRL_RESETTING
before queueing the reset work. This is not strict enough: there could be
a big gap before the reset_work callback is invoked. In addition,
there is some disable work in the reset_work callback which is, strictly
speaking, not part of the reset work,
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index e477c35..0530432 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1212,19 +1212,26 @@ static enum blk_eh_timer_return nvme_timeout(struct
request *req, bool reserved)
}
/*
-* Shutdown
Looks good,
Reviewed-by: Sagi Grimberg
- if (nvmeq->sq_cmds_io)
- memcpy_toio(&nvmeq->sq_cmds_io[tail], cmd, sizeof(*cmd));
- else
- memcpy(&nvmeq->sq_cmds[tail], cmd, sizeof(*cmd));
+ memcpy(&nvmeq->sq_cmds[tail], cmd, sizeof(*cmd));
Hmm, how safe is replacing memcpy_toio() with regula
On 03/08/2018 08:19 AM, Jianchao Wang wrote:
Currently, we use nvme_cancel_request to complete the request
forcibly. This has the following defects:
- It is not safe to race with the normal completion path.
blk_mq_complete_request is ok to race with timeout path,
but not with itself.
-
Not sure I understand the 'blocking' problem in this case.
We can build a bvec table from this req, and send them all
in send(),
I would like to avoid growing bvec tables and keep everything
preallocated. Plus, a bvec_iter operates on a bvec which means
we'll need a table there as well... No
This does not hold at least for NVMe RDMA host driver. An example scenario
is when the RDMA connection is gone while the controller is being deleted.
In this case, the nvmf_reg_write32() for sending the shutdown admin command by
the delete_work could hang forever if the command is not completed
Could you please take a look at this bug and code review?
We are seeing more instances of this bug and found that reconnect_work
could hang as well, as can be seen from the stacktrace below.
Workqueue: nvme-wq nvme_rdma_reconnect_ctrl_work [nvme_rdma]
Call Trace:
__schedule+0x2ab/0x880
sc
From: Sagi Grimberg
Signed-off-by: Sagi Grimberg
---
include/linux/nvme-tcp.h | 189 +++
include/linux/nvme.h | 1 +
2 files changed, 190 insertions(+)
create mode 100644 include/linux/nvme-tcp.h
diff --git a/include/linux/nvme-tcp.h b/include/linux
From: Sagi Grimberg
The single caller to csum_and_copy_to_iter is skb_copy_and_csum_datagram
and we are trying to unite its logic with skb_copy_datagram_iter by passing
a callback to the copy function that we want to apply. Thus, we need
to make the checksum pointer private to the function
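The consolidation idea can be sketched as a single copy walker that applies a per-chunk callback, with the checksum state kept private to the callback (hypothetical names for illustration, not the actual net/core datagram API):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void (*chunk_cb)(const unsigned char *src, size_t len, void *priv);

/* One shared walker: both plain copy and copy+checksum use it,
 * differing only in the callback passed in. */
static void copy_chunks(const unsigned char *src, size_t len,
                        size_t chunk, chunk_cb cb, void *priv)
{
    for (size_t off = 0; off < len; off += chunk) {
        size_t n = len - off < chunk ? len - off : chunk;
        cb(src + off, n, priv);
    }
}

/* Checksum state stays private to the callback, as the patch intends. */
struct csum_state { unsigned sum; unsigned char *dst; };

static void copy_and_sum(const unsigned char *src, size_t len, void *priv)
{
    struct csum_state *c = priv;

    memcpy(c->dst, src, len);
    for (size_t i = 0; i < len; i++)
        c->sum += src[i];
    c->dst += len;
}
```

The point of the refactor is that skb_copy_datagram_iter and the checksumming variant then share one iteration path instead of duplicating it.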
From: Sagi Grimberg
This will be useful to consolidate skb_copy_and_hash_datagram_iter and
skb_copy_and_csum_datagram to a single code path.
Acked-by: David S. Miller
Signed-off-by: Sagi Grimberg
---
net/core/datagram.c | 9 ++---
1 file changed, 6 insertions(+), 3 deletions(-)
diff
Looks good (both #1 if we want to use the two patch version or #2).
I have no idea what I was trying to do when I tried it.
You want to submit or should I?
You want to submit or should I?
Whatever is easier for you
You're the owner, you should go ahead with it. Thanks!
Not related to your patch, but I did notice that the req->q->id isn't
really useful here since that's not the hardware context identifier.
That's just some ida-assigned software identifier. For the admin command
completion trace, it's actually a little confusing to see the qid in the
trace.
I
.. Johannes, can you respin with this version?
Please CC james and keith on it too.
--
nvme: cache struct nvme_ctrl reference to struct nvme_request
We will need to reference the controller in the setup and
completion time for tracing and future traffic based keep alive
support.
Thinking more on this, not using the hw qid really limits the utility
of these trace events: we may not be able to match a completion
to the submission without it since cmdid alone isn't enough to match up
the two events.
Here's an updated proposal and actually tested. I was also abl
On 06/11/2018 09:46 PM, Johannes Thumshirn wrote:
Add disk name to tracepoints so we can better distinguish between
individual disks in the trace output.
Signed-off-by: Johannes Thumshirn
---
I'm not entirely sure if this adding the ctrl pointers to
nvme_complete_rq() and nvme_setup_cmd()
Hi Folks,
I'm sorry to chime in super late on this, but a lot has been
going on for me lately which got me off the grid.
So I'll try to provide my input hopefully without starting any more
flames..
This patch series aims to provide a more fine grained control over
nvme's native multipathing, b
Add disk name to tracepoints so we can better destinguish between
individual disks in the trace output.
Looks useful,
Reviewed-by: Sagi Grimberg
Reviewed-by: Sagi Grimberg
@@ -246,3 +246,27 @@ void nvme_mpath_remove_disk(struct nvme_ns_head *head)
blk_cleanup_queue(head->disk->queue);
put_disk(head->disk);
}
+
+int nvme_mpath_change_personality(struct nvme_subsystem *subsys)
+{
+ struct nvme_ctrl *ctrl;
+ int ret = 0;
+
+restart:
+
Wouldn't expect you guys to nurture this 'mpath_personality' knob. So
when features like "dispersed namespaces" land a negative check would
need to be added in the code to prevent switching from "native".
And once something like "dispersed namespaces" lands we'd then have to
see about a more
Moreover, I also wanted to point out that fabrics array vendors are
building products that rely on standard nvme multipathing (and probably
multipathing over dispersed namespaces as well), and keeping a knob that
will keep nvme users on dm-multipath will probably not help them
educate their c
Reviewed-by: Sagi Grimberg
diff --git a/drivers/nvme/host/fabrics.c b/drivers/nvme/host/fabrics.c
index 7ae732a77fe8..febf82639b40 100644
--- a/drivers/nvme/host/fabrics.c
+++ b/drivers/nvme/host/fabrics.c
@@ -957,16 +957,17 @@ nvmf_create_ctrl(struct device *dev, const char *buf,
size_t count)
down_read(&nvmf_tra
In general it seems like fc loop needs to offload any I/O to a workqueue
just like nvme-loop does, but even then I can't see how that is going to
cause an issue in this area.
Hmm I'll be looking into it.
FWIW, I agree it should do that as well.
diff --git a/drivers/nvme/target/fcloop.c b/drivers/nvme/target/fcloop.c
index 34712def81b1..d2209c60f95f 100644
--- a/drivers/nvme/target/fcloop.c
+++ b/drivers/nvme/target/fcloop.c
@@ -509,7 +509,7 @@ fcloop_fcp_req(struct nvme_fc_local_port *localport,
if (!rport->targetport)
Lockdep complains about inconsistent hardirq states when using
nvme-fcloop. So use the irqsave variant of spin_locks for the
nvmefc_fcp_req's reqlock.
Is this because the nvmet bdev can trigger the rx path in interrupt
context? I don't understand exactly what is violated.
Here's the lockdep r
Reviewed-by: Sagi Grimberg
I'm aware that most everything in multipath.conf is SCSI/FC specific.
That isn't the point. dm-multipath and multipathd are an existing
framework for managing multipath storage.
It could be made to work with NVMe. But yes it would not be easy.
Especially not with the native NVMe multipath cr
.
+*/
+ mb();
+
if (!nvme_dbbuf_need_event(*dbbuf_ei, value, old_value))
return false;
}
Reviewed-by: Sagi Grimberg
struct nvme_sgl_desc {
__le64 addr;
- __le32 length;
+ __le64 length;
__u8 rsvd[3];
__u8 type;
};
In what world does changing a wire protocol for this make sense?
Please get rid of this hunk; NVMe will never cross the 32-bit SG element
size.
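For context on why the hunk is rejected: the SGL data block descriptor is a fixed 16-byte structure defined by the NVMe specification, so widening `length` to 64 bits changes the on-wire size and breaks interoperability. A sketch of the layout, mirroring the quoted pre-change struct:

```c
#include <assert.h>
#include <stdint.h>

/* NVMe SGL descriptor: 8-byte address, 4-byte length,
 * 3 reserved bytes, 1 type byte -- 16 bytes on the wire. */
struct nvme_sgl_desc {
    uint64_t addr;
    uint32_t length;
    uint8_t  rsvd[3];
    uint8_t  type;
} __attribute__((packed));

_Static_assert(sizeof(struct nvme_sgl_desc) == 16,
               "SGL descriptor must stay 16 bytes on the wire");
```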
struct nvme_sgl_desc {
__le64 addr;
- __le32 length;
+ __le64 length;
__u8 rsvd[3];
__u8 type;
};
Isn't this a device- or protocol-defined data structure? You can't just
change it like this.
You're correct, we can't...
[Replied before seeing t
nvmeq->sq_tail);
Why the newline escapes? Why not escape at the 80-char border?
Other than that, looks fine,
Reviewed-by: Sagi Grimberg
Reviewed-by: Sagi Grimberg
Reviewed-by: Sagi Grimberg
I cannot reproduce the bug with the patch; in my failure scenarios, it
seems that completing the request on errors in nvme_rdma_send_done
makes __nvme_submit_sync_cmd unblock. Also, I think this is
safe from double completions.
However, it seems that nvme_rdma_timeout code is still no
Reviewed-by: Sagi Grimberg
t
yet sure how to do that, but would be nice...
Other than that,
Reviewed-by: Sagi Grimberg
I also commented over there regarding this specific problem.
I think my wording was inaccurate perhaps. It's not so much direct
recursion, but a dependency chain connecting the handler_mutex class
back to itself. This is how lockdep finds problems, it doesn't consider
actual *instances* but o
I must also say that I'm disappointed you'd try to do things this way.
I'd be (have been?) willing to actually help you understand the problem
and add the annotations, but rather than answer my question ("where do I
find the right git tree"!) you just send a revert patch.
Sorry that I had not
Regardless of the API discussion,
The nvme-tcp bits look straight-forward:
Acked-by: Sagi Grimberg
blk_mq_tag_to_rq() will always return a request if the command_id is
in the valid range. Check if the request has been started. If we
blindly process the request we might double complete a request which
can be fatal.
How did you get to this one? did the controller send a completion for
a comp
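The guard described in the commit message can be modeled in a few lines (an illustrative sketch, not the real blk-mq request state machine): only a request that was actually started may be completed, so a duplicated or stale completion from the controller is dropped instead of completing the request twice.

```c
#include <assert.h>
#include <stdbool.h>

enum rq_state { RQ_IDLE, RQ_IN_FLIGHT, RQ_COMPLETE };

struct request { enum rq_state state; };

/* Complete a request only if it is in flight; a second completion
 * for the same command_id finds RQ_COMPLETE and is ignored. */
static bool try_complete(struct request *rq)
{
    if (rq->state != RQ_IN_FLIGHT)
        return false;   /* stale or duplicate completion: drop it */
    rq->state = RQ_COMPLETE;
    return true;
}
```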
On 1/7/2015 12:44 AM, Nicholas A. Bellinger wrote:
On Wed, 2015-01-07 at 03:35 +0800, kbuild test robot wrote:
tree: git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending.git
nvme_of
head: 40d8c11927282d59855d645b35798edd97828da5
commit: 40d8c11927282d59855d645b35798edd97828da5 [
>> On Wed, 2015-01-07 at 10:57 +0200, Sagi Grimberg wrote:
>>> On 1/7/2015 12:44 AM, Nicholas A. Bellinger wrote:
>>>> On Wed, 2015-01-07 at 03:35 +0800, kbuild test robot wrote:
>>>> tree:
>>>> git://git.kernel.org/pub/scm/linux/kernel
On 6/4/2015 10:06 AM, Nicholas A. Bellinger wrote:
On Wed, 2015-06-03 at 14:57 +0200, Christoph Hellwig wrote:
This makes lockdep very unhappy, rightly so. If you execute
one end_io function inside another you basically nest every possible
lock taken in the I/O completion path. Also adding mor
Note that we originally allocated irqs this way, and Keith changed
it a while ago for good reasons. So I'd really like to see good
reasons for moving away from this, and some heuristics to figure
out which way to use. E.g. if the device supports more irqs than
I/O queues your scheme might alwa
On 03/01/2018 01:40 AM, Logan Gunthorpe wrote:
In order to use PCI P2P memory pci_p2pmem_[un]map_sg() functions must be
called to map the correct DMA address. To do this, we add a flags
variable and the RDMA_RW_CTX_FLAG_PCI_P2P flag. When the flag is
specified use the appropriate map function.
Hi Everyone,
Hi Logan,
Here's v2 of our series to introduce P2P based copy offload to NVMe
fabrics. This version has been rebased onto v4.16-rc3 which already
includes Christoph's devpagemap work the previous version was based
off as well as a couple of the cleanup patches that were in v1.
We create a configfs attribute in each nvme-fabrics target port to
enable p2p memory use. When enabled, the port will only then use the
p2p memory if a p2p memory device can be found which is behind the
same switch as the RDMA port and all the block devices in use. If
the user enabled it and no d
Looks fine,
Reviewed-by: Sagi Grimberg
looks fine to me,
Reviewed-by: Sagi Grimberg
Any plans adding the capability to nvme-rdma? Should be
straight-forward... In theory, the use-case would be an rdma backend
fabric behind it. Shouldn't be hard to test either...
Looks fine,
Reviewed-by: Sagi Grimberg
On 01/03/18 04:03 AM, Sagi Grimberg wrote:
Can you describe what would be the plan to have it when these devices
do come along? I'd say that p2p_dev needs to become a nvmet_ns reference
and not from nvmet_ctrl. Then, when cmb capable devices come along, the
ns can prefer to use its ow
Reviewed-by: Sagi Grimberg
For PCIe devices the right policy is not round robin but to use
the pcie device closest to the node. I did a prototype for that
long ago and the concept can work. Can you look into that and
also make that policy used automatically for PCIe devices?
I think that active/active makes sense for
@@ -85,7 +121,14 @@ static blk_qc_t nvme_ns_head_make_request(struct
request_queue *q,
int srcu_idx;
srcu_idx = srcu_read_lock(&head->srcu);
- ns = nvme_find_path(head);
+ switch (head->mpath_policy) {
+ case NVME_MPATH_ROUND_ROBIN:
+ ns = nvme_fin
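A minimal userspace model of the round-robin policy in the (truncated) hunk above; the real code walks an RCU-protected namespace list rather than an index, so this is only a sketch of the selection logic:

```c
#include <assert.h>

/* Simplified multipath head: a count of usable paths and the
 * index of the last path used. */
struct mpath_head { int npaths; int last; };

/* Round robin: hand out paths in order, wrapping at the end. */
static int rr_next_path(struct mpath_head *h)
{
    h->last = (h->last + 1) % h->npaths;
    return h->last;
}
```

Each I/O advances to the next path and wraps, as opposed to nvme_find_path()'s fixed preferred-path behavior.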
@@ -565,24 +565,24 @@ static u16 nvmet_rdma_map_sgl_keyed(struct nvmet_rdma_rsp
*rsp,
{
struct rdma_cm_id *cm_id = rsp->queue->cm_id;
u64 addr = le64_to_cpu(sgl->addr);
- u32 len = get_unaligned_le24(sgl->length);
u32 key = get_unaligned_le32(sgl->key);
i
Reviewed-by: Sagi Grimberg
[so much for putting out flames... :/]
This projecting onto me that I've not been keeping the conversation
technical is in itself hostile. Sure I get frustrated and lash out (as
I'm _sure_ you'll feel in this reply)
You're right, I do feel this is lashing out. And I don't appreciate it.
Pleas
Add disk name to tracepoints so we can better destinguish between
individual disks in the trace output.
Signed-off-by: Johannes Thumshirn
Reviewed-by: Sagi Grimberg
Nit: s/destinguish/distinguish/g
Christoph, can you fix it up when applying or you want me
to do it?
We plan to implement all the fancy NVMe standards like ANA, but it
seems that there is still a requirement to let the host side choose
policies about how to use paths (round-robin vs least queue depth for
example). Even in the modern SCSI world with VPD pages and ALUA,
there are still knobs th
As different nvme controllers are connected via different fabrics, some require
different timeout settings than others. This series implements per-controller
timeouts in the nvme subsystem which can be set via sysfs.
How much of a real issue is this?
block io_timeout defaults to 30 seconds whi
Reviewed-by: Sagi Grimberg
Frankly, we have a ton of testing related special cases in the kernel.
This one is a) simple and small, only 10 LoC, b) far away from the fast
path or any other place where it could have any impact on legitimate
users and c) it prints an informative message showing you what happened.
Sorry but t
tion.
If passed, the core will attempt to attach a CQ with a completion
vector that is directed to the cpu core provided as the affinity
hint.
Signed-off-by: Sagi Grimberg
--
That one added implicit QP create flags:
--
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index bdb1279a4
Reviewed-by: Sagi Grimberg
You can't see exactly where it dies but I followed the assembly to
nvme_round_robin_path(). Maybe it's not the initial nvme_next_ns(head,
old) which returns NULL but nvme_next_ns() is returning NULL eventually
(list_next_or_null_rcu()).
So there is another bug causing nvme_next_ns to behave abnormally.
I revie
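The failure mode under discussion can be modeled as follows: list_next_or_null_rcu() returns NULL for the last element, so a round-robin walker has to wrap back to the head rather than pass the NULL along. This is a userspace sketch, not the actual nvme_round_robin_path() code.

```c
#include <assert.h>
#include <stddef.h>

struct ns { struct ns *next; };

/* Model of list_next_or_null_rcu(): NULL at the end of the list. */
static struct ns *next_or_null(struct ns *pos)
{
    return pos ? pos->next : NULL;
}

/* Round-robin step with the needed fallback: when we run off the
 * tail, wrap to the head instead of handing back NULL, which a
 * caller would otherwise dereference. */
static struct ns *rr_next(struct ns *head, struct ns *old)
{
    struct ns *ns = next_or_null(old);

    if (!ns)            /* old was the last entry (or NULL) */
        ns = head;      /* wrap around */
    return ns;
}
```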
Greeting,
FYI, we noticed the following commit (built with gcc-9):
commit: c804af2c1d3152c0cf877eeb50d60c2d49ac0cf0 ("IB/srpt: use new
shared CQ mechanism")
https://git.kernel.org/cgit/linux/kernel/git/rdma/rdma.git for-next
in testcase: blktests
with following parameters:
test: srp-
Reviewed-by: Sagi Grimberg
This patch adds a check in nvmet_execute_passthru_cmd() to prevent the
following oops :-
Why LKML and not linux-nvme?
My bad (+linux-nvme).
It doesn't have the patch. can you resend?