On 6/14/19 3:10 PM, Himanshu Madhani wrote:
From: Quinn Tran <qut...@marvell.com>
This patch uses kref to protect access between fcp_abort
path and nvme command and LS command completion path.
Stack trace below shows the abort path is accessing stale
memory (nvme_private->sp).
When command kref reaches 0, nvme_private & srb resource will
be disconnected from each other. Any subsequence nvme abort
request will not be able to reference the original srb.
[ 5631.003998] BUG: unable to handle kernel paging request at 00000010000005d8
[ 5631.004016] IP: [<ffffffffc087df92>] qla_nvme_abort_work+0x22/0x100 [qla2xxx]
[ 5631.004086] Workqueue: events qla_nvme_abort_work [qla2xxx]
[ 5631.004097] RIP: 0010:[<ffffffffc087df92>] [<ffffffffc087df92>]
qla_nvme_abort_work+0x22/0x100 [qla2xxx]
[ 5631.004109] Call Trace:
[ 5631.004115] [<ffffffffaa4b8174>] ? pwq_dec_nr_in_flight+0x64/0xb0
[ 5631.004117] [<ffffffffaa4b9d4f>] process_one_work+0x17f/0x440
[ 5631.004120] [<ffffffffaa4bade6>] worker_thread+0x126/0x3c0
Signed-off-by: Quinn Tran <qut...@marvell.com>
Signed-off-by: Himanshu Madhani <hmadh...@marvell.com>
---
drivers/scsi/qla2xxx/qla_def.h | 2 +
drivers/scsi/qla2xxx/qla_nvme.c | 164 ++++++++++++++++++++++++++++------------
drivers/scsi/qla2xxx/qla_nvme.h | 1 +
3 files changed, 117 insertions(+), 50 deletions(-)
diff --git a/drivers/scsi/qla2xxx/qla_def.h b/drivers/scsi/qla2xxx/qla_def.h
index 602ed24bb806..85a27ee5d647 100644
--- a/drivers/scsi/qla2xxx/qla_def.h
+++ b/drivers/scsi/qla2xxx/qla_def.h
@@ -532,6 +532,8 @@ typedef struct srb {
uint8_t cmd_type;
uint8_t pad[3];
atomic_t ref_count;
+ struct kref cmd_kref; /* need to migrate ref_count over to this */
+ void *priv;
wait_queue_head_t nvme_ls_waitq;
struct fc_port *fcport;
struct scsi_qla_host *vha;
My feedback about this patch is as follows:
- I think that having two atomic counters in struct srb with a very
similar purpose is overkill and also that a single atomic counter
should be sufficient in this structure.
- The problem fixed by this patch not only can happen with NVMe-FC but
also with FC. I would prefer to see all code paths fixed for which a
race between completion and abort can occur.
Thanks,
Bart.