On Tue, Oct 25, 2022 at 11:56:17AM +0100, Jonathan Cameron wrote:
> On Mon, 24 Oct 2022 13:42:54 -0400
> Gregory Price <gregory.pr...@memverge.com> wrote:
> 
> > On Fri, Oct 14, 2022 at 04:10:44PM +0100, Jonathan Cameron wrote:
> > > From: Huai-Cheng Kuo <hch...@avery-design.com.tw>
> > > 
> > > The CDAT can be specified in two ways. One is to add ",cdat=<filename>"
> > > in "-device cxl-type3"'s command option. The file is required to provide
> > > the whole CDAT table in binary mode. The other is to use the default
> > > that provides some 'reasonable' numbers based on type of memory and
> > > size.
> > > 
> > > The DOE capability supporting CDAT is added to hw/mem/cxl_type3.c with
> > > capability offset 0x190. The config read/write to this capability range
> > > can be generated in the OS to request the CDAT data.
> > > 
> > > Signed-off-by: Huai-Cheng Kuo <hch...@avery-design.com.tw>
> > > Signed-off-by: Chris Browy <cbr...@avery-design.com>
> > > Signed-off-by: Jonathan Cameron <jonathan.came...@huawei.com>
> > >   
> > 
> > In reviewing this patch under a debug kernel, I'm discovering some
> > warnings / tracebacks that I want to get a sanity check on.  They appear
> > to be warnings, but i'm not 100% sure what to make of them.
> > 
> > I get the following stack traces during boot (on the x86 machine).
> > 
> > Removing the type-3 device from the configuration prevents the traceback
> > from occurring, suggesting it's something to do with that code in
> > particular (which tracks with the patch here)
> 
> Thanks Gregory,
> 
> We have an INIT_WORK() in pci_doe_submit_task()
> that in the pci_doe_discovery() call is based a work structure that is
> on the stack.  As such should be INIT_WORK_ONSTACK()
> 
> This is a little tricky becaues there is no particular reason to assume
> all struct pci_doe_task instances will be on the stack though they are
> today.  We don't really want to break the layering as would be necessary
> to have this init at the locations where we otherwise initialize the
> containing structure.
> 
> My first thought is add an 'onstack' bool to either the pci_doe_submit_task()
> or perhaps better would be to add it to the pci_doe_task() and have
> macros like DECLARE_CDAT_DOE_TASK_ONSTACK() set it appropriately so we
> can call the right INIT_WORK_*() variant later.
> 
> Ira / others, which way to go to fix this?

Yes Jonathan is on the right track here.  Though I was confused why no one had
ever seen this till now.  I see it was because I never ran with the
CONFIG_DEBUG_OBJECTS_WORK option.

While we could have a declaration macro.  I think the best solution is to
separate the task from the internal implementation; see patch below.  I was
never fully happy with the idea of having the work struct in the user visible
task object anyway.

> 
> I'll also reply to the last version of that series to make sure people see
> this discussion.

Thanks I've been looking for time to look at this series and have missed it.
So the ping over there helped!  :-D

Ira

>From 9e16d5e399723412acbaec1bb9be807d5e5bf7fc Mon Sep 17 00:00:00 2001
From: Ira Weiny <ira.we...@intel.com>
Date: Mon, 31 Oct 2022 11:31:30 -0700
Subject: [RFC PATCH] PCI/doe: Fix work struct declaration

The callers of pci_doe_submit_task() allocate the pci_doe_task on the
stack.  This causes the work structure to be allocated on the stack
without pci_doe_submit_task() knowing.  Work item initialization needs
to be done with either INIT_WORK_ONSTACK() or INIT_WORK() depending on
how the work item is allocated.

Jonathan suggested creating doe task allocation macros such as
DECLARE_CDAT_DOE_TASK_ONSTACK().  This would work, however, having the
work structure be part of pci_doe_task seems like a layering violation.

Create an internal struct pci_doe_work which represents the work and use
this for the work queue.

RFC because I'm wondering if a gfp_t flags parameter should be added to
pci_doe_submit_task() or not.

[1] 
https://lore.kernel.org/linux-cxl/20221014151045.24781-1-jonathan.came...@huawei.com/T/#m88a7f50dcce52f30c8bf5c3dcc06fa9843b54a2d

Reported-by: Gregory Price <gregory.pr...@memverge.com>
Reported-by: Jonathan Cameron <jonathan.came...@huawei.com>
Signed-off-by: Ira Weiny <ira.we...@intel.com>
---
 drivers/pci/doe.c       | 56 +++++++++++++++++++++++++++--------------
 include/linux/pci-doe.h |  4 ---
 2 files changed, 37 insertions(+), 23 deletions(-)

diff --git a/drivers/pci/doe.c b/drivers/pci/doe.c
index e402f05068a5..6a9fa57e2cac 100644
--- a/drivers/pci/doe.c
+++ b/drivers/pci/doe.c
@@ -29,6 +29,12 @@
 #define PCI_DOE_FLAG_CANCEL    0
 #define PCI_DOE_FLAG_DEAD      1
 
+struct pci_doe_work {
+       struct pci_doe_task *task;
+       struct work_struct work;
+       struct pci_doe_mb *doe_mb;
+};
+
 /**
  * struct pci_doe_mb - State for a single DOE mailbox
  *
@@ -214,9 +220,9 @@ static void signal_task_complete(struct pci_doe_task *task, 
int rv)
        task->complete(task);
 }
 
-static void signal_task_abort(struct pci_doe_task *task, int rv)
+static void signal_task_abort(struct pci_doe_mb *doe_mb, struct pci_doe_task 
*task,
+                             int rv)
 {
-       struct pci_doe_mb *doe_mb = task->doe_mb;
        struct pci_dev *pdev = doe_mb->pdev;
 
        if (pci_doe_abort(doe_mb)) {
@@ -233,9 +239,10 @@ static void signal_task_abort(struct pci_doe_task *task, 
int rv)
 
 static void doe_statemachine_work(struct work_struct *work)
 {
-       struct pci_doe_task *task = container_of(work, struct pci_doe_task,
-                                                work);
-       struct pci_doe_mb *doe_mb = task->doe_mb;
+       struct pci_doe_work *doe_work = container_of(work, struct pci_doe_work,
+                                                    work);
+       struct pci_doe_task *task = doe_work->task;
+       struct pci_doe_mb *doe_mb = doe_work->doe_mb;
        struct pci_dev *pdev = doe_mb->pdev;
        int offset = doe_mb->cap_offset;
        unsigned long timeout_jiffies;
@@ -244,7 +251,7 @@ static void doe_statemachine_work(struct work_struct *work)
 
        if (test_bit(PCI_DOE_FLAG_DEAD, &doe_mb->flags)) {
                signal_task_complete(task, -EIO);
-               return;
+               goto free_work;
        }
 
        /* Send request */
@@ -260,8 +267,8 @@ static void doe_statemachine_work(struct work_struct *work)
                if (rc == -EBUSY)
                        dev_err_ratelimited(&pdev->dev, "[%x] busy detected; 
another entity is sending conflicting requests\n",
                                            offset);
-               signal_task_abort(task, rc);
-               return;
+               signal_task_abort(doe_mb, task, rc);
+               goto free_work;
        }
 
        timeout_jiffies = jiffies + PCI_DOE_TIMEOUT;
@@ -269,30 +276,32 @@ static void doe_statemachine_work(struct work_struct 
*work)
 retry_resp:
        pci_read_config_dword(pdev, offset + PCI_DOE_STATUS, &val);
        if (FIELD_GET(PCI_DOE_STATUS_ERROR, val)) {
-               signal_task_abort(task, -EIO);
-               return;
+               signal_task_abort(doe_mb, task, -EIO);
+               goto free_work;
        }
 
        if (!FIELD_GET(PCI_DOE_STATUS_DATA_OBJECT_READY, val)) {
                if (time_after(jiffies, timeout_jiffies)) {
-                       signal_task_abort(task, -EIO);
-                       return;
+                       signal_task_abort(doe_mb, task, -EIO);
+                       goto free_work;
                }
                rc = pci_doe_wait(doe_mb, PCI_DOE_POLL_INTERVAL);
                if (rc) {
-                       signal_task_abort(task, rc);
-                       return;
+                       signal_task_abort(doe_mb, task, rc);
+                       goto free_work;
                }
                goto retry_resp;
        }
 
        rc  = pci_doe_recv_resp(doe_mb, task);
        if (rc < 0) {
-               signal_task_abort(task, rc);
-               return;
+               signal_task_abort(doe_mb, task, rc);
+               goto free_work;
        }
 
        signal_task_complete(task, rc);
+free_work:
+       kfree(doe_work);
 }
 
 static void pci_doe_task_complete(struct pci_doe_task *task)
@@ -510,10 +519,14 @@ EXPORT_SYMBOL_GPL(pci_doe_supports_prot);
  *
  * Excess data will be discarded.
  *
+ * Context: non-atomic; allocates with GFP_KERNEL
+ *
  * RETURNS: 0 when task has been successfully queued, -ERRNO on error
  */
 int pci_doe_submit_task(struct pci_doe_mb *doe_mb, struct pci_doe_task *task)
 {
+       struct pci_doe_work *work;
+
        if (!pci_doe_supports_prot(doe_mb, task->prot.vid, task->prot.type))
                return -EINVAL;
 
@@ -528,9 +541,14 @@ int pci_doe_submit_task(struct pci_doe_mb *doe_mb, struct 
pci_doe_task *task)
        if (test_bit(PCI_DOE_FLAG_DEAD, &doe_mb->flags))
                return -EIO;
 
-       task->doe_mb = doe_mb;
-       INIT_WORK(&task->work, doe_statemachine_work);
-       queue_work(doe_mb->work_queue, &task->work);
+       work = kzalloc(sizeof(*work), GFP_KERNEL);
+       if (!work)
+               return -ENOMEM;
+
+       work->task = task;
+       work->doe_mb = doe_mb;
+       INIT_WORK(&work->work, doe_statemachine_work);
+       queue_work(doe_mb->work_queue, &work->work);
        return 0;
 }
 EXPORT_SYMBOL_GPL(pci_doe_submit_task);
diff --git a/include/linux/pci-doe.h b/include/linux/pci-doe.h
index ed9b4df792b8..7c573cdbf17b 100644
--- a/include/linux/pci-doe.h
+++ b/include/linux/pci-doe.h
@@ -52,10 +52,6 @@ struct pci_doe_task {
        int rv;
        void (*complete)(struct pci_doe_task *task);
        void *private;
-
-       /* No need for the user to initialize these fields */
-       struct work_struct work;
-       struct pci_doe_mb *doe_mb;
 };
 
 /**
-- 
2.37.2

Reply via email to