On Tue, Jun 11, 2019 at 5:09 AM Robin Gong <[email protected]> wrote:
>
> Sven, no any dependency from sdma driver view. The only difference between
> directly loading firmware from kernel and rootfs is the former spend more
> time during kernel boot and such timing may cause the crash. The issue is
> not 100% in my side, about 20% possibility, which looks like 'timing issue'.
> Another interesting thing is that every time the crash stop at somewhere
> drm, and After I disable ipu and display which use drm in i.mx6q.dtsi, the
> issue is gone on my i.mx6q-sabreauto board. Could you have a try with below
> patch as mine? If the issue is gone on your side, we could involve drm guys
> to look into it.

When I apply your patch that disables the IPU and display, the crash still
happens on my device. But when I disable the NFSv4 network filesystem in
the defconfig, the crash disappears. Yet on linux-next, the crash is back,
even if I disable the IPU or NFSv4.

My guess: we are chasing ghosts, and the crashes are purely timing related.
Things like disabling the IPU or NFSv4 change the boot timing, and that
changes whether the crash shows up.

Experiment: if I put msleep(1000) right before the sdma_load_script() call,
the crash never happens. If I comment out the call to sdma_run_channel0()
in sdma_load_script(), the crash does not happen either.

This suggests that the crash depends on exactly when sdma_run_channel0()
is called. If it is called too early, the result is an 'interrupt storm'
on the sdma interrupt handler: it gets called millions of times in a very
short amount of time.

By adding debug prints, I noticed that the sdma core calls back
sdma_alloc_chan_resources() later during boot, when a spi bus is created.

Experiment: I held off the firmware upload until the first time
sdma_alloc_chan_resources() is called by the core, using a struct
completion to accomplish this.

Result: the crash never happens again.
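
To illustrate the ordering this experiment enforces, here is a rough
userspace sketch. It is not driver code: struct completion below is a
pthread-based stand-in for the kernel primitive, and fake_engine,
fake_load_firmware() and run_demo() are hypothetical names made up for the
illustration. It only shows that the "firmware" path cannot proceed until
the "alloc" path has signalled.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Userspace stand-in for the kernel's struct completion (illustration only). */
struct completion {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	unsigned int done;
};

static void init_completion(struct completion *c)
{
	int rc = pthread_mutex_init(&c->lock, NULL);

	assert(rc == 0);
	rc = pthread_cond_init(&c->cond, NULL);
	assert(rc == 0);
	(void)rc;	/* keeps builds with NDEBUG quiet */
	c->done = 0;
}

/* Record one completion and wake a waiter, if any. */
static void complete(struct completion *c)
{
	pthread_mutex_lock(&c->lock);
	c->done++;
	pthread_cond_signal(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

/* Block until complete() has been called, then consume that completion. */
static void wait_for_completion(struct completion *c)
{
	pthread_mutex_lock(&c->lock);
	while (c->done == 0)
		pthread_cond_wait(&c->cond, &c->lock);
	c->done--;
	pthread_mutex_unlock(&c->lock);
}

/* Hypothetical miniature of the engine state, just for this demo. */
struct fake_engine {
	struct completion chan_resources_alloced;
	bool chan_ready;	/* set by the "alloc" path */
	bool fw_saw_chan_ready;	/* what the "firmware" path observed */
};

/* Plays the role of the firmware callback: waits before touching anything. */
static void *fake_load_firmware(void *arg)
{
	struct fake_engine *e = arg;

	wait_for_completion(&e->chan_resources_alloced);
	e->fw_saw_chan_ready = e->chan_ready;
	return NULL;
}

/* Returns true if the firmware path ran only after the alloc path. */
static bool run_demo(void)
{
	struct fake_engine e = { .chan_ready = false, .fw_saw_chan_ready = false };
	pthread_t fw;

	init_completion(&e.chan_resources_alloced);
	if (pthread_create(&fw, NULL, fake_load_firmware, &e) != 0)
		return false;

	/* The "alloc" path: happens some time after the firmware thread starts. */
	e.chan_ready = true;
	complete(&e.chan_resources_alloced);

	pthread_join(fw, NULL);
	return e.fw_saw_chan_ready;
}
```

Calling run_demo() from a small main() and building with -pthread should
show the firmware path always observing chan_ready. Unlike the
msleep(1000) experiment, this does not depend on how long anything takes,
only on the order of events.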

All this suggests very strongly that sdma_run_channel0() is called
"too early" by the driver. I don't know enough about imx-sdma to know
what is missing during the early call.

Here is the patch to delay the firmware load until the first
sdma_alloc_chan_resources() has completed:
diff --git a/drivers/dma/imx-sdma.c b/drivers/dma/imx-sdma.c
index 99d9f431ae2c..ddeded5c3337 100644
--- a/drivers/dma/imx-sdma.c
+++ b/drivers/dma/imx-sdma.c
@@ -33,6 +33,7 @@
 #include <linux/of_device.h>
 #include <linux/of_dma.h>
 #include <linux/workqueue.h>
+#include <linux/completion.h>
 #include <asm/irq.h>
 #include <linux/platform_data/dma-imx-sdma.h>
@@ -444,6 +445,7 @@ struct sdma_engine {
 	struct sdma_buffer_descriptor *bd0;
 	/* clock ratio for AHB:SDMA core. 1:1 is 1, 2:1 is 0*/
 	bool clk_ratio;
+	struct completion chan_resources_alloced;
 };
 static int sdma_config_write(struct dma_chan *chan,
@@ -1258,6 +1260,7 @@ static void sdma_desc_free(struct virt_dma_desc *vd)
 static int sdma_alloc_chan_resources(struct dma_chan *chan)
 {
 	struct sdma_channel *sdmac = to_sdma_chan(chan);
+	struct sdma_engine *sdma = sdmac->sdma;
 	struct imx_dma_data *data = chan->private;
 	struct imx_dma_data mem_data;
 	int prio, ret;
@@ -1310,6 +1313,7 @@ static int sdma_alloc_chan_resources(struct dma_chan *chan)
 	if (ret)
 		goto disable_clk_ahb;
+	complete(&sdma->chan_resources_alloced);
 	return 0;
 disable_clk_ahb:
@@ -1724,6 +1728,7 @@ static void sdma_load_firmware(const struct firmware *fw, void *context)
 		/* In this case we just use the ROM firmware. */
 		return;
 	}
+	wait_for_completion(&sdma->chan_resources_alloced);
 	if (fw->size < sizeof(*header))
 		goto err_firmware;
@@ -2012,6 +2017,7 @@ static int sdma_probe(struct platform_device *pdev)
 		return -ENOMEM;
 	spin_lock_init(&sdma->channel_0_lock);
+	init_completion(&sdma->chan_resources_alloced);
 	sdma->dev = &pdev->dev;
 	sdma->drvdata = drvdata;