adreno: De-spaghettify the use of memory barriers

Andrew Halaney Thu, 16 May 2024 06:15:49 -0700

On Wed, May 15, 2024 at 12:08:49AM GMT, Akhil P Oommen wrote:
> On Wed, May 08, 2024 at 07:46:31PM +0200, Konrad Dybcio wrote:
> > Memory barriers help ensure instruction ordering, NOT time and order
> > of actual write arrival at other observers (e.g. memory-mapped IP).
> > On architectures employing weak memory ordering, the latter can be a
> > giant pain point, and it has been as part of this driver.
> > 
> > Moreover, the gpu_/gmu_ accessors already use non-relaxed versions of
> > readl/writel, which include r/w (respectively) barriers.
> > 
> > Replace the barriers with a readback that ensures the previous writes
> > have exited the write buffer (as the CPU must flush the write to the
> > register it's trying to read back) and subsequently remove the hack
> > introduced in commit b77532803d11 ("drm/msm/a6xx: Poll for GBIF unhalt
> > status in hw_init").


For what its worth, I've been eyeing (but haven't tested) sending some
patches to clean up dsi_phy_write_udelay/ndelay(). There's no ordering
guarantee between a writel() and a delay(), so the expected "write then
delay" sequence might not be happening.. you need to write, read, delay.

memory-barriers.txt:

        5. A readX() by a CPU thread from the peripheral will complete before
           any subsequent delay() loop can begin execution on the same thread.
           This ensures that two MMIO register writes by the CPU to a peripheral
           will arrive at least 1us apart if the first write is immediately read
           back with readX() and udelay(1) is called prior to the second
           writeX():

                writel(42, DEVICE_REGISTER_0); // Arrives at the device...
                readl(DEVICE_REGISTER_0);
                udelay(1);
                writel(42, DEVICE_REGISTER_1); // ...at least 1us before this.

> > 
> > Fixes: b77532803d11 ("drm/msm/a6xx: Poll for GBIF unhalt status in hw_init")
> > Signed-off-by: Konrad Dybcio <konrad.dyb...@linaro.org>
> > ---
> >  drivers/gpu/drm/msm/adreno/a6xx_gmu.c |  5 ++---
> >  drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 14 ++++----------
> >  2 files changed, 6 insertions(+), 13 deletions(-)
> 
> I prefer this version compared to the v2. A helper routine is
> unnecessary here because:
> 1. there are very few scenarios where we have to read back the same
> register.
> 2. we may accidently readback a write only register.
> 
> > 
> > diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c 
> > b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> > index 0e3dfd4c2bc8..4135a53b55a7 100644
> > --- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> > +++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> > @@ -466,9 +466,8 @@ static int a6xx_rpmh_start(struct a6xx_gmu *gmu)
> >     int ret;
> >     u32 val;
> >  
> > -   gmu_write(gmu, REG_A6XX_GMU_RSCC_CONTROL_REQ, 1 << 1);
> > -   /* Wait for the register to finish posting */
> > -   wmb();
> > +   gmu_write(gmu, REG_A6XX_GMU_RSCC_CONTROL_REQ, BIT(1));
> > +   gmu_read(gmu, REG_A6XX_GMU_RSCC_CONTROL_REQ);
> 
> This is unnecessary because we are polling on a register on the same port 
> below. But I think we
> can replace "wmb()" above with "mb()" to avoid reordering between read
> and write IO instructions.

If I understand correctly, you don't need any memory barrier.
writel()/readl()'s are ordered to the same endpoint. That goes for all
the reordering/barrier comments mentioned below too.

device-io.rst:

    The read and write functions are defined to be ordered. That is the
    compiler is not permitted to reorder the I/O sequence. When the ordering
    can be compiler optimised, you can use __readb() and friends to
    indicate the relaxed ordering. Use this with care.

memory-barriers.txt:

     (*) readX(), writeX():

            The readX() and writeX() MMIO accessors take a pointer to the
            peripheral being accessed as an __iomem * parameter. For pointers
            mapped with the default I/O attributes (e.g. those returned by
            ioremap()), the ordering guarantees are as follows:

            1. All readX() and writeX() accesses to the same peripheral are 
ordered
               with respect to each other. This ensures that MMIO register 
accesses
               by the same CPU thread to a particular device will arrive in 
program
               order.


> 
> >  
> >     ret = gmu_poll_timeout(gmu, REG_A6XX_GMU_RSCC_CONTROL_ACK, val,
> >             val & (1 << 1), 100, 10000);
> > diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c 
> > b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > index 973872ad0474..0acbc38b8e70 100644
> > --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > @@ -1713,22 +1713,16 @@ static int hw_init(struct msm_gpu *gpu)
> >     }
> >  
> >     /* Clear GBIF halt in case GX domain was not collapsed */
> > +   gpu_write(gpu, REG_A6XX_GBIF_HALT, 0);
> 
> We need a full barrier here to avoid reordering. Also, lets add a
> comment about why we are doing this odd looking sequence.
> 
> > +   gpu_read(gpu, REG_A6XX_GBIF_HALT);
> >     if (adreno_is_a619_holi(adreno_gpu)) {
> > -           gpu_write(gpu, REG_A6XX_GBIF_HALT, 0);
> >             gpu_write(gpu, REG_A6XX_RBBM_GPR0_CNTL, 0);
> > -           /* Let's make extra sure that the GPU can access the memory.. */
> > -           mb();
> 
> We need a full barrier here.
> 
> > +           gpu_read(gpu, REG_A6XX_RBBM_GPR0_CNTL);
> >     } else if (a6xx_has_gbif(adreno_gpu)) {
> > -           gpu_write(gpu, REG_A6XX_GBIF_HALT, 0);
> >             gpu_write(gpu, REG_A6XX_RBBM_GBIF_HALT, 0);
> > -           /* Let's make extra sure that the GPU can access the memory.. */
> > -           mb();
> 
> We need a full barrier here.
> 
> > +           gpu_read(gpu, REG_A6XX_RBBM_GBIF_HALT);
> >     }
> >  
> > -   /* Some GPUs are stubborn and take their sweet time to unhalt GBIF! */
> > -   if (adreno_is_a7xx(adreno_gpu) && a6xx_has_gbif(adreno_gpu))
> > -           spin_until(!gpu_read(gpu, REG_A6XX_GBIF_HALT_ACK));
> > -
> 
> Why is this removed?
> 
> -Akhil
> 
> >     gpu_write(gpu, REG_A6XX_RBBM_SECVID_TSB_CNTL, 0);
> >  
> >     if (adreno_is_a619_holi(adreno_gpu))
> > 
> > ---
> > base-commit: 93a39e4766083050ca0ecd6a3548093a3b9eb60c
> > change-id: 20240508-topic-adreno-a2d199cd4152
> > 
> > Best regards,
> > -- 
> > Konrad Dybcio <konrad.dyb...@linaro.org>
> > 
>

Re: [PATCH] drm/msm/adreno: De-spaghettify the use of memory barriers

Reply via email to