> -----Original Message-----
> From: Timur Kristóf <[email protected]>
> Sent: Friday, December 19, 2025 11:34 AM
> To: Alex Deucher <[email protected]>
> Cc: [email protected]; Deucher, Alexander
> <[email protected]>
> Subject: Re: [PATCH 6/7] drm/amdgpu/gfx9: rework pipeline sync packet
> sequence
>
> On Friday, December 19, 2025, 9:37:16 AM Central Time, Alex Deucher wrote:
> > On Thu, Dec 18, 2025 at 9:36 PM Timur Kristóf
> > <[email protected]> wrote:
> > > On Thursday, December 18, 2025, 16:41:40 Central Time, Alex Deucher
> > > wrote:
> > > > Replace WAIT_REG_MEM with EVENT_WRITE flushes for all shader types
> > > > and PFP_SYNC_ME.  That should accomplish the same thing and avoid
> > > > having to wait on a fence preventing any issues with pipeline
> > > > syncs during queue resets.
> > > >
> > > > Signed-off-by: Alex Deucher <[email protected]>
> > > > ---
> > > >
> > > >  drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 32 ++++++++++++++++++---------
> > > >  1 file changed, 21 insertions(+), 11 deletions(-)
> > > >
> > > > diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> > > > index 7b012ca1153ea..d9dee3c11a05d 100644
> > > > --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> > > > +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> > > > @@ -5572,15 +5572,26 @@ static void gfx_v9_0_ring_emit_fence(struct amdgpu_ring *ring, u64 addr,
> > > >       amdgpu_ring_write(ring, 0);
> > > >
> > > >  }
> > > >
> > > > -static void gfx_v9_0_ring_emit_pipeline_sync(struct amdgpu_ring *ring)
> > > > +static void gfx_v9_0_ring_emit_event_write(struct amdgpu_ring *ring,
> > > > +                                        uint32_t event_type,
> > > > +                                        uint32_t event_index)
> > > >  {
> > > >
> > > > -     int usepfp = (ring->funcs->type == AMDGPU_RING_TYPE_GFX);
> > > > -     uint32_t seq = ring->fence_drv.sync_seq;
> > > > -     uint64_t addr = ring->fence_drv.gpu_addr;
> > > > +     amdgpu_ring_write(ring, PACKET3(PACKET3_EVENT_WRITE, 0));
> > > > +     amdgpu_ring_write(ring, EVENT_TYPE(event_type) |
> > > > +                       EVENT_INDEX(event_index));
> > > > +}
> > > >
> > > > -     gfx_v9_0_wait_reg_mem(ring, usepfp, 1, 0,
> > > > -                           lower_32_bits(addr), upper_32_bits(addr),
> > > > -                           seq, 0xffffffff, 4);
> > > > +static void gfx_v9_0_ring_emit_pipeline_sync(struct amdgpu_ring *ring)
> > > > +{
> > > > +     if (ring->funcs->type == AMDGPU_RING_TYPE_GFX) {
> > > > +             gfx_v9_0_ring_emit_event_write(ring, VS_PARTIAL_FLUSH, 4);
> > >
> > > Is VS_PARTIAL_FLUSH necessary when we already have PS_PARTIAL_FLUSH?
> > > When we wait for all PS to finish, wouldn't that imply that all VS
> > > had already finished as well?
> >
> > I'm not sure.  The CP docs recommend all 3 if you want to wait for the
> > engine to idle.
>
> Alright, it doesn't hurt to have it here.
>
> >
> > > > +             gfx_v9_0_ring_emit_event_write(ring, PS_PARTIAL_FLUSH, 4);
> > > > +             gfx_v9_0_ring_emit_event_write(ring, CS_PARTIAL_FLUSH, 4);
> > > > +             amdgpu_ring_write(ring, PACKET3(PACKET3_PFP_SYNC_ME, 0));
> > > > +             amdgpu_ring_write(ring, 0x0);
> > >
> > > The above sequence just waits for all shaders to finish, but as far
> > > as I understand it doesn't wait for memory writes and cache flushes.
> > > Please correct me if I'm wrong about this. For that, I think we do
> > > need an ACQUIRE_MEM packet. (And, if the ACQUIRE_MEM is done on the
> > > PFP then we won't need the PFP_SYNC_ME.)
> >
> > There is already a RELEASE_MEM (the fence from the previous job) prior
> > to this packet that would have flushed the caches.  We just want to
> > block the PFP from further fetching until that is complete.  In the
> > good case, the RELEASE_MEM would have handled pipeline idling and
> > cache flushes so these would be effectively noops and in the reset
> > case, we don't care because that bad job is gone anyway.  I guess
> > probably all we really need is the PFP_SYNC_ME.
> >
> > Alex
>
> RELEASE_MEM doesn't wait for the GPU to go idle; RELEASE_MEM just
> promises to write to the given fence address when the specified operations
> (e.g. shader completion and cache flushes) are done. Here in the pipeline
> sync, we actually want to wait for the GPU to go idle, and AFAIK we need an
> ACQUIRE_MEM for that.

I was thinking the EVENT_WRITE would handle that, but you're right: it only
idles the pipeline, not the caches, and the fence write is asynchronous.
I'll add the ACQUIRE_MEM.
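
Probably something along these lines (untested sketch, modeled on the
existing gfx_v9_0_emit_mem_sync() helper in gfx_v9_0.c; exactly which
cache-action bits we want for the pipeline sync is still open):

	/* ACQUIRE_MEM - stall until the cache operations below are
	 * complete: invalidate the shader I$/K$ and the TC caches,
	 * and write back dirty TC lines.
	 */
	amdgpu_ring_write(ring, PACKET3(PACKET3_ACQUIRE_MEM, 5));
	amdgpu_ring_write(ring, /* CP_COHER_CNTL */
			  PACKET3_ACQUIRE_MEM_CP_COHER_CNTL_SH_ICACHE_ACTION_ENA(1) |
			  PACKET3_ACQUIRE_MEM_CP_COHER_CNTL_SH_KCACHE_ACTION_ENA(1) |
			  PACKET3_ACQUIRE_MEM_CP_COHER_CNTL_TC_ACTION_ENA(1) |
			  PACKET3_ACQUIRE_MEM_CP_COHER_CNTL_TCL1_ACTION_ENA(1) |
			  PACKET3_ACQUIRE_MEM_CP_COHER_CNTL_TC_WB_ACTION_ENA(1));
	amdgpu_ring_write(ring, 0xffffffff);	/* CP_COHER_SIZE */
	amdgpu_ring_write(ring, 0xffffff);	/* CP_COHER_SIZE_HI */
	amdgpu_ring_write(ring, 0);		/* CP_COHER_BASE */
	amdgpu_ring_write(ring, 0);		/* CP_COHER_BASE_HI */
	amdgpu_ring_write(ring, 0x0000000A);	/* POLL_INTERVAL */

IIRC bit 31 of CP_COHER_CNTL is ENGINE_SEL on gfx9, so if we run the
ACQUIRE_MEM on the PFP we could also drop the PFP_SYNC_ME as you suggested.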

Alex

>
> >
> > > > +     } else {
> > > > +             gfx_v9_0_ring_emit_event_write(ring, CS_PARTIAL_FLUSH, 4);
> > > > +     }
> > > >
> > > >  }
> > > >
> > > >  static void gfx_v9_0_ring_emit_vm_flush(struct amdgpu_ring *ring,
> > > >
> > > > @@ -7404,7 +7415,7 @@ static const struct amdgpu_ring_funcs gfx_v9_0_ring_funcs_gfx = {
> > > >       .set_wptr = gfx_v9_0_ring_set_wptr_gfx,
> > > >
> > > >       .emit_frame_size = /* totally 242 maximum if 16 IBs */
> > > >
> > > >               5 +  /* COND_EXEC */
> > > >
> > > > -             7 +  /* PIPELINE_SYNC */
> > > > +             8 +  /* PIPELINE_SYNC */
> > > >
> > > >               SOC15_FLUSH_GPU_TLB_NUM_WREG * 5 +
> > > >               SOC15_FLUSH_GPU_TLB_NUM_REG_WAIT * 7 +
> > > >               2 + /* VM_FLUSH */
> > > >
> > > > @@ -7460,7 +7471,7 @@ static const struct amdgpu_ring_funcs gfx_v9_0_sw_ring_funcs_gfx = {
> > > >       .set_wptr = amdgpu_sw_ring_set_wptr_gfx,
> > > >
> > > >       .emit_frame_size = /* totally 242 maximum if 16 IBs */
> > > >
> > > >               5 +  /* COND_EXEC */
> > > >
> > > > -             7 +  /* PIPELINE_SYNC */
> > > > +             8 +  /* PIPELINE_SYNC */
> > > >
> > > >               SOC15_FLUSH_GPU_TLB_NUM_WREG * 5 +
> > > >               SOC15_FLUSH_GPU_TLB_NUM_REG_WAIT * 7 +
> > > >               2 + /* VM_FLUSH */
> > > >
> > > > @@ -7521,7 +7532,7 @@ static const struct amdgpu_ring_funcs gfx_v9_0_ring_funcs_compute = {
> > > >               20 + /* gfx_v9_0_ring_emit_gds_switch */
> > > >
> > > >               7 + /* gfx_v9_0_ring_emit_hdp_flush */
> > > >               5 + /* hdp invalidate */
> > > >
> > > > -             7 + /* gfx_v9_0_ring_emit_pipeline_sync */
> > > > +             2 + /* gfx_v9_0_ring_emit_pipeline_sync */
> > > >
> > > >               SOC15_FLUSH_GPU_TLB_NUM_WREG * 5 +
> > > >               SOC15_FLUSH_GPU_TLB_NUM_REG_WAIT * 7 +
> > > >               8 + 8 + 8 + /* gfx_v9_0_ring_emit_fence x3 for user fence, vm fence */
> > >
> > > > @@ -7564,7 +7575,6 @@ static const struct amdgpu_ring_funcs gfx_v9_0_ring_funcs_kiq = {
> > > >               20 + /* gfx_v9_0_ring_emit_gds_switch */
> > > >
> > > >               7 + /* gfx_v9_0_ring_emit_hdp_flush */
> > > >               5 + /* hdp invalidate */
> > > >
> > > > -             7 + /* gfx_v9_0_ring_emit_pipeline_sync */
> > > >
> > > >               SOC15_FLUSH_GPU_TLB_NUM_WREG * 5 +
> > > >               SOC15_FLUSH_GPU_TLB_NUM_REG_WAIT * 7 +
> > > >               8 + 8 + 8, /* gfx_v9_0_ring_emit_fence_kiq x3 for user fence, vm fence */