On Friday, December 19, 2025 at 9:37:16 US Central Time, Alex Deucher wrote:
> On Thu, Dec 18, 2025 at 9:36 PM Timur Kristóf <[email protected]> wrote:
> > On Thursday, December 18, 2025 at 16:41:40 US Central Time, Alex Deucher wrote:
> > > Replace WAIT_REG_MEM with EVENT_WRITE flushes for all
> > > shader types and PFP_SYNC_ME.  That should accomplish
> > > the same thing and avoid having to wait on a fence,
> > > preventing any issues with pipeline syncs during
> > > queue resets.
> > > 
> > > Signed-off-by: Alex Deucher <[email protected]>
> > > ---
> > > 
> > >  drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 32 ++++++++++++++++++---------
> > >  1 file changed, 21 insertions(+), 11 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> > > index 7b012ca1153ea..d9dee3c11a05d 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> > > +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> > > @@ -5572,15 +5572,26 @@ static void gfx_v9_0_ring_emit_fence(struct amdgpu_ring *ring, u64 addr,
> > >       amdgpu_ring_write(ring, 0);
> > >  }
> > > 
> > > -static void gfx_v9_0_ring_emit_pipeline_sync(struct amdgpu_ring *ring)
> > > +static void gfx_v9_0_ring_emit_event_write(struct amdgpu_ring *ring,
> > > +                                        uint32_t event_type,
> > > +                                        uint32_t event_index)
> > >  {
> > > -     int usepfp = (ring->funcs->type == AMDGPU_RING_TYPE_GFX);
> > > -     uint32_t seq = ring->fence_drv.sync_seq;
> > > -     uint64_t addr = ring->fence_drv.gpu_addr;
> > > +     amdgpu_ring_write(ring, PACKET3(PACKET3_EVENT_WRITE, 0));
> > > +     amdgpu_ring_write(ring, EVENT_TYPE(event_type) |
> > > +                       EVENT_INDEX(event_index));
> > > +}
> > > 
> > > -     gfx_v9_0_wait_reg_mem(ring, usepfp, 1, 0,
> > > -                           lower_32_bits(addr), upper_32_bits(addr),
> > > -                           seq, 0xffffffff, 4);
> > > +static void gfx_v9_0_ring_emit_pipeline_sync(struct amdgpu_ring *ring)
> > > +{
> > > +     if (ring->funcs->type == AMDGPU_RING_TYPE_GFX) {
> > > +             gfx_v9_0_ring_emit_event_write(ring, VS_PARTIAL_FLUSH, 4);
> > 
> > Is VS_PARTIAL_FLUSH necessary when we already have PS_PARTIAL_FLUSH?
> > When we wait for all PS to finish, wouldn't that imply that all VS had
> > already finished as well?
> 
> I'm not sure.  The CP docs recommend all 3 if you want to wait for the
> engine to idle.

Alright, it doesn't hurt to have it here.

> 
> > > +             gfx_v9_0_ring_emit_event_write(ring, PS_PARTIAL_FLUSH, 4);
> > > +             gfx_v9_0_ring_emit_event_write(ring, CS_PARTIAL_FLUSH, 4);
> > > +             amdgpu_ring_write(ring, PACKET3(PACKET3_PFP_SYNC_ME, 0));
> > > +             amdgpu_ring_write(ring, 0x0);
> > 
> > The above sequence just waits for all shaders to finish, but as far as I
> > understand, it doesn't wait for memory writes and cache flushes. Please
> > correct me if I'm wrong about this. For that, I think we do need an
> > ACQUIRE_MEM packet. (And if the ACQUIRE_MEM is done on the PFP, then we
> > won't need the PFP_SYNC_ME.)
> 
> There is already a RELEASE_MEM (the fence from the previous job) prior
> to this packet that would have flushed the caches.  We just want to
> block the PFP from further fetching until that is complete.  In the
> good case, the RELEASE_MEM would have handled pipeline idling and
> cache flushes, so these would be effectively no-ops, and in the reset
> case we don't care because that bad job is gone anyway.  I guess
> probably all we really need is the PFP_SYNC_ME.
> 
> Alex

RELEASE_MEM doesn't wait for the GPU to go idle; it just promises to
write the given fence address once the specified operations (e.g. shader
work and cache flushes) are complete. Here in the pipeline sync, we
actually want to wait for the GPU to go idle, and AFAIK we need an
ACQUIRE_MEM for that.
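
For reference, here is a rough sketch of the kind of packet I mean,
modelled on the existing gfx_v9_0_emit_mem_sync() in this file. This is
just an illustration, not a tested implementation: the CP_COHER_CNTL
action bits are copied from that function, and the engine selection
(PFP vs. ME) would still need to be checked against the CP docs:

static void gfx_v9_0_ring_emit_acquire_mem(struct amdgpu_ring *ring)
{
	/* Cache actions copied from gfx_v9_0_emit_mem_sync(); illustrative only. */
	const unsigned int cp_coher_cntl =
		PACKET3_ACQUIRE_MEM_CP_COHER_CNTL_SH_ICACHE_ACTION_ENA(1) |
		PACKET3_ACQUIRE_MEM_CP_COHER_CNTL_SH_KCACHE_ACTION_ENA(1) |
		PACKET3_ACQUIRE_MEM_CP_COHER_CNTL_TC_ACTION_ENA(1) |
		PACKET3_ACQUIRE_MEM_CP_COHER_CNTL_TCL1_ACTION_ENA(1) |
		PACKET3_ACQUIRE_MEM_CP_COHER_CNTL_TC_WB_ACTION_ENA(1);

	/* ACQUIRE_MEM stalls the CP until the requested cache actions
	 * have completed, which is the "wait for idle" semantic we want
	 * in the pipeline sync.
	 */
	amdgpu_ring_write(ring, PACKET3(PACKET3_ACQUIRE_MEM, 5));
	amdgpu_ring_write(ring, cp_coher_cntl); /* CP_COHER_CNTL */
	amdgpu_ring_write(ring, 0xffffffff);    /* CP_COHER_SIZE */
	amdgpu_ring_write(ring, 0xffffff);      /* CP_COHER_SIZE_HI */
	amdgpu_ring_write(ring, 0);             /* CP_COHER_BASE */
	amdgpu_ring_write(ring, 0);             /* CP_COHER_BASE_HI */
	amdgpu_ring_write(ring, 0x0000000A);    /* POLL_INTERVAL */
}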

> 
> > > +     } else {
> > > +             gfx_v9_0_ring_emit_event_write(ring, CS_PARTIAL_FLUSH, 4);
> > > +     }
> > > 
> > >  }
> > >  
> > >  static void gfx_v9_0_ring_emit_vm_flush(struct amdgpu_ring *ring,
> > > 
> > > @@ -7404,7 +7415,7 @@ static const struct amdgpu_ring_funcs gfx_v9_0_ring_funcs_gfx = {
> > >       .set_wptr = gfx_v9_0_ring_set_wptr_gfx,
> > >       .emit_frame_size = /* totally 242 maximum if 16 IBs */
> > >               5 +  /* COND_EXEC */
> > > -             7 +  /* PIPELINE_SYNC */
> > > +             8 +  /* PIPELINE_SYNC */
> > >               SOC15_FLUSH_GPU_TLB_NUM_WREG * 5 +
> > >               SOC15_FLUSH_GPU_TLB_NUM_REG_WAIT * 7 +
> > >               2 + /* VM_FLUSH */
> > > @@ -7460,7 +7471,7 @@ static const struct amdgpu_ring_funcs gfx_v9_0_sw_ring_funcs_gfx = {
> > >       .set_wptr = amdgpu_sw_ring_set_wptr_gfx,
> > >       .emit_frame_size = /* totally 242 maximum if 16 IBs */
> > >               5 +  /* COND_EXEC */
> > > -             7 +  /* PIPELINE_SYNC */
> > > +             8 +  /* PIPELINE_SYNC */
> > >               SOC15_FLUSH_GPU_TLB_NUM_WREG * 5 +
> > >               SOC15_FLUSH_GPU_TLB_NUM_REG_WAIT * 7 +
> > >               2 + /* VM_FLUSH */
> > > @@ -7521,7 +7532,7 @@ static const struct amdgpu_ring_funcs gfx_v9_0_ring_funcs_compute = {
> > >               20 + /* gfx_v9_0_ring_emit_gds_switch */
> > >               7 + /* gfx_v9_0_ring_emit_hdp_flush */
> > >               5 + /* hdp invalidate */
> > > -             7 + /* gfx_v9_0_ring_emit_pipeline_sync */
> > > +             2 + /* gfx_v9_0_ring_emit_pipeline_sync */
> > >               SOC15_FLUSH_GPU_TLB_NUM_WREG * 5 +
> > >               SOC15_FLUSH_GPU_TLB_NUM_REG_WAIT * 7 +
> > >               8 + 8 + 8 + /* gfx_v9_0_ring_emit_fence x3 for user fence, vm fence */
> > 
> > > @@ -7564,7 +7575,6 @@ static const struct amdgpu_ring_funcs gfx_v9_0_ring_funcs_kiq = {
> > >               20 + /* gfx_v9_0_ring_emit_gds_switch */
> > >               7 + /* gfx_v9_0_ring_emit_hdp_flush */
> > >               5 + /* hdp invalidate */
> > > -             7 + /* gfx_v9_0_ring_emit_pipeline_sync */
> > >               SOC15_FLUSH_GPU_TLB_NUM_WREG * 5 +
> > >               SOC15_FLUSH_GPU_TLB_NUM_REG_WAIT * 7 +
> > >               8 + 8 + 8, /* gfx_v9_0_ring_emit_fence_kiq x3 for user fence, vm fence */



