On Thu, Dec 18, 2025 at 12:03 PM Timur Kristóf <[email protected]> wrote:
>
> On Thursday, December 18, 2025 at 9:58:45 AM Central Time, Alex Deucher wrote:
> > On Thu, Dec 18, 2025 at 12:21 AM Timur Kristóf <[email protected]> wrote:
> > > On Monday, December 15, 2025 at 10:07:11 AM Central Time, Alex Deucher wrote:
> > > > GFX ring resets work differently on pre-GFX10 hardware since
> > > > there is no MQD managed by the scheduler.
> > > > To reset the ring, you need to issue the reset via CP_VMID_RESET
> > > > (through the KIQ or MMIO) and submit the following to the gfx
> > > > ring to complete the reset:
> > > > 1. EOP packet with EXEC bit set
> > > > 2. WAIT_REG_MEM to wait for the fence
> > > > 3. Clear CP_VMID_RESET to 0
> > > > 4. EVENT_WRITE ENABLE_LEGACY_PIPELINE
> > > > 5. EOP packet with EXEC bit set
> > > > 6. WAIT_REG_MEM to wait for the fence
> > > > Once those commands have completed, the reset should
> > > > be complete and the ring can accept new packets.
> > > >
> > > > However, because we have a pipeline sync between jobs,
> > > > the PFP is waiting on the fence from the bad job to signal so
> > > > it can't process any of the packets in the reset sequence
> > > > until that pipeline sync clears.  To unblock the PFP, we
> > > > use the KIQ to signal the fence after we reset the queue.
> > > >
> > > > Signed-off-by: Alex Deucher <[email protected]>
> > > > ---
> > > >
> > > >  drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 104 +++++++++++++++++++++++++-
> > > >  1 file changed, 101 insertions(+), 3 deletions(-)
> > > >
> > > > diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> > > > index bb1465a98c7ca..9b7073650315e 100644
> > > > --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> > > > +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> > > > @@ -2410,8 +2410,10 @@ static int gfx_v9_0_sw_init(struct amdgpu_ip_block *ip_block)
> > > >               amdgpu_get_soft_full_reset_mask(&adev->gfx.gfx_ring[0]);
> > > >       adev->gfx.compute_supported_reset =
> > > >               amdgpu_get_soft_full_reset_mask(&adev->gfx.compute_ring[0]);
> > > > -     if (!amdgpu_sriov_vf(adev) && !adev->debug_disable_gpu_ring_reset)
> > > > +     if (!amdgpu_sriov_vf(adev) && !adev->debug_disable_gpu_ring_reset) {
> > > >               adev->gfx.compute_supported_reset |= AMDGPU_RESET_TYPE_PER_QUEUE;
> > > > +             adev->gfx.gfx_supported_reset |= AMDGPU_RESET_TYPE_PER_QUEUE;
> > > > +     }
> > > >
> > > >       r = amdgpu_gfx_kiq_init(adev, GFX9_MEC_HPD_SIZE, 0);
> > > >       if (r) {
> > > >
> > > > @@ -7163,6 +7165,103 @@ static void gfx_v9_ring_insert_nop(struct amdgpu_ring *ring, uint32_t num_nop)
> > > >       amdgpu_ring_insert_nop(ring, num_nop - 1);
> > > >  }
> > > >
> > > > +static void gfx_v9_0_ring_emit_wreg_me(struct amdgpu_ring *ring,
> > > > +                                    uint32_t reg,
> > > > +                                    uint32_t val)
> > > > +{
> > > > +     uint32_t cmd = 0;
> > > > +
> > > > +     switch (ring->funcs->type) {
> > > > +     case AMDGPU_RING_TYPE_KIQ:
> > > > +             cmd = (1 << 16); /* no inc addr */
> > >
> > > What do you mean by "inc addr" in this context?
> >
> > It's part of the packet: bit 16 controls whether the destination address
> > is incremented or not.  This function is basically the same as
> > gfx_v9_0_ring_emit_wreg(), but uses the ME to do the write rather than
> > the PFP.  I could have alternatively added a new parameter to
> > gfx_v9_0_ring_emit_wreg() to select between PFP and ME.
>
> Thanks. I was just not familiar with this field of the packet.
>
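For reference, the parameter approach would have looked roughly like the
sketch below.  Untested, and the use_pfp parameter is made up for
illustration; WRITE_DATA_ENGINE_SEL and WR_CONFIRM are the existing
packet helpers:

static void gfx_v9_0_ring_emit_wreg_engine(struct amdgpu_ring *ring,
                                           uint32_t reg, uint32_t val,
                                           bool use_pfp)
{
        uint32_t cmd = 0;

        switch (ring->funcs->type) {
        case AMDGPU_RING_TYPE_GFX:
                /* engine_sel: 1 = PFP, 0 = ME */
                cmd = (use_pfp ? WRITE_DATA_ENGINE_SEL(1) : 0) | WR_CONFIRM;
                break;
        case AMDGPU_RING_TYPE_KIQ:
                cmd = (1 << 16); /* no inc addr */
                break;
        default:
                cmd = WR_CONFIRM;
                break;
        }
        amdgpu_ring_write(ring, PACKET3(PACKET3_WRITE_DATA, 3));
        amdgpu_ring_write(ring, cmd);
        amdgpu_ring_write(ring, reg);
        amdgpu_ring_write(ring, 0);
        amdgpu_ring_write(ring, val);
}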
> >
> > > > +             break;
> > > > +     default:
> > > > +             cmd = WR_CONFIRM;
> > > > +             break;
> > > > +     }
> > > > +     amdgpu_ring_write(ring, PACKET3(PACKET3_WRITE_DATA, 3));
> > > > +     amdgpu_ring_write(ring, cmd);
> > > > +     amdgpu_ring_write(ring, reg);
> > > > +     amdgpu_ring_write(ring, 0);
> > > > +     amdgpu_ring_write(ring, val);
> > > > +}
> > > > +
> > > > +static void gfx_v9_0_ring_emit_event_write(struct amdgpu_ring *ring,
> > > > +                                        uint32_t event_type,
> > > > +                                        uint32_t event_index)
> > > > +{
> > > > +     amdgpu_ring_write(ring, PACKET3(PACKET3_EVENT_WRITE, 0));
> > > > +     amdgpu_ring_write(ring, EVENT_TYPE(event_type) |
> > > > +                       EVENT_INDEX(event_index));
> > > > +}
> > > > +
> > > > +static int gfx_v9_0_reset_kgq(struct amdgpu_ring *ring,
> > > > +                           unsigned int vmid,
> > > > +                           struct amdgpu_fence *timedout_fence)
> > > > +{
> > > > +     struct amdgpu_device *adev = ring->adev;
> > > > +     struct amdgpu_kiq *kiq = &adev->gfx.kiq[0];
> > > > +     struct amdgpu_ring *kiq_ring = &kiq->ring;
> > > > +     unsigned long flags;
> > > > +     u32 tmp;
> > > > +     int r;
> > > > +
> > > > +     amdgpu_ring_reset_helper_begin(ring, timedout_fence);
> > > > +
> > > > +     spin_lock_irqsave(&kiq->ring_lock, flags);
> > > > +
> > > > +     if (amdgpu_ring_alloc(kiq_ring, 5 + 5)) {
> > > > +             spin_unlock_irqrestore(&kiq->ring_lock, flags);
> > > > +             return -ENOMEM;
> > > > +     }
> > > > +
> > > > +     /* send the reset - 5 */
> > > > +     tmp = REG_SET_FIELD(0, CP_VMID_RESET, RESET_REQUEST, 1 << vmid);
> > > > +     gfx_v9_0_ring_emit_wreg(kiq_ring,
> > > > +                             SOC15_REG_OFFSET(GC, 0, mmCP_VMID_RESET), tmp);
> > > > +     /* emit the fence to clear the pipeline sync - 5 */
> > > > +     gfx_v9_0_ring_emit_fence_kiq(kiq_ring, ring->fence_drv.gpu_addr,
> > > > +                                  timedout_fence->base.seqno, 0);
> > >
> > > As far as I see, this isn't going to work when sched_hw_submission > 2 and
> > > there are more than two jobs (from various different userspace processes)
> > > in flight on the ring.
> > >
> > > I can think of two possible solutions:
> > > - Emit each fence individually, with a short delay in between to give the
> > > GFX ring a chance to catch up with the KIQ.
> > > - Change the WAIT_REG_MEM command used for the pipeline sync to accept
> > > greater-than-or-equal instead of just equal. Then it's enough to signal
> > > just the last fence on the KIQ ring.
> >
> > That won't work.  The signalling path is asynchronous, so if we signal
> > additional fences other than the bad one, those jobs will end up being
> > seen as successfully completed.
>
> Yeah, I already realized that after I sent the email, and sent a follow-up
> email with a different suggestion: change the pipeline sync so that it does
> not rely on the fence of the previous submission. E.g. the pipeline sync
> could emit a separate fence, or even simpler, use an ACQUIRE_MEM (gfx9+) /
> SURFACE_SYNC (gfx6-8). Then this becomes a non-issue.
>
> What do you think about that?

We could try it.  We need to verify that the ACQUIRE_MEM semantics
match what we need.
I think the following would also work and not require a WAIT_REG_MEM:

EVENT_WRITE(VS done)
EVENT_WRITE(PS done)
EVENT_WRITE(CS done)
PFP_SYNC_ME

I'll give it a try.
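
Roughly like this (untested sketch; VS_DONE/PS_DONE/CS_DONE are
placeholders for whatever the correct VGT event IDs turn out to be):

static void gfx_v9_0_ring_emit_pipeline_sync_events(struct amdgpu_ring *ring)
{
        /* drain outstanding VS/PS/CS work instead of waiting on the
         * previous job's fence with a WAIT_REG_MEM
         */
        gfx_v9_0_ring_emit_event_write(ring, VS_DONE, 0);
        gfx_v9_0_ring_emit_event_write(ring, PS_DONE, 0);
        gfx_v9_0_ring_emit_event_write(ring, CS_DONE, 0);
        /* stall the PFP until the ME has processed the events above */
        amdgpu_ring_write(ring, PACKET3(PACKET3_PFP_SYNC_ME, 0));
        amdgpu_ring_write(ring, 0x0);
}

Since that no longer references the previous job's fence value, the KIQ
would not need to signal the bad job's fence to unblock the PFP during a
reset.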

>
> > That said, there is a change coming
> > for the firmware to fix this.
>
> It's very nice to hear that AMD is still willing to fix firmware for old
> chips like Vega. However:
>
> - It would be nice to solve the problem for users who are still running old
> firmware, as it doesn't seem too difficult to do so.
> - AFAIK gfx7-8 also use the same queue reset mechanism, so I think they may
> be susceptible to the same issue (and I don't think anyone is going to
> release new FW for those).
>
> > I'd suggest we just limit the queue
> > depth to 2 until the new firmware is available.
>
> Last I heard, the SteamOS kernel set it to 4 due to "CPU bubbles" that
> they observed with the default setting of 2 on the Steam Deck. I think in
> the long term we should seriously consider increasing the default upstream
> as well.

Ok.
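
(Assuming that's the existing amdgpu sched_hw_submission module
parameter, raising it is just, e.g.:

    modprobe amdgpu sched_hw_submission=4

or amdgpu.sched_hw_submission=4 on the kernel command line; IIRC it
can't be changed at runtime.)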

>
> >
> > > > +     amdgpu_ring_commit(kiq_ring);
> > > > +     r = amdgpu_ring_test_ring(kiq_ring);
> > > > +     spin_unlock_irqrestore(&kiq->ring_lock, flags);
> > > > +     if (r)
> > > > +             return r;
> > > > +
> > > > +     if (amdgpu_ring_alloc(ring, 8 + 7 + 5 + 2 + 8 + 7))
> > > > +             return -ENOMEM;
> > > > +     /* emit the fence to finish the reset - 8 */
> > > > +     ring->trail_seq++;
> > > > +     gfx_v9_0_ring_emit_fence(ring, ring->trail_fence_gpu_addr,
> > > > +                              ring->trail_seq, AMDGPU_FENCE_FLAG_EXEC);
> > > > +     /* wait for the fence - 7 */
> > > > +     gfx_v9_0_wait_reg_mem(ring, 0, 1, 0,
> > > > +                           lower_32_bits(ring->trail_fence_gpu_addr),
> > > > +                           upper_32_bits(ring->trail_fence_gpu_addr),
> > > > +                           ring->trail_seq, 0xffffffff, 4);
> > > > +     /* clear mmCP_VMID_RESET - 5 */
> > > > +     gfx_v9_0_ring_emit_wreg_me(ring,
> > > > +                                SOC15_REG_OFFSET(GC, 0, mmCP_VMID_RESET), 0);
> > > > +     /* event write ENABLE_LEGACY_PIPELINE - 2 */
> > > > +     gfx_v9_0_ring_emit_event_write(ring, ENABLE_LEGACY_PIPELINE, 0);
> > > > +     /* emit a regular fence - 8 */
> > > > +     ring->trail_seq++;
> > > > +     gfx_v9_0_ring_emit_fence(ring, ring->trail_fence_gpu_addr,
> > > > +                              ring->trail_seq, AMDGPU_FENCE_FLAG_EXEC);
> > > > +     /* wait for the fence - 7 */
> > > > +     gfx_v9_0_wait_reg_mem(ring, 1, 1, 0,
> > > > +                           lower_32_bits(ring->trail_fence_gpu_addr),
> > > > +                           upper_32_bits(ring->trail_fence_gpu_addr),
> > > > +                           ring->trail_seq, 0xffffffff, 4);
> > >
> > > Why is it necessary to emit (and wait for) a regular fence here?
> > > I'm not against it, just curious why it's needed.
> >
> > It's part of the recovery sequence to make sure the
> > ENABLE_LEGACY_PIPELINE event has completed successfully.
>
> Ah, okay. I didn't realize that we needed fences after EVENT_WRITE.
>
> >
> > > > +     amdgpu_ring_commit(ring);
> > > > +     /* wait for the commands to complete */
> > > > +     r = amdgpu_ring_test_ring(ring);
> > > > +     if (r)
> > > > +             return r;
> > > > +
> > > > +     return amdgpu_ring_reset_helper_end(ring, timedout_fence);
> > > > +}
> > > > +
> > > >
> > > >  static int gfx_v9_0_reset_kcq(struct amdgpu_ring *ring,
> > > >                               unsigned int vmid,
> > > >                               struct amdgpu_fence *timedout_fence)
> > > > @@ -7441,9 +7540,9 @@ static const struct amdgpu_ring_funcs gfx_v9_0_ring_funcs_gfx = {
> > > >       .emit_wreg = gfx_v9_0_ring_emit_wreg,
> > > >       .emit_reg_wait = gfx_v9_0_ring_emit_reg_wait,
> > > >       .emit_reg_write_reg_wait = gfx_v9_0_ring_emit_reg_write_reg_wait,
> > > > -     .soft_recovery = gfx_v9_0_ring_soft_recovery,
> > >
> > > Can you please split removing the soft recovery into a separate patch?
> > >
> > > Can we talk about removing the soft recovery? For the other chips where it
> > > has already been removed, it is perceived by users as a regression.
> >
> > Queue reset is a superset of soft recovery.  There's no need for soft
> > recovery when queue reset is available.
> >
>
> I'm not arguing for or against either approach, just relaying how the change
> is perceived by users who share their experience.
>
> From a user's perspective:
> - queue reset just kills the entire job and the offending process, i.e. the
> user sees the game crash
> - soft recovery sometimes allows the guilty process to move on intact, i.e.
> the user would see a "hitch" in the game but then it would keep working
>

Ah, I see what you mean.  We do set the error on the fence in that
case, but I guess Mesa treats it differently (-ENODATA vs. -ETIME).
We can think about bringing it back, assuming it doesn't negatively
impact the per-queue resets.
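
To spell out the difference (from memory, so double-check the exact
codes): the soft recovery path tags the guilty fence before signalling
it, roughly

        /* soft recovery: mark the fence, then let the job "complete" */
        dma_fence_set_error(fence, -ENODATA);

while the timeout/reset path sets -ETIME on the timed-out fence, so
userspace can in principle tell a recovered hitch apart from a full
queue reset.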

Alex

>
> >
> > > >       .emit_mem_sync = gfx_v9_0_emit_mem_sync,
> > > >       .emit_cleaner_shader = gfx_v9_0_ring_emit_cleaner_shader,
> > > > +     .reset = gfx_v9_0_reset_kgq,
> > > >       .begin_use = amdgpu_gfx_enforce_isolation_ring_begin_use,
> > > >       .end_use = amdgpu_gfx_enforce_isolation_ring_end_use,
> > > >  };
> > > >
> > > > @@ -7542,7 +7641,6 @@ static const struct amdgpu_ring_funcs gfx_v9_0_ring_funcs_compute = {
> > > >       .emit_wreg = gfx_v9_0_ring_emit_wreg,
> > > >       .emit_reg_wait = gfx_v9_0_ring_emit_reg_wait,
> > > >       .emit_reg_write_reg_wait = gfx_v9_0_ring_emit_reg_write_reg_wait,
> > > > -     .soft_recovery = gfx_v9_0_ring_soft_recovery,
> > > >       .emit_mem_sync = gfx_v9_0_emit_mem_sync,
> > > >       .emit_wave_limit = gfx_v9_0_emit_wave_limit,
> > > >       .reset = gfx_v9_0_reset_kcq,