On Fri, May 23, 2025 at 9:27 AM Christian König <christian.koe...@amd.com> wrote:
>
> On 5/23/25 05:04, Alex Deucher wrote:
> > On Thu, May 22, 2025 at 5:57 PM Alex Deucher <alexander.deuc...@amd.com>
> > wrote:
> >>
> >> This set improves per queue reset support for GC10+.
> >> This uses vmid resets for GFX. GFX resets all state
> >> associated with a vmid and then continues where it
> >> left off. Since once the IB uses the vmid, only
> >> the IB is reset and execution continues after the IB.
> >> Tested on GC 10 and 11 chips with a game running and
> >> then running hang tests. The game pauses when the
> >> hang happens, then continues after the queue reset.
> >
> > After further investigation, this appears to work as expected, but
> > only by chance. The ring is reset, but any pipelined content in the
> > ring after the job is lost. We either need to limit the ring to one
> > job or patch in the subsequent packets after resetting.
>
> Yeah, I feared that this wouldn't work.
>
> Any idea why the VMID based reset isn't working?
I think it works similarly to the preemption sequence (e.g., see
gfx_v9_0_ring_preempt_ib()), but with a reset rather than a
preemption. I don't think this will be easily portable to gfx11 and
newer, though, as they no longer have direct access to the HWS.

>
> On the other hand we could just restart from the ring RPTR again.

I think that's probably the best option. I was thinking we could
mirror the ring frames for each gang and, after a reset, resubmit the
unprocessed frames. That way we can still do a ring test to make sure
the ring is functional after the reset and then submit the unprocessed
work.

Alex

>
> Regards,
> Christian.
>
> >
> > Alex
> >
> >>
> >> I tried this same approach on GC8 and 9, but it
> >> was not as reliable as soft recovery. I also compared
> >> this to Christian's reset patches, but I was not
> >> able to make them work as reliably as this series.
> >>
> >> Alex Deucher (9):
> >>   Revert "drm/amd/amdgpu: add pipe1 hardware support"
> >>   drm/amdgpu: add AMDGPU_QUEUE_RESET_TIMEOUT
> >>   drm/amdgpu: set the exec flag on the IB fence
> >>   drm/amdgpu/gfx11: adjust ring reset sequences
> >>   drm/amdgpu/gfx11: drop soft recovery
> >>   drm/amdgpu/gfx12: adjust ring reset sequences
> >>   drm/amdgpu/gfx12: drop soft recovery
> >>   drm/amdgpu/gfx10: adjust ring reset sequences
> >>   drm/amdgpu/gfx10: drop soft recovery
> >>
> >> Christian König (1):
> >>   drm/amdgpu: rework queue reset scheduler interaction
> >>
> >>  drivers/gpu/drm/amd/amdgpu/amdgpu.h     |  1 +
> >>  drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c  |  3 +-
> >>  drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 26 ++++++++--------
> >>  drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c  | 41 ++++++++-----------------
> >>  drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c  | 35 ++++++---------------
> >>  drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c  | 35 ++++++---------------
> >>  drivers/gpu/drm/amd/amdgpu/nvd.h        |  1 +
> >>  7 files changed, 50 insertions(+), 92 deletions(-)
> >>
> >> --
> >> 2.49.0
> >>
>
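A minimal, hypothetical sketch of the ring-frame mirroring idea from the reply above, written as a self-contained C simulation rather than amdgpu code. Every name here (struct mirrored_ring, ring_submit(), ring_retire(), ring_replay(), RING_DW, MAX_FRAMES, MAX_FRAME_DW) is made up for illustration and does not correspond to any existing driver API. It assumes that each frame signals one monotonically increasing fence sequence number (wraparound ignored), that the reset wipes the ring contents, and that the guilty frame is skipped while every other unprocessed frame is re-emitted in order. A real implementation would run the ring test between the reset and the replay.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sizes; MAX_FRAMES * MAX_FRAME_DW <= RING_DW, so all
 * tracked frames always fit in the ring without overwriting each other. */
#define RING_DW      64
#define MAX_FRAMES    4
#define MAX_FRAME_DW 16

/* CPU-side mirror copy of one frame written to the ring. */
struct frame_copy {
	uint32_t seq;              /* fence sequence number the frame signals */
	uint32_t ndw;              /* number of dwords in the frame */
	uint32_t dw[MAX_FRAME_DW]; /* the packets themselves */
};

struct mirrored_ring {
	uint32_t ring[RING_DW];    /* stands in for the hardware ring */
	uint32_t wptr;             /* write pointer, in dwords */
	struct frame_copy frames[MAX_FRAMES];
	uint32_t nframes;          /* frames whose fences have not signaled */
};

/* Copy raw dwords into the ring, wrapping at the end. */
static void ring_write(struct mirrored_ring *r, const uint32_t *dw, uint32_t ndw)
{
	for (uint32_t i = 0; i < ndw; i++)
		r->ring[(r->wptr + i) % RING_DW] = dw[i];
	r->wptr = (r->wptr + ndw) % RING_DW;
}

/* Submit a frame: write it to the ring and keep a mirror copy.
 * Returns 0 on success, -1 if the frame cannot be tracked. */
static int ring_submit(struct mirrored_ring *r, uint32_t seq,
		       const uint32_t *dw, uint32_t ndw)
{
	if (r->nframes == MAX_FRAMES || ndw > MAX_FRAME_DW)
		return -1;
	struct frame_copy *f = &r->frames[r->nframes++];
	f->seq = seq;
	f->ndw = ndw;
	memcpy(f->dw, dw, ndw * sizeof(*dw));
	ring_write(r, dw, ndw);
	return 0;
}

/* Drop mirror copies of frames whose fences have signaled. */
static void ring_retire(struct mirrored_ring *r, uint32_t signaled_seq)
{
	uint32_t keep = 0;
	for (uint32_t i = 0; i < r->nframes; i++)
		if (r->frames[i].seq > signaled_seq)
			r->frames[keep++] = r->frames[i];
	r->nframes = keep;
}

/* After a reset has wiped the ring: skip the guilty frame and re-emit
 * every other unprocessed frame in submission order.
 * Returns the number of frames re-emitted. */
static uint32_t ring_replay(struct mirrored_ring *r, uint32_t guilty_seq)
{
	uint32_t keep = 0;
	memset(r->ring, 0, sizeof(r->ring)); /* simulate the lost ring contents */
	r->wptr = 0;
	for (uint32_t i = 0; i < r->nframes; i++) {
		if (r->frames[i].seq == guilty_seq)
			continue;
		ring_write(r, r->frames[i].dw, r->frames[i].ndw);
		r->frames[keep++] = r->frames[i];
	}
	r->nframes = keep;
	return keep;
}
```

For example, with three frames submitted (sequence numbers 1, 2, 3), frame 1 retired, and frame 2 hanging, ring_replay(&r, 2) re-emits only frame 3 at the start of the ring. Keeping a CPU-side copy per frame, rather than re-reading the ring from RPTR, means the replay does not depend on the ring contents surviving the reset.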