This set improves per-queue reset support for GC10+. When we reset a queue, its contents are lost, so we need to re-emit the unprocessed state from subsequent submissions. To make sure we actually restore that state, we need to enable legacy enforce isolation so that the re-emit is safe. If we don't, multiple jobs can run in parallel and we may not end up resetting the correct one. This is similar to how Windows handles queues. This also gives us correct guilty tracking for GC.
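
As a rough illustration of the re-emit idea, here is a toy userspace model in C. It is only a sketch under stated assumptions: toy_job, toy_ring, and the toy_ring_*() functions are hypothetical names invented for this example and are not amdgpu structures or helpers. It assumes the guilty job is the oldest pending one and ignores sequence number wraparound.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_JOBS 8

/* Hypothetical stand-in for a submitted job; not an amdgpu struct. */
struct toy_job {
	uint32_t seq;    /* fence sequence number assigned at submission */
	bool guilty;     /* set on the job blamed for the hang */
	int error;       /* non-zero once the job is completed with an error */
};

/* Hypothetical stand-in for a hardware queue. */
struct toy_ring {
	struct toy_job *pending[TOY_MAX_JOBS]; /* emitted, not yet completed */
	size_t count;
	uint32_t signaled_seq;                 /* last fence value written */
	bool responsive;                       /* false models a failed reset */
};

static bool toy_ring_emit(struct toy_ring *ring, struct toy_job *job)
{
	if (ring->count >= TOY_MAX_JOBS)
		return false;
	ring->pending[ring->count++] = job;
	return true;
}

/* Models the queue writing a fence value; a dead queue never writes it. */
static void toy_ring_signal(struct toy_ring *ring, uint32_t seq)
{
	if (ring->responsive)
		ring->signaled_seq = seq;
}

/*
 * Toy queue reset. Returns the number of jobs re-emitted, or -1 if the
 * reset could not be verified (the caller would then escalate).
 */
static int toy_ring_reset(struct toy_ring *ring, struct toy_job *guilty)
{
	struct toy_job *saved[TOY_MAX_JOBS];
	size_t i, n = 0;

	assert(ring && guilty);

	/* Remember the unprocessed jobs submitted after the guilty one. */
	for (i = 0; i < ring->count; i++) {
		if (ring->pending[i]->seq > guilty->seq)
			saved[n++] = ring->pending[i];
	}

	/* The reset itself loses everything that was in the queue. */
	ring->count = 0;

	/* Complete only the guilty job, with an error, and check it landed. */
	guilty->guilty = true;
	guilty->error = -1;
	toy_ring_signal(ring, guilty->seq);
	if (ring->signaled_seq != guilty->seq)
		return -1;

	/* Re-emit the non-guilty jobs; they are not marked as failed. */
	for (i = 0; i < n; i++)
		toy_ring_emit(ring, saved[i]);

	return (int)n;
}
```

With three jobs emitted and the first one blamed, toy_ring_reset() in this model returns 2 and leaves the other two jobs pending with no error set. With responsive set to false it returns -1 and re-emits nothing.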

Tested on GC 10 and 11 chips by running hang tests while a game was running. The game pauses when the hang happens, then continues after the queue reset. I tried this same approach on GC 8 and 9, but it was not as reliable as soft recovery. As such, I've dropped the KGQ reset code for pre-GC10.

The same approach is extended to SDMA and VCN. They don't need enforce isolation because those engines are single threaded, so they always operate serially.

Rework re-emit to signal the seq number of the bad job and use that to verify that the reset worked, then re-emit the rest of the non-guilty state. This way we are not waiting on the rest of the state to complete, and if the subsequent state also contains a bad job, we'll end up in queue reset again rather than adapter reset.

v4: Drop explicit padding patches
    Drop new timeout macro
    Rework re-emit sequence
v5: Add a helper for reemit
    Convert VCN, JPEG, SDMA to use new helpers

Alex Deucher (27):
  drm/amdgpu: enable legacy enforce isolation by default
  drm/amdgpu/gfx7: drop reset_kgq
  drm/amdgpu/gfx8: drop reset_kgq
  drm/amdgpu/gfx9: drop reset_kgq
  drm/amdgpu: move force completion into ring resets
  drm/amdgpu: track ring state associated with a job
  drm/amdgpu/gfx10: re-emit unprocessed state on ring reset
  drm/amdgpu/gfx11: re-emit unprocessed state on ring reset
  drm/amdgpu/gfx12: re-emit unprocessed state on ring reset
  drm/amdgpu/gfx9: re-emit unprocessed state on kcq reset
  drm/amdgpu/gfx9.4.3: re-emit unprocessed state on kcq reset
  drm/amdgpu/sdma5: re-emit unprocessed state on ring reset
  drm/amdgpu/sdma5.2: re-emit unprocessed state on ring reset
  drm/amdgpu/sdma6: re-emit unprocessed state on ring reset
  drm/amdgpu/sdma7: re-emit unprocessed state on ring reset
  drm/amdgpu/jpeg2: re-emit unprocessed state on ring reset
  drm/amdgpu/jpeg2.5: re-emit unprocessed state on ring reset
  drm/amdgpu/jpeg3: re-emit unprocessed state on ring reset
  drm/amdgpu/jpeg4: re-emit unprocessed state on ring reset
  drm/amdgpu/jpeg4.0.3: re-emit unprocessed state on ring reset
  drm/amdgpu/jpeg5.0.0: add queue reset
  drm/amdgpu/jpeg5: re-emit unprocessed state on ring reset
  drm/amdgpu/jpeg5.0.1: re-emit unprocessed state on ring reset
  drm/amdgpu/vcn4: re-emit unprocessed state on ring reset
  drm/amdgpu/vcn4.0.3: re-emit unprocessed state on ring reset
  drm/amdgpu/vcn4.0.5: re-emit unprocessed state on ring reset
  drm/amdgpu/vcn5: re-emit unprocessed state on ring reset

Christian König (1):
  drm/amdgpu: rework queue reset scheduler interaction

 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  4 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c  | 12 ++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c     |  6 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c    | 32 +++++-----
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.h    |  2 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c   | 46 ++++++++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h   |  8 +++
 drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c     | 31 ++--------
 drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c     | 21 +------
 drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c     | 21 +------
 drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c      | 71 ----------------------
 drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c      | 71 ----------------------
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c      | 51 +---------------
 drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c    |  6 +-
 drivers/gpu/drm/amd/amdgpu/jpeg_v2_0.c     |  3 +-
 drivers/gpu/drm/amd/amdgpu/jpeg_v2_5.c     |  3 +-
 drivers/gpu/drm/amd/amdgpu/jpeg_v3_0.c     |  3 +-
 drivers/gpu/drm/amd/amdgpu/jpeg_v4_0.c     |  3 +-
 drivers/gpu/drm/amd/amdgpu/jpeg_v4_0_3.c   |  3 +-
 drivers/gpu/drm/amd/amdgpu/jpeg_v5_0_0.c   | 12 ++++
 drivers/gpu/drm/amd/amdgpu/jpeg_v5_0_1.c   |  3 +-
 drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c   |  4 ++
 drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c     |  7 ++-
 drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c     |  7 ++-
 drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c     |  6 +-
 drivers/gpu/drm/amd/amdgpu/sdma_v7_0.c     |  6 +-
 drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c      |  2 +-
 drivers/gpu/drm/amd/amdgpu/vcn_v4_0_3.c    |  3 +-
 drivers/gpu/drm/amd/amdgpu/vcn_v4_0_5.c    |  2 +-
 drivers/gpu/drm/amd/amdgpu/vcn_v5_0_0.c    |  2 +-
 30 files changed, 162 insertions(+), 289 deletions(-)

-- 
2.49.0