Re: [PATCH 1/1] amdgpu fix for gfx1103 queue evict/restore crash

2024-11-28 Thread Mika Laitio
ACO reset
[ 1062.937902] amdgpu: kgd2kfd_quiesce_mm called by svm_range_evict
[ 1062.937907] amdgpu: evict_process_queues_cpsch started
On Wed, Nov 27, 2024 at 3:50 PM Felix Kuehling wrote:
>
> On 2024-11-27 06:51, Christian König wrote:
> > Am 27.11.24 um 12:46 schrieb Mika L

[PATCH 1/1] amdgpu fix for gfx1103 queue evict/restore crash

2024-11-27 Thread Mika Laitio
] [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
[ 960.785816] [drm:gfx_v11_0_hw_fini [amdgpu]] *ERROR* failed to halt cp gfx
Signed-off-by: Mika Laitio
---
 .../drm/amd/amdkfd/kfd_device_queue_manager.c | 24 ---
 1 file changed, 16 insertions(+), 8

[PATCH 0/1] amdgpu fix for gfx1103 queue evict/restore crash v2

2024-11-27 Thread Mika Laitio
orch.float32, device=device)
print("[" + str(ii) + "], the crash happens here:")
time.sleep(0.5)
result = model_run(tensor).numpy(force=True)
print(result.shape)
Mika Laitio (1):
  amdgpu fix for gfx1103 queue evict/restore crash
 .../drm/amd/a

[PATCH] amdgpu fix for gfx1103 queue evict/restore crash

2024-11-21 Thread Mika Laitio
Signed-off-by: Mika Laitio
---
 .../drm/amd/amdkfd/kfd_device_queue_manager.c | 18 +-
 1 file changed, 13 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index 648f40091aa3