Re: [PATCH] drm/amdgpu: Enable runtime modification of gpu_recovery parameter with validation

2024-12-29 Thread Christian König
On 28.12.24 at 07:32, Shuai Xue wrote: It's observed that most GPU jobs utilize less than one server, typically with each GPU being used by an independent job. If a job consumes poisoned data, a SIGBUS signal is sent to terminate it. Meanwhile, the gpu_recovery parameter is set to -1 by def

[PATCH] drm/amdgpu: Fix the looply call svm_range_restore_pages issue

2024-12-29 Thread Emily Deng
Because freeing of the page table (pt) is delayed, the bo that was meant to be freed has been reused, which causes an unexpected page fault and then a call to svm_range_restore_pages. Details below: 1. The following code wants to free the pt, but the pt is not freed immediately; instead the code uses "schedule_work(&vm->pt_free_work);". [ 92.276838] Call T