Avoid a possible UAF in GPU recovery due to a race between the sched timeout callback and the tdr work queue.
The gpu recovery function calls drm_sched_stop() and later drm_sched_start(). drm_sched_start() restarts the tdr queue which will eventually free the job. If the tdr queue frees the job before time out callback completes, the job will be freed and we'll get a UAF when accessing the pasid. Cache it early to avoid the UAF. Cc: [email protected] Cc: [email protected] Cc: [email protected] Suggested-by: Matthew Brost <[email protected]> Signed-off-by: Alex Deucher <[email protected]> --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index 8a851d7548c00..1adfa23db0981 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -6634,6 +6634,8 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev, struct amdgpu_hive_info *hive = NULL; int r = 0; bool need_emergency_restart = false; + /* save the pasid here as the job may be freed before the end of the reset */ + unsigned int pasid = job ? job->pasid : 0; /* * If it reaches here because of hang/timeout and a RAS error is @@ -6734,8 +6736,12 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev, if (!r) { struct amdgpu_task_info *ti = NULL; + /* + * The job may already be freed at this point via the sched tdr workqueue so + * use the cached pasid. + */ if (job) - ti = amdgpu_vm_get_task_info_pasid(adev, job->pasid); + ti = amdgpu_vm_get_task_info_pasid(adev, pasid); drm_dev_wedged_event(adev_to_drm(adev), DRM_WEDGE_RECOVERY_NONE, ti ? &ti->task : NULL); -- 2.52.0
