On 12/29/25 10:19, Jesse.Zhang wrote:
> In certain error scenarios (e.g., malformed commands), a fence may never 
> become signaled, causing the kernel to hang indefinitely when waiting with 
> MAX_SCHEDULE_TIMEOUT.
> To prevent such hangs and ensure system responsiveness, replace the 
> indefinite timeout with a reasonable 2-second limit.
> 
> This ensures that even if a fence never signals, the wait will time out and 
> appropriate error handling can take place,
> rather than stalling the driver indefinitely.
> 
> Signed-off-by: Jesse Zhang <[email protected]>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
> index 98110f543307..c28332f98aad 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
> @@ -371,7 +371,7 @@ static int amdgpu_userq_wait_for_last_fence(struct amdgpu_usermode_queue *queue)
>       int ret = 0;
>  
>       if (f && !dma_fence_is_signaled(f)) {
> -             ret = dma_fence_wait_timeout(f, true, MAX_SCHEDULE_TIMEOUT);
> +             ret = dma_fence_wait_timeout(f, true, msecs_to_jiffies(2000));

That is clearly a no-go.

That we hang indefinitely and never return is must-have behavior here; otherwise
we might run into data corruption.

Regards,
Christian.

>               if (ret <= 0) {
>                       drm_file_err(uq_mgr->file, "Timed out waiting for fence=%llu:%llu\n",
>                                    f->context, f->seqno);
