[AMD Official Use Only - AMD Internal Distribution Only]

Hi, Christian.

What I mean is:
in SR-IOV mode, when a customer needs to limit the timeout value,
they should set "lockup_timeout" according to the number of VFs they load.


Why:

The effective timeout for each VF in SR-IOV mode is "lockup_timeout / vf_number".

As you said, customers want to limit the timeout to 2s. So when a customer
loads 8 VFs, they should set "lockup_timeout" to 16s; with 4 VFs they need to
set "lockup_timeout" to 8s.


(In our testing, when "lockup_timeout" is set to 2s, 4-VF mode does not work,
because each VF only gets 0.5s.)
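
To make the arithmetic concrete, here is a minimal sketch. It assumes the
timeout is split evenly across VFs, as described above. The helper names are
hypothetical and are not part of amdgpu; values are in milliseconds, matching
the unit of "lockup_timeout".

```c
/*
 * Illustration only: these helpers are hypothetical and do not exist in
 * the amdgpu driver. They assume lockup_timeout is shared equally by all
 * loaded VFs.
 */

/* Effective timeout each VF gets for a given lockup_timeout (in ms). */
static long per_vf_timeout_ms(long lockup_timeout_ms, int vf_count)
{
	if (vf_count <= 0)
		return lockup_timeout_ms; /* no VFs: nothing to divide */
	return lockup_timeout_ms / vf_count;
}

/* lockup_timeout (in ms) needed so that each VF gets target_per_vf_ms. */
static long required_lockup_timeout_ms(long target_per_vf_ms, int vf_count)
{
	if (vf_count <= 0)
		return target_per_vf_ms;
	return target_per_vf_ms * vf_count;
}
```

For a 2s per-VF target, required_lockup_timeout_ms(2000, 8) gives 16000 and
required_lockup_timeout_ms(2000, 4) gives 8000, while
per_vf_timeout_ms(2000, 4) gives the 500 ms seen in the failing 4-VF case.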





Thanks,
Chong.



-----Original Message-----
From: Koenig, Christian <[email protected]>
Sent: Tuesday, November 18, 2025 5:31 PM
To: Li, Chong(Alan) <[email protected]>; [email protected]
Cc: Chen, Horace <[email protected]>
Subject: Re: [PATCH] drm/amdgpu: in sriov multiple vf mode, 2 seconds timeout 
is not enough for sdma job

Hi Chong,

that is not a valid justification.

What customer requirement is causing this SDMA timeout?

If it is just your CI system, then change the CI system.

As long as you can't come up with a customer requirement this change is 
rejected.

Regards,
Christian.

On 11/18/25 10:29, Li, Chong(Alan) wrote:
> [AMD Official Use Only - AMD Internal Distribution Only]
>
> Hi, Christian.
>
> In multiple-VF mode (in our CI environment the VF number is 4), the timeout
> value is shared across all VFs.
> After the timeout value was changed to 2s, each VF only gets 0.5s, which
> causes SDMA ring timeouts and triggers a GPU reset.
>
>
> Thanks,
> Chong.
>
> -----Original Message-----
> From: Koenig, Christian <[email protected]>
> Sent: Tuesday, November 18, 2025 4:34 PM
> To: Li, Chong(Alan) <[email protected]>; [email protected]
> Subject: Re: [PATCH] drm/amdgpu: in sriov multiple vf mode, 2 seconds timeout 
> is not enough for sdma job
>
> Clear NAK to this patch.
>
> It is explicitly requested by customers that we only have a 2 second timeout.
>
> So you need a very good explanation to have that changed for SRIOV.
>
> Regards,
> Christian.
>
> On 11/17/25 07:53, chong li wrote:
>> Signed-off-by: chong li <[email protected]>
>> ---
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 9 +++++++--
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c    | 4 ++--
>>  2 files changed, 9 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> index 69c29f47212d..4ab755eb5ec1 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> @@ -4341,10 +4341,15 @@ static int amdgpu_device_get_job_timeout_settings(struct amdgpu_device *adev)
>>       int index = 0;
>>       long timeout;
>>       int ret = 0;
>> +     long timeout_default;
>>
>> -     /* By default timeout for all queues is 2 sec */
>> +     if (amdgpu_sriov_vf(adev))
>> +             timeout_default = msecs_to_jiffies(10000);
>> +     else
>> +             timeout_default = msecs_to_jiffies(2000);
>> +     /* By default timeout for all queues is 10 sec in sriov, 2 sec not in sriov*/
>>       adev->gfx_timeout = adev->compute_timeout = adev->sdma_timeout =
>> -             adev->video_timeout = msecs_to_jiffies(2000);
>> +             adev->video_timeout = timeout_default;
>>
>>       if (!strnlen(input, AMDGPU_MAX_TIMEOUT_PARAM_LENGTH))
>>               return 0;
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>> index f508c1a9fa2c..43bdd6c1bec2 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>> @@ -358,10 +358,10 @@ module_param_named(svm_default_granularity, amdgpu_svm_default_granularity, uint
>>   * [GFX,Compute,SDMA,Video] to set individual timeouts.
>>   * Negative values mean infinity.
>>   *
>> - * By default(with no lockup_timeout settings), the timeout for all queues is 2000.
>> + * By default(with no lockup_timeout settings), the timeout for all queues is 10000 in sriov, 2000 not in sriov.
>>   */
>>  MODULE_PARM_DESC(lockup_timeout,
>> -              "GPU lockup timeout in ms (default: 2000. 0: keep default value. negative: infinity timeout), format: [single value for all] or [GFX,Compute,SDMA,Video].");
>> +              "GPU lockup timeout in ms (default: 10000 in sriov, 2000 not in sriov. 0: keep default value. negative: infinity timeout), format: [single value for all] or [GFX,Compute,SDMA,Video].");
>>  module_param_string(lockup_timeout, amdgpu_lockup_timeout,
>>                   sizeof(amdgpu_lockup_timeout), 0444);
>>
>
