On Tue, Nov 18, 2025 at 5:49 AM Christian König <[email protected]> wrote:
>
> Hi Chong,
>
> yeah and exactly that argumentation is not correct.
>
> We have to guarantee a minimum response time and that is what the timeout
> is all about.
>
> And it doesn't matter if the available HW time is split between 1, 2, 4 or 8
> virtual functions. The minimum response time we need to guarantee is still the
> same, it's just that the available HW time is lowered.
>
> So as long as we don't have an explicit customer request which asks for
> longer default timeouts this change is rejected.

I think the change makes sense. It needs to be longer to compensate
for the world switch latency. 0.5 seconds of runtime is probably too
short for many larger workloads.

Alex

>
> Regards,
> Christian.
>
> On 11/18/25 11:08, Li, Chong(Alan) wrote:
> > [AMD Official Use Only - AMD Internal Distribution Only]
> >
> > Hi, Christian.
> >
> > What I mean is:
> > in sriov mode, when a customer needs to limit the timeout value,
> > they should set "lockup_timeout" according to the number of vfs they load.
> >
> > Why:
> >
> > The real timeout value in sriov for each vf is "lockup_timeout /
> > vf_number".
> >
> > As you said, they want to limit the timeout to 2s, so
> > when a customer loads 8 vfs, they should set "lockup_timeout" to 16s; 4 vfs
> > need "lockup_timeout" set to 8s.
> >
> > (In our tests, when "lockup_timeout" is set to 2s, the 4 vf mode can't
> > work as each vf only gets 0.5s.)
> >
> > Thanks,
> > Chong.
> >
> > -----Original Message-----
> > From: Koenig, Christian <[email protected]>
> > Sent: Tuesday, November 18, 2025 5:31 PM
> > To: Li, Chong(Alan) <[email protected]>; [email protected]
> > Cc: Chen, Horace <[email protected]>
> > Subject: Re: [PATCH] drm/amdgpu: in sriov multiple vf mode, 2 seconds
> > timeout is not enough for sdma job
> >
> > Hi Chong,
> >
> > that is not a valid justification.
> >
> > What customer requirement is causing this SDMA timeout?
> >
> > When it is just your CI system then change the CI system.
> >
> > As long as you can't come up with a customer requirement this change is
> > rejected.
> >
> > Regards,
> > Christian.
> >
> > On 11/18/25 10:29, Li, Chong(Alan) wrote:
> >> [AMD Official Use Only - AMD Internal Distribution Only]
> >>
> >> Hi, Christian.
> >>
> >> In multiple vf mode (in our CI environment the vf number is 4), the
> >> timeout value is shared across all vfs.
> >> After the timeout value changed to 2s, each vf only gets 0.5s, which
> >> causes an sdma ring timeout and triggers a gpu reset.
> >>
> >> Thanks,
> >> Chong.
> >>
> >> -----Original Message-----
> >> From: Koenig, Christian <[email protected]>
> >> Sent: Tuesday, November 18, 2025 4:34 PM
> >> To: Li, Chong(Alan) <[email protected]>; [email protected]
> >> Subject: Re: [PATCH] drm/amdgpu: in sriov multiple vf mode, 2 seconds
> >> timeout is not enough for sdma job
> >>
> >> Clear NAK to this patch.
> >>
> >> It is explicitly requested by customers that we only have a 2 second
> >> timeout.
> >>
> >> So you need a very good explanation to have that changed for SRIOV.
> >>
> >> Regards,
> >> Christian.
> >>
> >> On 11/17/25 07:53, chong li wrote:
> >>> Signed-off-by: chong li <[email protected]>
> >>> ---
> >>>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 9 +++++++--
> >>>  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c    | 4 ++--
> >>>  2 files changed, 9 insertions(+), 4 deletions(-)
> >>>
> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> >>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> >>> index 69c29f47212d..4ab755eb5ec1 100644
> >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> >>> @@ -4341,10 +4341,15 @@ static int
> >>> amdgpu_device_get_job_timeout_settings(struct amdgpu_device *adev)
> >>>         int index = 0;
> >>>         long timeout;
> >>>         int ret = 0;
> >>> +       long timeout_default;
> >>>
> >>> -       /* By default timeout for all queues is 2 sec */
> >>> +       if (amdgpu_sriov_vf(adev))
> >>> +               timeout_default = msecs_to_jiffies(10000);
> >>> +       else
> >>> +               timeout_default = msecs_to_jiffies(2000);
> >>> +       /* By default timeout for all queues is 10 sec in sriov, 2 sec not
> >>> in sriov*/
> >>>         adev->gfx_timeout = adev->compute_timeout = adev->sdma_timeout =
> >>> -               adev->video_timeout = msecs_to_jiffies(2000);
> >>> +               adev->video_timeout = timeout_default;
> >>>
> >>>         if (!strnlen(input, AMDGPU_MAX_TIMEOUT_PARAM_LENGTH))
> >>>                 return 0;
> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> >>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> >>> index f508c1a9fa2c..43bdd6c1bec2 100644
> >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> >>> @@ -358,10 +358,10 @@ module_param_named(svm_default_granularity,
> >>> amdgpu_svm_default_granularity, uint
> >>>   * [GFX,Compute,SDMA,Video] to set individual timeouts.
> >>>   * Negative values mean infinity.
> >>>   *
> >>> - * By default(with no lockup_timeout settings), the timeout for all
> >>> queues is 2000.
> >>> + * By default(with no lockup_timeout settings), the timeout for all
> >>> queues is 10000 in sriov, 2000 not in sriov.
> >>>   */
> >>>  MODULE_PARM_DESC(lockup_timeout,
> >>> -               "GPU lockup timeout in ms (default: 2000. 0: keep default
> >>> value. negative: infinity timeout), format: [single value for all] or
> >>> [GFX,Compute,SDMA,Video].");
> >>> +               "GPU lockup timeout in ms (default: 10000 in sriov, 2000
> >>> not in sriov. 0: keep default value. negative: infinity timeout), format:
> >>> [single value for all] or [GFX,Compute,SDMA,Video].");
> >>>  module_param_string(lockup_timeout, amdgpu_lockup_timeout,
> >>> sizeof(amdgpu_lockup_timeout), 0444);
> >>>
> >>
> >
>
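
For readers following the arithmetic in the thread, here is a minimal sketch of the relationship Chong describes, assuming (as he states, and as Christian disputes as a justification for changing the default) that the configured lockup timeout is split evenly across the loaded VFs. Both helper names are hypothetical and are not part of amdgpu; this is plain user-space C for illustration, not driver code.

```c
#include <assert.h>

/*
 * Hypothetical helper (not an amdgpu function): effective time each VF
 * gets if lockup_timeout is divided evenly among num_vfs, per the
 * "lockup_timeout / vf_number" description in the thread.
 */
long per_vf_timeout_ms(long lockup_timeout_ms, int num_vfs)
{
	assert(num_vfs > 0); /* precondition: at least one VF is loaded */
	return lockup_timeout_ms / num_vfs;
}

/*
 * Hypothetical inverse: the lockup_timeout value that would leave each
 * VF with the requested per-VF budget under the same even-split
 * assumption.
 */
long lockup_timeout_for_budget_ms(long per_vf_budget_ms, int num_vfs)
{
	assert(num_vfs > 0); /* precondition: at least one VF is loaded */
	return per_vf_budget_ms * num_vfs;
}
```

Under that assumption, a 2000 ms setting with 4 VFs leaves 500 ms per VF, and keeping a 2000 ms per-VF budget would need 8000 ms with 4 VFs or 16000 ms with 8 VFs, which matches the figures quoted above.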
