Hi Menzel, Thanks for your advice, will have a try. Best wishes Emily Deng
-----Original Message----- From: Paul Menzel <pmenzel+amd-...@molgen.mpg.de> Sent: Thursday, August 9, 2018 2:47 PM To: Deng, Emily <emily.d...@amd.com> Cc: amd-gfx@lists.freedesktop.org; Liu, Monk <monk....@amd.com> Subject: Re: [PATCH] drm/amdgpu/sriov: give 8s for recover vram under RUNTIME Dear Emily, Thank you for your reply. Am 09.08.2018 um 04:11 schrieb Deng, Emily: > Hi Paul, > Answers as below: It’d be great if you had an mail user agent (email program) that supports quoting. Maybe you could configure Outlook(?) that way? That would tremendously useful for the mailing list archives too. > -----Original Message----- > From: Paul Menzel <pmenzel+amd-...@molgen.mpg.de> > Sent: Wednesday, August 8, 2018 10:29 PM > To: Deng, Emily <emily.d...@amd.com>; amd-gfx@lists.freedesktop.org > Cc: Liu, Monk <monk....@amd.com> > Subject: Re: [PATCH] drm/amdgpu/sriov: give 8s for recover vram under > RUNTIME > On 08/08/18 04:13, Emily Deng wrote: […] >> Extend the timeout for recovering vram bos from shadows on sr-iov to >> cover the worst case scenario for timeslices and VFs >> >> Under runtime, the wait fence time could be quite long when other VFs >> are in exclusive mode. For example, for 4 VF, every VF's exclusive >> timeout time is set to 3s, then the worst case is 9s. If the VF >> number is more than 4,then the worst case time will be longer. > > Nit: Missing space after the comma. > > How did you get to nine seconds? Isn’t it four times three, which is > twelve? > [Emily] It is 3 times three, not 4, as the VF only need to wait the > other three VF's exclusive timeout. Thank you, I didn’t know that. (It looks like you missed my inline comments for the change itself below.) >> The 8s is the test data, with setting to 8s, it will pass the TDR >> test for 1000 times. >> >> SWDEV-161490 >> >> Change-Id: Ifc32d56ca7fde01b1f4fe2b0db6959b51909008a >> Signed-off-by: Monk Liu <monk....@amd.com> >> Signed-off-by: Emily Deng <emily.d...@amd.com> >> --- >> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >> index 1d933db..ef82ad1 100644 >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >> @@ -3124,7 +3124,7 @@ static int amdgpu_device_handle_vram_lost(struct >> amdgpu_device *adev) >> long tmo; >> >> if (amdgpu_sriov_runtime(adev)) >> - tmo = msecs_to_jiffies(amdgpu_lockup_timeout); >> + tmo = msecs_to_jiffies(8000); > > Actually, I do not understand the change at all. Isn’t that a module > parameter? > > ``` > $ git grep amdgpu_lockup_timeout > drivers/gpu/drm/amd/amdgpu/amdgpu.h:extern int amdgpu_lockup_timeout; > drivers/gpu/drm/amd/amdgpu/amdgpu_device.c: if (amdgpu_lockup_timeout == > 0) { > drivers/gpu/drm/amd/amdgpu/amdgpu_device.c: amdgpu_lockup_timeout > = 10000; > drivers/gpu/drm/amd/amdgpu/amdgpu_device.c: tmo = > msecs_to_jiffies(amdgpu_lockup_timeout); > drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c:int amdgpu_lockup_timeout = > 10000; > drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c:module_param_named(lockup_timeout, > amdgpu_lockup_timeout, int, 0444); > drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c: timeout = > msecs_to_jiffies(amdgpu_lockup_timeout); > drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c: if (amdgpu_lockup_timeout == > 0) ``` > > So, if it’s not set, the time-out is set to 10(!) seconds anyway, > isn’t it? What am I missing? > >> else >> tmo = msecs_to_jiffies(100); >> >> > > In my opinion, such time-outs need to have a big FIXME added to it, > and VF’s exclusive time-outs should be drastically decreased. Kind regards, Paul _______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx