On Thu, Jul 6, 2023 at 4:56 AM Chen, Horace <horace.c...@amd.com> wrote: > > [AMD Official Use Only - General] > > Hi Christian & Leo, > > Sorry, missed Leo's email. > > Currently SR-IOV VF doesn't have suspend/resume use case and this code is in > vcn_v4_0_start_sriov() which will only called by SR-IOV code path. So > currently we will not hit the suspend/resume issue even with this patch. > > About why this patch is necessary: > Because SR-IOV uses FLR instead of mode 1 reset as the default reset > method, and FLR will not reset VRAM during the common reset. So if there is > some bad data in the cache, it will cause VF continues to fail after reset. > We have some test case(such as data poison) which may hit this issue. > > If want to be safer in case we may add suspend/resume case in the future: > how about add a "if (amdgpu_in_reset(adev))" check to make sure we only > clear cache in the reset domain?
Yes, that seems like a better bet as that will keep suspend and resume working. Alex > > Thanks & Regards, > Horace. > > -----Original Message----- > From: Koenig, Christian <christian.koe...@amd.com> > Sent: Thursday, July 6, 2023 3:24 PM > To: Liu, Leo <leo....@amd.com>; Chen, Horace <horace.c...@amd.com>; > amd-gfx@lists.freedesktop.org > Cc: Quan, Evan <evan.q...@amd.com>; Deucher, Alexander > <alexander.deuc...@amd.com>; Xiao, Jack <jack.x...@amd.com>; Zhang, Hawking > <hawking.zh...@amd.com>; Liu, Monk <monk....@amd.com>; Xu, Feifei > <feifei...@amd.com>; Chang, HaiJun <haijun.ch...@amd.com> > Subject: Re: [PATCH] drm/amdgpu: Clear VCN cache when hw_init > > Since this patch was already pushed please revert it until we have a > technical consent on why this should be necessary. > > Regards, > Christian. > > Am 05.07.23 um 21:57 schrieb Leo Liu: > > What Christian says is correct, esp. during the playback or encode, > > when suspend/resume happens, it will save the FW context, and after > > resume, it will continue the job to where it left during the suspend. > > Will this apply to SRIOV case? Since the changes only within the SRIOV > > code, please make sure that also please specify the SRIOV from your > > patch subject and commit message. > > > > Regards, > > > > Leo > > > > > > On 2023-06-30 07:38, Christian König wrote: > >> Am 20.06.23 um 15:29 schrieb Horace Chen: > >>> [Why] > >>> VCN will use some framebuffer space as its cache. It needs to be > >>> reset when reset happens, such as FLR. Otherwise some error may be > >>> kept after the reset. > >> > >> Well this doesn't make sense at all. > >> > >> The full content of adev->vcn.inst[i].cpu_addr is saved and restored > >> during suspend/resume and IIRC GPU resets as well. > >> > >> See functions amdgpu_vcn_suspend() and amdgpu_vcn_resume(). > >> > >> Please let Leo's team take a look at this and review the change > >> before it is committed. > >> > >> Regards, > >> Christian. > >> > >>> > >>> Signed-off-by: Horace Chen <horace.c...@amd.com> > >>> --- > >>> drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c | 3 +++ > >>> 1 file changed, 3 insertions(+) > >>> > >>> diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c > >>> b/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c > >>> index b48bb5212488..2db73a964031 100644 > >>> --- a/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c > >>> +++ b/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c > >>> @@ -1292,6 +1292,7 @@ static int vcn_v4_0_start_sriov(struct > >>> amdgpu_device *adev) > >>> cache_size); > >>> cache_addr = adev->vcn.inst[i].gpu_addr + offset; > >>> + memset(adev->vcn.inst[i].cpu_addr + offset, 0, > >>> AMDGPU_VCN_STACK_SIZE); > >>> MMSCH_V4_0_INSERT_DIRECT_WT(SOC15_REG_OFFSET(VCN, i, > >>> regUVD_LMI_VCPU_CACHE1_64BIT_BAR_LOW), > >>> lower_32_bits(cache_addr)); @@ -1307,6 +1308,8 @@ > >>> static int vcn_v4_0_start_sriov(struct amdgpu_device *adev) > >>> cache_addr = adev->vcn.inst[i].gpu_addr + offset + > >>> AMDGPU_VCN_STACK_SIZE; > >>> + memset(adev->vcn.inst[i].cpu_addr + offset + > >>> AMDGPU_VCN_STACK_SIZE, 0, > >>> + AMDGPU_VCN_STACK_SIZE); > >>> MMSCH_V4_0_INSERT_DIRECT_WT(SOC15_REG_OFFSET(VCN, i, > >>> regUVD_LMI_VCPU_CACHE2_64BIT_BAR_LOW), > >>> lower_32_bits(cache_addr)); > >> >