Looks good. Thanks for fixing that. I've sent the patches out for review!
Alex On Wed, Apr 27, 2022 at 9:37 PM Wong, Alice <[email protected]> wrote: > > [AMD Official Use Only - General] > > Need this patch on top of your patches to handle psp_load_fw failure. > > Alice > > -----Original Message----- > From: Alex Deucher <[email protected]> > Sent: April 22, 2022 5:23 PM > To: Wong, Alice <[email protected]> > Cc: amd-gfx list <[email protected]> > Subject: Re: [PATCH] drm/amdgpu: clean up psp ip if hw_init failed v2 > > How about these patches? > > Alex > > On Fri, Apr 22, 2022 at 5:00 PM Alex Deucher <[email protected]> wrote: > > > > On Fri, Apr 22, 2022 at 3:54 PM Wong, Alice <[email protected]> wrote: > > > > > > [AMD Official Use Only] > > > > > > Hi Alex, > > > > > > The attached patch freed most of the allocated memory except for one > > > allocated by psp_tmr_init during psp_load_fw. > > > Combination of the attached patch and calling psp_hw_fini when > > > psp_hw_init failed would fix the issue. > > > > > > As a proper fix, we can call psp_tmr_terminate in psp_load_fw when > > > psp_load_non_psp_fw failed. (attached patch) We can't move psp_tmr_init > > > to sw_init because we need to load toc to get the TMR size. > > > Do you have any concerns with this approach? > > > > That only covers failures in hw_init(). It doesn't handle resume(). > > Looks like all of the ta functions also potentially leak. I'm working > > on a cleanup to handle all of these. > > > > Alex > > > > > > > > Alice > > > > > > > > > -----Original Message----- > > > From: Alex Deucher <[email protected]> > > > Sent: April 21, 2022 1:31 AM > > > To: Wong, Alice <[email protected]> > > > Cc: amd-gfx list <[email protected]> > > > Subject: Re: [PATCH] drm/amdgpu: clean up psp ip if hw_init failed > > > v2 > > > > > > On Wed, Apr 20, 2022 at 5:48 PM Alice Wong <[email protected]> wrote: > > > > > > > > If at any point psp_hw_init failed, psp_hw_fini would not be > > > > called during unload due to ip_blocks[PSP].status.hw not being set to > > > > true. > > > > This could cause a memory leak when the driver unloads. > > > > As a rule of thumb, each IP block should cleanup themselves when > > > > their hw_init fails. Only previously intialized blocks should be > > > > cleaned up by the common framework. > > > > > > > > v1: Call IP blocks' respective hw_fini when hw_init failed from > > > > the common framework > > > > v2: Call psp_hw_fini when psp_hw_init failed. > > > > > > > > Signed-off-by: Alice Wong <[email protected]> > > > > > > I don't think this is a good idea. The hw programming in hw_fini makes > > > no sense if the driver never set it up in the first place if hw_init > > > failed before initializing the hw. It would be better to just properly > > > unwind the relevant functions. Presumably the problem you are seeing is > > > the failure to free the GPU memory allocated in fw_fini, depending on > > > where it fails. We should just allocate the memory in sw_init; that is > > > what we do in other IPs. Does the attached patch fix the issue you are > > > seeing? > > > > > > Alex > > > > > > > --- > > > > drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 57 > > > > +++++++++++++------------ > > > > 1 file changed, 29 insertions(+), 28 deletions(-) > > > > > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c > > > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c > > > > index 5b9e48d44f5b..52b14efa848a 100644 > > > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c > > > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c > > > > @@ -2537,6 +2537,34 @@ static int psp_load_fw(struct amdgpu_device > > > > *adev) > > > > return ret; > > > > } > > > > > > > > +static int psp_hw_fini(void *handle) { > > > > + struct amdgpu_device *adev = (struct amdgpu_device *)handle; > > > > + struct psp_context *psp = &adev->psp; > > > > + > > > > + if (psp->ta_fw) { > > > > + psp_ras_terminate(psp); > > > > + psp_securedisplay_terminate(psp); > > > > + psp_rap_terminate(psp); > > > > + psp_dtm_terminate(psp); > > > > + psp_hdcp_terminate(psp); > > > > + } > > > > + > > > > + psp_asd_terminate(psp); > > > > + > > > > + psp_tmr_terminate(psp); > > > > + psp_ring_destroy(psp, PSP_RING_TYPE__KM); > > > > + > > > > + amdgpu_bo_free_kernel(&psp->fw_pri_bo, > > > > + &psp->fw_pri_mc_addr, &psp->fw_pri_buf); > > > > + amdgpu_bo_free_kernel(&psp->fence_buf_bo, > > > > + &psp->fence_buf_mc_addr, &psp->fence_buf); > > > > + amdgpu_bo_free_kernel(&psp->cmd_buf_bo, &psp->cmd_buf_mc_addr, > > > > + (void **)&psp->cmd_buf_mem); > > > > + > > > > + return 0; > > > > +} > > > > + > > > > static int psp_hw_init(void *handle) { > > > > int ret; > > > > @@ -2563,37 +2591,10 @@ static int psp_hw_init(void *handle) > > > > failed: > > > > adev->firmware.load_type = AMDGPU_FW_LOAD_DIRECT; > > > > mutex_unlock(&adev->firmware.mutex); > > > > + psp_hw_fini(handle); > > > > return -EINVAL; > > > > } > > > > > > > > -static int psp_hw_fini(void *handle) -{ > > > > - struct amdgpu_device *adev = (struct amdgpu_device *)handle; > > > > - struct psp_context *psp = &adev->psp; > > > > - > > > > - if (psp->ta_fw) { > > > > - psp_ras_terminate(psp); > > > > - psp_securedisplay_terminate(psp); > > > > - psp_rap_terminate(psp); > > > > - psp_dtm_terminate(psp); > > > > - psp_hdcp_terminate(psp); > > > > - } > > > > - > > > > - psp_asd_terminate(psp); > > > > - > > > > - psp_tmr_terminate(psp); > > > > - psp_ring_destroy(psp, PSP_RING_TYPE__KM); > > > > - > > > > - amdgpu_bo_free_kernel(&psp->fw_pri_bo, > > > > - &psp->fw_pri_mc_addr, &psp->fw_pri_buf); > > > > - amdgpu_bo_free_kernel(&psp->fence_buf_bo, > > > > - &psp->fence_buf_mc_addr, &psp->fence_buf); > > > > - amdgpu_bo_free_kernel(&psp->cmd_buf_bo, &psp->cmd_buf_mc_addr, > > > > - (void **)&psp->cmd_buf_mem); > > > > - > > > > - return 0; > > > > -} > > > > - > > > > static int psp_suspend(void *handle) { > > > > int ret; > > > > -- > > > > 2.25.1 > > > >
