Looks good.  Thanks for fixing that.  I've sent the patches out for review!

Alex

On Wed, Apr 27, 2022 at 9:37 PM Wong, Alice <[email protected]> wrote:
>
> [AMD Official Use Only - General]
>
> Need this patch on top of your patches to handle psp_load_fw failure.
>
> Alice
>
> -----Original Message-----
> From: Alex Deucher <[email protected]>
> Sent: April 22, 2022 5:23 PM
> To: Wong, Alice <[email protected]>
> Cc: amd-gfx list <[email protected]>
> Subject: Re: [PATCH] drm/amdgpu: clean up psp ip if hw_init failed v2
>
> How about these patches?
>
> Alex
>
> On Fri, Apr 22, 2022 at 5:00 PM Alex Deucher <[email protected]> wrote:
> >
> > On Fri, Apr 22, 2022 at 3:54 PM Wong, Alice <[email protected]> wrote:
> > >
> > > [AMD Official Use Only]
> > >
> > > Hi Alex,
> > >
> > > The attached patch freed most of the allocated memory except for one 
> > > allocated by psp_tmr_init during psp_load_fw.
> > > Combination of the attached patch and calling psp_hw_fini when 
> > > psp_hw_init failed would fix the issue.
> > >
> > > As a proper fix, we can call psp_tmr_terminate in psp_load_fw when
> > > psp_load_non_psp_fw failed. (attached patch) We can't move psp_tmr_init 
> > > to sw_init because we need to load toc to get the TMR size.
> > > Do you have any concerns with this approach?
> >
> > That only covers failures in hw_init().  It doesn't handle resume().
> > Looks like all of the ta functions also potentially leak.  I'm working
> > on a cleanup to handle all of these.
> >
> > Alex
> >
> > >
> > > Alice
> > >
> > >
> > > -----Original Message-----
> > > From: Alex Deucher <[email protected]>
> > > Sent: April 21, 2022 1:31 AM
> > > To: Wong, Alice <[email protected]>
> > > Cc: amd-gfx list <[email protected]>
> > > Subject: Re: [PATCH] drm/amdgpu: clean up psp ip if hw_init failed
> > > v2
> > >
> > > On Wed, Apr 20, 2022 at 5:48 PM Alice Wong <[email protected]> wrote:
> > > >
> > > > If at any point psp_hw_init failed, psp_hw_fini would not be
> > > > called during unload due to ip_blocks[PSP].status.hw not being set to 
> > > > true.
> > > > This could cause a memory leak when the driver unloads.
> > > > As a rule of thumb, each IP block should cleanup themselves when
> > > > their hw_init fails. Only previously intialized blocks should be
> > > > cleaned up by the common framework.
> > > >
> > > > v1: Call IP blocks' respective hw_fini when hw_init failed from
> > > >     the common framework
> > > > v2: Call psp_hw_fini when psp_hw_init failed.
> > > >
> > > > Signed-off-by: Alice Wong <[email protected]>
> > >
> > > I don't think this is a good idea.  The hw programming in hw_fini makes 
> > > no sense if the driver never set it up in the first place if hw_init 
> > > failed before initializing the hw.  It would be better to just properly 
> > > unwind the relevant functions.  Presumably the problem you are seeing is 
> > > the failure to free the GPU memory allocated in fw_fini, depending on 
> > > where it fails.  We should just allocate the memory in sw_init; that is 
> > > what we do in other IPs.  Does the attached patch fix the issue you are 
> > > seeing?
> > >
> > > Alex
> > >
> > > > ---
> > > >  drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 57
> > > > +++++++++++++------------
> > > >  1 file changed, 29 insertions(+), 28 deletions(-)
> > > >
> > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> > > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> > > > index 5b9e48d44f5b..52b14efa848a 100644
> > > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> > > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> > > > @@ -2537,6 +2537,34 @@ static int psp_load_fw(struct amdgpu_device 
> > > > *adev)
> > > >         return ret;
> > > >  }
> > > >
> > > > +static int psp_hw_fini(void *handle) {
> > > > +       struct amdgpu_device *adev = (struct amdgpu_device *)handle;
> > > > +       struct psp_context *psp = &adev->psp;
> > > > +
> > > > +       if (psp->ta_fw) {
> > > > +               psp_ras_terminate(psp);
> > > > +               psp_securedisplay_terminate(psp);
> > > > +               psp_rap_terminate(psp);
> > > > +               psp_dtm_terminate(psp);
> > > > +               psp_hdcp_terminate(psp);
> > > > +       }
> > > > +
> > > > +       psp_asd_terminate(psp);
> > > > +
> > > > +       psp_tmr_terminate(psp);
> > > > +       psp_ring_destroy(psp, PSP_RING_TYPE__KM);
> > > > +
> > > > +       amdgpu_bo_free_kernel(&psp->fw_pri_bo,
> > > > +                             &psp->fw_pri_mc_addr, &psp->fw_pri_buf);
> > > > +       amdgpu_bo_free_kernel(&psp->fence_buf_bo,
> > > > +                             &psp->fence_buf_mc_addr, &psp->fence_buf);
> > > > +       amdgpu_bo_free_kernel(&psp->cmd_buf_bo, &psp->cmd_buf_mc_addr,
> > > > +                             (void **)&psp->cmd_buf_mem);
> > > > +
> > > > +       return 0;
> > > > +}
> > > > +
> > > >  static int psp_hw_init(void *handle)  {
> > > >         int ret;
> > > > @@ -2563,37 +2591,10 @@ static int psp_hw_init(void *handle)
> > > >  failed:
> > > >         adev->firmware.load_type = AMDGPU_FW_LOAD_DIRECT;
> > > >         mutex_unlock(&adev->firmware.mutex);
> > > > +       psp_hw_fini(handle);
> > > >         return -EINVAL;
> > > >  }
> > > >
> > > > -static int psp_hw_fini(void *handle) -{
> > > > -       struct amdgpu_device *adev = (struct amdgpu_device *)handle;
> > > > -       struct psp_context *psp = &adev->psp;
> > > > -
> > > > -       if (psp->ta_fw) {
> > > > -               psp_ras_terminate(psp);
> > > > -               psp_securedisplay_terminate(psp);
> > > > -               psp_rap_terminate(psp);
> > > > -               psp_dtm_terminate(psp);
> > > > -               psp_hdcp_terminate(psp);
> > > > -       }
> > > > -
> > > > -       psp_asd_terminate(psp);
> > > > -
> > > > -       psp_tmr_terminate(psp);
> > > > -       psp_ring_destroy(psp, PSP_RING_TYPE__KM);
> > > > -
> > > > -       amdgpu_bo_free_kernel(&psp->fw_pri_bo,
> > > > -                             &psp->fw_pri_mc_addr, &psp->fw_pri_buf);
> > > > -       amdgpu_bo_free_kernel(&psp->fence_buf_bo,
> > > > -                             &psp->fence_buf_mc_addr, &psp->fence_buf);
> > > > -       amdgpu_bo_free_kernel(&psp->cmd_buf_bo, &psp->cmd_buf_mc_addr,
> > > > -                             (void **)&psp->cmd_buf_mem);
> > > > -
> > > > -       return 0;
> > > > -}
> > > > -
> > > >  static int psp_suspend(void *handle)  {
> > > >         int ret;
> > > > --
> > > > 2.25.1
> > > >

Reply via email to