On 30.06.25 12:41, Samuel Zhang wrote:
> The hibernation successful workflow:
> - prepare: evict VRAM and swapout GTT BOs
> - freeze
> - create the hibernation image in system memory
> - thaw: swapin and restore BOs

Why should a thaw happen here in between?

> - complete
> - write hibernation image to disk
> - amdgpu_pci_shutdown
> - goto S5, turn off the system.
>
> During prepare stage of hibernation, VRAM and GTT BOs will be swapout to
> shmem. Then in thaw stage, all BOs will be swapin and restored.

That's not correct. This is done by the application starting again and not during thaw.

>
> On server with 192GB VRAM * 8 dGPUs and 1.7TB system memory,
> the swapin and restore BOs takes too long (50 minutes) and it is not
> necessary since the follow-up stages does not use GPU.
>
> This patch is to skip BOs restore during thaw to reduce the hibernation
> time.

As far as I can see that doesn't make sense. The KFD processes need to be resumed here and that can't be skipped.

Regards,
Christian.

>
> Signed-off-by: Samuel Zhang <guoqing.zh...@amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c    | 2 ++
>  2 files changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index a8f4697deb1b..b550d07190a2 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -5328,7 +5328,7 @@ int amdgpu_device_resume(struct drm_device *dev, bool notify_clients)
>  		amdgpu_virt_init_data_exchange(adev);
>  		amdgpu_virt_release_full_gpu(adev, true);
>  
> -		if (!adev->in_s0ix && !r && !adev->in_runpm)
> +		if (!adev->in_s0ix && !r && !adev->in_runpm && !adev->in_s4)
>  			r = amdgpu_amdkfd_resume_process(adev);
>  	}
>  
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> index 571b70da4562..23b76e8ac2fd 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> @@ -2734,7 +2734,9 @@ static int amdgpu_pmops_poweroff(struct device *dev)
>  static int amdgpu_pmops_restore(struct device *dev)
>  {
>  	struct drm_device *drm_dev = dev_get_drvdata(dev);
> +	struct amdgpu_device *adev = drm_to_adev(drm_dev);
>  
> +	adev->in_s4 = false;
>  	return amdgpu_device_resume(drm_dev, true);
>  }
>
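
For readers following the thread, the effect of the quoted hunks can be sketched in plain user-space C. This is only an illustration, not kernel code: struct fake_adev, fake_device_resume(), fake_pmops_thaw() and fake_pmops_restore() are hypothetical stand-ins, and the sketch assumes that in_s4 is still set when the thaw callback runs, which the patch implies but the quoted hunks do not show. It takes no position on whether skipping the KFD resume at thaw is correct.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the amdgpu_device flags tested by the patch. */
struct fake_adev {
	bool in_s0ix;
	bool in_runpm;
	bool in_s4;
	int kfd_resume_calls; /* counts simulated amdgpu_amdkfd_resume_process() calls */
};

/*
 * Mirrors only the patched condition from amdgpu_device_resume().
 * r stands for the result of the earlier resume steps (0 on success).
 */
static int fake_device_resume(struct fake_adev *adev, int r)
{
	assert(adev != NULL);

	if (!adev->in_s0ix && !r && !adev->in_runpm && !adev->in_s4)
		adev->kfd_resume_calls++; /* KFD processes would be resumed here */

	return r;
}

/* Thaw path: in_s4 is assumed to be still set, so the KFD resume is skipped. */
static int fake_pmops_thaw(struct fake_adev *adev)
{
	return fake_device_resume(adev, 0);
}

/* Restore path: the patch clears in_s4 first, so the KFD resume runs. */
static int fake_pmops_restore(struct fake_adev *adev)
{
	adev->in_s4 = false;
	return fake_device_resume(adev, 0);
}
```

With in_s4 set, calling fake_pmops_thaw() leaves kfd_resume_calls unchanged, while fake_pmops_restore() clears the flag and increments it once.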