On 01.07.25 10:18, Zhang, GuoQing (Sam) wrote: > [AMD Official Use Only - AMD Internal Distribution Only] > > > Hi Christian, > > > > Thank you for the feedback. > > > > For “return ret < 0 ? ret : 0;”, it is equivalent to “return ret;” since ret > is always <= 0 after the loop.
No it isn't. ttm_global_swapout() returns the number of pages swapped out and only a negative error code if something went wrong. And it's probably not a good idea to return that from the new function. Regards, Christian. > > > > For all other comments, I will revise the patch accordingly in v2. > > > > Regards > > Sam > > > > > > *From: *Koenig, Christian <christian.koe...@amd.com> > *Date: *Monday, June 30, 2025 at 19:54 > *To: *Zhang, GuoQing (Sam) <guoqing.zh...@amd.com>, raf...@kernel.org > <raf...@kernel.org>, len.br...@intel.com <len.br...@intel.com>, > pa...@kernel.org <pa...@kernel.org>, Deucher, Alexander > <alexander.deuc...@amd.com>, Limonciello, Mario <mario.limoncie...@amd.com>, > Lazar, Lijo <lijo.la...@amd.com> > *Cc: *Zhao, Victor <victor.z...@amd.com>, Chang, HaiJun > <haijun.ch...@amd.com>, Ma, Qing (Mark) <qing...@amd.com>, > amd-gfx@lists.freedesktop.org <amd-gfx@lists.freedesktop.org>, > dri-de...@lists.freedesktop.org <dri-de...@lists.freedesktop.org>, > linux...@vger.kernel.org <linux...@vger.kernel.org>, > linux-ker...@vger.kernel.org <linux-ker...@vger.kernel.org> > *Subject: *Re: [PATCH 1/3] drm/amdgpu: move GTT to SHM after eviction for > hibernation > > On 30.06.25 12:41, Samuel Zhang wrote: >> When hibernate with data center dGPUs, huge number of VRAM BOs evicted >> to GTT and takes too much system memory. This will cause hibernation >> fail due to insufficient memory for creating the hibernation image. >> >> Move GTT BOs to shmem in KMD, then shmem to swap disk in kernel >> hibernation code to make room for hibernation image. > > This should probably be two patches, one for TTM and then an amdgpu patch to > forward the event. > >> >> Signed-off-by: Samuel Zhang <guoqing.zh...@amd.com> >> --- >> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 13 ++++++++++++- >> drivers/gpu/drm/ttm/ttm_resource.c | 18 ++++++++++++++++++ >> include/drm/ttm/ttm_resource.h | 1 + >> 3 files changed, 31 insertions(+), 1 deletion(-) >> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c >> index 4d57269c9ca8..5aede907a591 100644 >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c >> @@ -2889,6 +2889,7 @@ int amdgpu_fill_buffer(struct amdgpu_bo *bo, >> int amdgpu_ttm_evict_resources(struct amdgpu_device *adev, int mem_type) >> { >> struct ttm_resource_manager *man; >> + int r; >> >> switch (mem_type) { >> case TTM_PL_VRAM: >> @@ -2903,7 +2904,17 @@ int amdgpu_ttm_evict_resources(struct amdgpu_device >> *adev, int mem_type) >> return -EINVAL; >> } >> >> - return ttm_resource_manager_evict_all(&adev->mman.bdev, man); >> + r = ttm_resource_manager_evict_all(&adev->mman.bdev, man); >> + if (r) { >> + DRM_ERROR("Failed to evict memory type %d\n", mem_type); >> + return r; >> + } >> + if (adev->in_s4 && mem_type == TTM_PL_VRAM) { >> + r = ttm_resource_manager_swapout(); >> + if (r) >> + DRM_ERROR("Failed to swap out, %d\n", r); >> + } >> + return r; >> } >> >> #if defined(CONFIG_DEBUG_FS) >> diff --git a/drivers/gpu/drm/ttm/ttm_resource.c >> b/drivers/gpu/drm/ttm/ttm_resource.c >> index fd41b56e2c66..07b1f5a5afc2 100644 >> --- a/drivers/gpu/drm/ttm/ttm_resource.c >> +++ b/drivers/gpu/drm/ttm/ttm_resource.c >> @@ -534,6 +534,24 @@ void ttm_resource_manager_init(struct >> ttm_resource_manager *man, >> } >> EXPORT_SYMBOL(ttm_resource_manager_init); >> >> +int ttm_resource_manager_swapout(void) > > This needs documentation, better placement and a better name. > > First of all put it into ttm_device.c instead of the resource manager. > > Then call it something like ttm_device_prepare_hibernation or similar. > > >> +{ >> + struct ttm_operation_ctx ctx = { >> + .interruptible = false, >> + .no_wait_gpu = false, >> + .force_alloc = true >> + }; >> + int ret; >> + >> + while (true) { > > Make that: > > do { > ret = ... > } while (ret > 0); > >> + ret = ttm_global_swapout(&ctx, GFP_KERNEL); >> + if (ret <= 0) >> + break; >> + } >> + return ret; > > It's rather pointless to return the number of swapped out pages. > > Make that "return ret < 0 ? ret : 0; > > Regards, > Christian. > >> +} >> +EXPORT_SYMBOL(ttm_resource_manager_swapout); >> + >> /* >> * ttm_resource_manager_evict_all >> * >> diff --git a/include/drm/ttm/ttm_resource.h b/include/drm/ttm/ttm_resource.h >> index b873be9597e2..46181758068e 100644 >> --- a/include/drm/ttm/ttm_resource.h >> +++ b/include/drm/ttm/ttm_resource.h >> @@ -463,6 +463,7 @@ void ttm_resource_manager_init(struct >> ttm_resource_manager *man, >> >> int ttm_resource_manager_evict_all(struct ttm_device *bdev, >> struct ttm_resource_manager *man); >> +int ttm_resource_manager_swapout(void); >> >> uint64_t ttm_resource_manager_usage(struct ttm_resource_manager *man); >> void ttm_resource_manager_debug(struct ttm_resource_manager *man, >