Re: [PATCH v5 0/4] enable xgmi node migration support for hibernate on SRIOV.

Zhang, GuoQing (Sam) Fri, 16 May 2025 00:03:50 -0700

[AMD Official Use Only - AMD Internal Distribution Only]

Hi @Koenig, Christian <christian.koe...@amd.com> and @Lazar, Lijo 
<lijo.la...@amd.com>,

Ping…

Thanks
Sam

From: Zhang, Owen(SRDC) <owen.zha...@amd.com>
Date: Wednesday, May 14, 2025 at 18:07
To: Koenig, Christian <christian.koe...@amd.com>, Zhang, GuoQing (Sam) 
<guoqing.zh...@amd.com>, amd-gfx@lists.freedesktop.org 
<amd-gfx@lists.freedesktop.org>, Lazar, Lijo <lijo.la...@amd.com>
Cc: Zhao, Victor <victor.z...@amd.com>, Chang, HaiJun <haijun.ch...@amd.com>, 
Zhang, GuoQing (Sam) <guoqing.zh...@amd.com>, Deucher, Alexander 
<alexander.deuc...@amd.com>, Ma, Qing (Mark) <qing...@amd.com>
Subject: RE: [PATCH v5 0/4] enable xgmi node migration support for hibernate on 
SRIOV.

Hi @Koenig, Christian and @Lazar, Lijo, could you please review Sam's update 
below? Thanks for your support.

Rgds/Owen

-----Original Message-----
From: Samuel Zhang <guoqing.zh...@amd.com>
Sent: Monday, May 12, 2025 2:42 PM
To: amd-gfx@lists.freedesktop.org
Cc: Zhao, Victor <victor.z...@amd.com>; Chang, HaiJun <haijun.ch...@amd.com>; 
Zhang, GuoQing (Sam) <guoqing.zh...@amd.com>; Koenig, Christian 
<christian.koe...@amd.com>; Deucher, Alexander <alexander.deuc...@amd.com>; 
Zhang, Owen(SRDC) <owen.zha...@amd.com>; Ma, Qing (Mark) <qing...@amd.com>
Subject: [PATCH v5 0/4] enable xgmi node migration support for hibernate on 
SRIOV.

In SRIOV and VM environments, a customer may need to switch to new vGPU indexes 
after hibernating and then resuming the VM. For GPUs with XGMI, `vram_start` 
changes in this case, so the FB aperture GPU addresses of VRAM BOs also change.
These GPU addresses need to be updated on resume. But they are used all over 
the KMD codebase, and updating each of them is error-prone and not acceptable.

The solution is to use the pdb0 page table to cover both VRAM and GART memory 
and to use pdb0 virtual GPU addresses instead. When the GPU indexes change, the 
virtual GPU addresses do not change.
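
As a rough illustration of why the pdb0 approach keeps addresses stable, the 
sketch below models the two address schemes. It is not driver code: the 
`vgpu_model` struct, its fields, and both helper functions are hypothetical 
names, and it assumes the FB aperture base is simply the node index times the 
per-node segment size.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified model; not the amdgpu data structures. */
struct vgpu_model {
	uint32_t node_id;      /* XGMI node index assigned to the vGPU */
	uint64_t segment_size; /* VRAM bytes per XGMI node */
	uint64_t pdb0_base;    /* start of the pdb0-mapped virtual range */
};

/* FB aperture address: depends on which node the vGPU lands on. */
static uint64_t fb_aperture_addr(const struct vgpu_model *g, uint64_t bo_offset)
{
	assert(bo_offset < g->segment_size);
	return (uint64_t)g->node_id * g->segment_size + bo_offset;
}

/* pdb0 virtual address: independent of the node index. */
static uint64_t pdb0_virt_addr(const struct vgpu_model *g, uint64_t bo_offset)
{
	assert(bo_offset < g->segment_size);
	return g->pdb0_base + bo_offset;
}
```

In this model, changing `node_id` between hibernate and resume changes the 
result of `fb_aperture_addr()` for the same BO offset, while 
`pdb0_virt_addr()` returns the same value.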

For PSP and SMU, pdb0 GPU addresses do not work, so the original FB aperture 
GPU addresses are used instead. These need to be updated on resume when the 
vGPUs have changed.
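
The resume-time update for such firmware-facing buffers could look roughly like 
the sketch below. The `fw_buf` struct and both functions are hypothetical and 
only illustrate the idea of recomputing a cached FB aperture address from a 
stable VRAM offset; the actual patches may do this differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a firmware buffer whose address is cached. */
struct fw_buf {
	uint64_t vram_offset;    /* offset inside VRAM; stable across migration */
	uint64_t cached_mc_addr; /* FB aperture address handed to the firmware */
};

/* Recompute one cached address from the current VRAM base. */
static void fw_buf_refresh_addr(struct fw_buf *buf, uint64_t vram_start)
{
	buf->cached_mc_addr = vram_start + buf->vram_offset;
}

/* On resume, refresh every cached address against the new VRAM base. */
static void resume_refresh(struct fw_buf *bufs, size_t count,
			   uint64_t new_vram_start)
{
	assert(bufs != NULL || count == 0);
	for (size_t i = 0; i < count; i++)
		fw_buf_refresh_addr(&bufs[i], new_vram_start);
}
```

Keeping the offset rather than the absolute address as the source of truth 
means a changed `vram_start` only requires one pass over the buffers.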

v2:
- remove physical_node_id_changed
- set vram_start to 0 to switch cached gpu addr to gart aperture
- cleanup pdb0 patch
v3:
- remove gmc_v9_0_init_sw_mem_ranges() call
- remove vram_offset member
- add 4 refactoring patches to remove cached gpu addr
- cleanup pdb0 patch
v4:
- remove gmc_v9_0_mc_init() call and `refresh` update.
- do not set `fb_start` in mmhub_v1_8_get_fb_location() when pdb0 enabled.
v5:
- add amdgpu_virt_xgmi_migrate_enabled() check
- move vram_base_offset update to pdb0 patch
- remove 4 refactoring patches to remove cached gpu addr
- add patch to fix IH not working issue when resume with new VF

Samuel Zhang (4):
  drm/amdgpu: update xgmi info on resume
  drm/amdgpu: update GPU addresses for SMU and PSP
  drm/amdgpu: enable pdb0 for hibernation on SRIOV
  drm/amdgpu: fix fence fallback timer expired error

 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 27 ++++++++++++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c    | 32 +++++++++++++++++-----
 drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h    |  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c    |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h    |  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 20 ++++++++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.h |  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c    | 27 ++++++++++++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c  |  3 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h   |  7 +++++
 drivers/gpu/drm/amd/amdgpu/gfxhub_v1_2.c   |  2 +-
 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c      | 16 ++++++++---
 drivers/gpu/drm/amd/amdgpu/mmhub_v1_8.c    |  6 ++--
 drivers/gpu/drm/amd/amdgpu/vega20_ih.c     |  4 +++
 drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c  | 18 ++++++++++++
 15 files changed, 152 insertions(+), 15 deletions(-)

--
2.43.5
