[AMD Official Use Only - AMD Internal Distribution Only]

Ping @Lazar, Lijo<mailto:lijo.la...@amd.com>, @Koenig, Christian<mailto:christian.koe...@amd.com>…
Kindly pls review the updated patch in advance and we can discuss your suggestions in tomorrow's meeting.

Thanks for your great support.

Rgds/Owen

From: Deng, Emily <emily.d...@amd.com>
Sent: Monday, May 26, 2025 9:56 AM
To: Zhang, GuoQing (Sam) <guoqing.zh...@amd.com>; Koenig, Christian <christian.koe...@amd.com>; Lazar, Lijo <lijo.la...@amd.com>; Deucher, Alexander <alexander.deuc...@amd.com>
Cc: Zhao, Victor <victor.z...@amd.com>; Chang, HaiJun <haijun.ch...@amd.com>; Zhang, GuoQing (Sam) <guoqing.zh...@amd.com>; Zhang, Owen(SRDC) <owen.zha...@amd.com>; Ma, Qing (Mark) <qing...@amd.com>; amd-gfx@lists.freedesktop.org
Subject: RE: [PATCH v8 0/4] enable xgmi node migration support for hibernate on SRIOV

@Koenig, Christian<mailto:christian.koe...@amd.com> and @Lazar, Lijo<mailto:lijo.la...@amd.com>

Could you help review these changes again?

Best wishes
Emily Deng

>-----Original Message-----
>From: Samuel Zhang <guoqing.zh...@amd.com>
>Sent: Thursday, May 22, 2025 6:41 PM
>To: amd-gfx@lists.freedesktop.org
>Cc: Zhao, Victor <victor.z...@amd.com>; Chang, HaiJun <haijun.ch...@amd.com>;
>Zhang, GuoQing (Sam) <guoqing.zh...@amd.com>;
>Koenig, Christian <christian.koe...@amd.com>;
>Deucher, Alexander <alexander.deuc...@amd.com>;
>Zhang, Owen(SRDC) <owen.zha...@amd.com>; Ma, Qing (Mark) <qing...@amd.com>;
>Lazar, Lijo <lijo.la...@amd.com>; Deng, Emily <emily.d...@amd.com>
>Subject: [PATCH v8 0/4] enable xgmi node migration support for hibernate on
>SRIOV
>
>On SRIOV and VM environment, customer may need to switch to new vGPU indexes
>after hibernate and then resume the VM.
>For GPUs with XGMI, `vram_start` will change in this case, so the FB aperture
>gpu address of VRAM BOs will also change. These gpu addresses need to be
>updated on resume. But these addresses are all over the KMD codebase, and
>updating each of them is error-prone and not acceptable.
>
>The solution is to use the pdb0 page table to cover both vram and gart memory
>and use the pdb0 virtual gpu address instead. When gpu indexes change, the
>virtual gpu address won't change.
>
>For psp and smu, pdb0's gpu address does not work, so the original FB aperture
>gpu address is used instead. These addresses need to be updated on resume with
>changed vGPUs.
>
>v2:
>- remove physical_node_id_changed
>- set vram_start to 0 to switch cached gpu addr to gart aperture
>- cleanup pdb0 patch
>v3:
>- remove gmc_v9_0_init_sw_mem_ranges() call
>- remove vram_offset member
>- add 4 refactoring patches to remove cached gpu addr
>- cleanup pdb0 patch
>v4:
>- remove gmc_v9_0_mc_init() call and `refresh` update.
>- do not set `fb_start` in mmhub_v1_8_get_fb_location() when pdb0 enabled.
>v5:
>- add amdgpu_virt_xgmi_migrate_enabled() check
>- move vram_base_offset update to pdb0 patch
>- remove 4 refactoring patches to remove cached gpu addr
>- add patch to fix IH not working issue when resume with new VF
>v6: per Lijo feedback
>- rename amdgpu_device_update_xgmi_info() to amdgpu_virt_resume()
>- merge xgmi node and vram_base_offset update, IH fix into amdgpu_virt_resume()
>- remove 2 unnecessary gpu addr update changes
>v7: per Christian feedback
>- remove pdb0_enabled and add gmc_v9_0_is_pdb0_enabled()
>- remove amdgpu_gmc_vram_location() call in amdgpu_gmc_sysvm_location()
>- remove check in mmhub_v1_8_get_fb_location() and update fb_start/fb_end on resume
>v8:
>- use cached fb_start in amdgpu_bo_fb_aper_addr()
>- remove fb_start/fb_end update in amdgpu_virt_resume() and amdgpu_gmc_sysvm_location()
>- use vram_start to set regVM_CONTEXT0_PAGE_TABLE_START_ADDR_*
>- move check to the callsite of amdgpu_virt_resume()
>- add gmc.xgmi.node_segment_size check in amdgpu_virt_xgmi_migrate_enabled()
>- rename gmc_v9_0_is_pdb0_enabled() to amdgpu_gmc_is_pdb0_enabled()
>
>Samuel Zhang (4):
>  drm/amdgpu: update xgmi info and vram_base_offset on resume
>  drm/amdgpu: update GPU addresses for SMU and PSP
>  drm/amdgpu: enable pdb0 for hibernation on SRIOV
>  drm/amdgpu: fix fence fallback timer expired error
>
> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 38 ++++++++++++++++++++++
> drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c    | 28 ++++++++++++----
> drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c    |  2 +-
> drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h    |  1 +
> drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 20 ++++++++++++
> drivers/gpu/drm/amd/amdgpu/amdgpu_object.h |  1 +
> drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c    | 23 +++++++++++++
> drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c  |  3 ++
> drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h   |  7 ++++
> drivers/gpu/drm/amd/amdgpu/gfxhub_v1_2.c   |  8 +++--
> drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c      | 13 +++++---
> drivers/gpu/drm/amd/amdgpu/mmhub_v1_8.c    |  6 ++--
> drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c  | 18 ++++++++++
> 13 files changed, 151 insertions(+), 17 deletions(-)
>
>--
>2.43.5
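
For readers following the thread, the addressing idea in the cover letter can be illustrated with a small standalone model. This is only a sketch and not code from the series: `struct gmc_model`, `fb_aper_addr()`, `pdb0_virt_addr()`, `model_resume()` and `PDB0_VRAM_VIRT_BASE` are hypothetical names, and the rule that `vram_start` equals the node index times `node_segment_size` is an assumption made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model, not the amdgpu structures: it keeps only the two
 * quantities the cover letter talks about. */
struct gmc_model {
	uint64_t vram_start;        /* FB aperture base of this node's VRAM */
	uint64_t node_segment_size; /* VRAM size of one XGMI node */
};

/* Assumption for this sketch: pdb0 maps local VRAM at a fixed virtual base. */
#define PDB0_VRAM_VIRT_BASE 0ULL

/* FB aperture address of a BO: depends on vram_start, so it moves when the
 * VM resumes on a different vGPU index. */
static uint64_t fb_aper_addr(const struct gmc_model *gmc, uint64_t bo_offset)
{
	assert(bo_offset < gmc->node_segment_size);
	return gmc->vram_start + bo_offset;
}

/* pdb0 virtual address of the same BO: depends only on the offset, so it is
 * unaffected by a change of vGPU index. */
static uint64_t pdb0_virt_addr(uint64_t bo_offset)
{
	return PDB0_VRAM_VIRT_BASE + bo_offset;
}

/* Model of resuming on another node. Assumed rule: the FB aperture base is
 * the node index multiplied by the per-node segment size. */
static void model_resume(struct gmc_model *gmc, uint32_t new_node_id)
{
	gmc->vram_start = (uint64_t)new_node_id * gmc->node_segment_size;
}
```

In this model, calling `model_resume()` with a different node index changes the value returned by `fb_aper_addr()` for a given offset, while `pdb0_virt_addr()` returns the same value as before. That mirrors why only the consumers that cannot use pdb0 addresses (psp and smu in the cover letter) need their addresses refreshed on resume.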