[AMD Official Use Only - AMD Internal Distribution Only]

Ping @Lazar, Lijo<mailto:lijo.la...@amd.com>, @Koenig, 
Christian<mailto:christian.koe...@amd.com>…

Please review the updated patch in advance so that we can discuss your 
suggestions in tomorrow's meeting. Thanks for your great support.


Rgds/Owen

From: Deng, Emily <emily.d...@amd.com>
Sent: Monday, May 26, 2025 9:56 AM
To: Zhang, GuoQing (Sam) <guoqing.zh...@amd.com>; Koenig, Christian 
<christian.koe...@amd.com>; Lazar, Lijo <lijo.la...@amd.com>; Deucher, 
Alexander <alexander.deuc...@amd.com>
Cc: Zhao, Victor <victor.z...@amd.com>; Chang, HaiJun <haijun.ch...@amd.com>; 
Zhang, GuoQing (Sam) <guoqing.zh...@amd.com>; Zhang, Owen(SRDC) 
<owen.zha...@amd.com>; Ma, Qing (Mark) <qing...@amd.com>; 
amd-gfx@lists.freedesktop.org
Subject: RE: [PATCH v8 0/4] enable xgmi node migration support for hibernate on 
SRIOV




@Koenig, Christian<mailto:christian.koe...@amd.com> and @Lazar, 
Lijo<mailto:lijo.la...@amd.com>

Could you help review these changes again?



Best wishes

Emily Deng



>-----Original Message-----

>From: Samuel Zhang <guoqing.zh...@amd.com<mailto:guoqing.zh...@amd.com>>

>Sent: Thursday, May 22, 2025 6:41 PM

>To: amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>

>Cc: Zhao, Victor <victor.z...@amd.com<mailto:victor.z...@amd.com>>; Chang, 
>HaiJun

><haijun.ch...@amd.com<mailto:haijun.ch...@amd.com>>; Zhang, GuoQing (Sam) 
><guoqing.zh...@amd.com<mailto:guoqing.zh...@amd.com>>;

>Koenig, Christian <christian.koe...@amd.com<mailto:christian.koe...@amd.com>>; 
>Deucher, Alexander

><alexander.deuc...@amd.com<mailto:alexander.deuc...@amd.com>>; Zhang, 
>Owen(SRDC) <owen.zha...@amd.com<mailto:owen.zha...@amd.com>>;

>Ma, Qing (Mark) <qing...@amd.com<mailto:qing...@amd.com>>; Lazar, Lijo 
><lijo.la...@amd.com<mailto:lijo.la...@amd.com>>; Deng,

>Emily <emily.d...@amd.com<mailto:emily.d...@amd.com>>

>Subject: [PATCH v8 0/4] enable xgmi node migration support for hibernate on 
>SRIOV

>

>In an SRIOV VM environment, a customer may need to switch to new vGPU indexes
>after hibernating and then resuming the VM. For GPUs with XGMI, `vram_start`
>changes in this case, so the FB aperture GPU addresses of VRAM BOs change as
>well. These GPU addresses need to be updated on resume, but they are used all
>over the KMD codebase, and updating each of them is error-prone and not
>acceptable.

>

>The solution is to use the pdb0 page table to cover both VRAM and GART memory
>and to use pdb0 virtual GPU addresses instead. When the GPU indexes change,
>the virtual GPU addresses do not change.

>

>For PSP and SMU, pdb0 GPU addresses do not work, so the original FB aperture
>GPU addresses are used instead. These need to be updated on resume with
>changed vGPUs.

>

>v2:

>- remove physical_node_id_changed

>- set vram_start to 0 to switch cached gpu addr to gart aperture

>- cleanup pdb0 patch

>v3:

>- remove gmc_v9_0_init_sw_mem_ranges() call

>- remove vram_offset member

>- add 4 refactoring patches to remove cached gpu addr

>- cleanup pdb0 patch

>v4:

>- remove gmc_v9_0_mc_init() call and `refresh` update.

>- do not set `fb_start` in mmhub_v1_8_get_fb_location() when pdb0 enabled.

>v5:

>- add amdgpu_virt_xgmi_migrate_enabled() check

>- move vram_base_offset update to pdb0 patch

>- remove 4 refactoring patches to remove cached gpu addr

>- add patch to fix IH not working issue when resume with new VF

>v6: per Lijo feedback

>- rename amdgpu_device_update_xgmi_info() to amdgpu_virt_resume()

>- merge xgmi node and vram_base_offset update, IH fix into amdgpu_virt_resume()

>- remove 2 unnecessary gpu addr update changes

>v7: per Christian feedback

>- remove pdb0_enabled and add gmc_v9_0_is_pdb0_enabled()

>- remove amdgpu_gmc_vram_location() call in amdgpu_gmc_sysvm_location()

>- remove check in mmhub_v1_8_get_fb_location() and update fb_start/fb_end on
>  resume

>v8:

>- use cached fb_start in amdgpu_bo_fb_aper_addr()

>- remove fb_start/fb_end update in amdgpu_virt_resume() and
>  amdgpu_gmc_sysvm_location()

>- use vram_start to set regVM_CONTEXT0_PAGE_TABLE_START_ADDR_*

>- move check to the callsite of amdgpu_virt_resume()

>- add gmc.xgmi.node_segment_size check in amdgpu_virt_xgmi_migrate_enabled()

>- rename gmc_v9_0_is_pdb0_enabled() to amdgpu_gmc_is_pdb0_enabled()

>

>Samuel Zhang (4):

>  drm/amdgpu: update xgmi info and vram_base_offset on resume

>  drm/amdgpu: update GPU addresses for SMU and PSP

>  drm/amdgpu: enable pdb0 for hibernation on SRIOV

>  drm/amdgpu: fix fence fallback timer expired error

>

> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 38 ++++++++++++++++++++++

> drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c    | 28 ++++++++++++----

> drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c    |  2 +-

> drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h    |  1 +

> drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 20 ++++++++++++

> drivers/gpu/drm/amd/amdgpu/amdgpu_object.h |  1 +

> drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c    | 23 +++++++++++++

> drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c  |  3 ++

> drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h   |  7 ++++

> drivers/gpu/drm/amd/amdgpu/gfxhub_v1_2.c   |  8 +++--

> drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c      | 13 +++++---

> drivers/gpu/drm/amd/amdgpu/mmhub_v1_8.c    |  6 ++--

> drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c  | 18 ++++++++++

> 13 files changed, 151 insertions(+), 17 deletions(-)

>

>--

>2.43.5
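
For readers skimming the thread, the address scheme in the quoted cover letter can be sketched roughly as below. This is only a toy model, not code from the patch series: the struct and the toy_* helpers are hypothetical names, and the assumption that the FB aperture base is physical_node_id * node_segment_size is a simplification of what the driver actually computes. It shows why an FB aperture address moves when the vGPU lands on a different XGMI node, while an address taken from a fixed pdb0-mapped range does not.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified model. These are NOT the amdgpu structures. */
struct toy_gmc {
    uint64_t node_segment_size; /* VRAM bytes owned by each XGMI node */
    uint64_t physical_node_id;  /* may differ after hibernate + resume */
    uint64_t vram_start;        /* start of the pdb0-mapped VRAM range */
};

/*
 * FB aperture address of a BO (assumed layout): the base depends on
 * which XGMI node the vGPU currently sits on.
 */
static inline uint64_t toy_fb_aper_addr(const struct toy_gmc *gmc,
                                        uint64_t bo_offset)
{
    assert(bo_offset < gmc->node_segment_size);
    return gmc->physical_node_id * gmc->node_segment_size + bo_offset;
}

/*
 * pdb0 virtual address of the same BO: the base is a fixed virtual
 * range, so the node id does not enter the calculation.
 */
static inline uint64_t toy_pdb0_addr(const struct toy_gmc *gmc,
                                     uint64_t bo_offset)
{
    assert(bo_offset < gmc->node_segment_size);
    return gmc->vram_start + bo_offset;
}

/* Stand-in for a resume hook that refreshes the node id only. */
static inline void toy_resume(struct toy_gmc *gmc, uint64_t new_node_id)
{
    gmc->physical_node_id = new_node_id;
}
```

In this model, calling toy_resume() with a different node id changes the result of toy_fb_aper_addr() for the same offset but leaves toy_pdb0_addr() untouched. That mirrors the split described in the cover letter: most users can keep their cached pdb0 addresses, and only the PSP and SMU users of FB aperture addresses need a refresh on resume.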

