> On January 15, 2025, at 02:00, Liu, Shaoyun <shaoyun....@amd.com> wrote:
>
> [AMD Official Use Only - AMD Internal Distribution Only]
>
> I think resuming with different SR-IOV vGPUs depends on whether the
> hypervisor has live migration support. Different hypervisors have different
> implementations; basically, the hypervisor calls into the host GPU driver at
> different stages, and the host side does the HW-related migration, including
> the FW state.
Hi Shaoyun,
Great news! That sounds like what I’m looking for:)
Is there any documentation on how to enable this with an in-house
hypervisor implementation? Will the hypervisor need to cooperate with the GIM
driver to enable resume with different vGPUs?
Regards
Gerry
>
> Regards
> Shaoyun.liu
>
> From: amd-gfx <amd-gfx-boun...@lists.freedesktop.org> On Behalf Of Christian
> König
> Sent: Tuesday, January 14, 2025 7:44 AM
> To: Gerry Liu <ge...@linux.alibaba.com>
> Cc: Deucher, Alexander <alexander.deuc...@amd.com>; Pan, Xinhui
> <xinhui....@amd.com>; airl...@gmail.com; sim...@ffwll.ch; Khatri, Sunil
> <sunil.kha...@amd.com>; Lazar, Lijo <lijo.la...@amd.com>; Zhang, Hawking
> <hawking.zh...@amd.com>; Limonciello, Mario <mario.limoncie...@amd.com>;
> Chen, Xiaogang <xiaogang.c...@amd.com>; Russell, Kent <kent.russ...@amd.com>;
> shuox....@linux.alibaba.com; amd-gfx@lists.freedesktop.org
> Subject: Re: [RFC v1 0/2] Enable resume with different AMD SRIOV vGPUs
>
> Hi Gerry,
>
> On 14.01.25 at 12:03, Gerry Liu wrote:
> On January 14, 2025, at 18:46, Christian König <christian.koe...@amd.com> wrote:
>
> Hi Jiang,
>
> Some of the firmware, especially the multimedia ones, keep FW pointers to
> buffers in the suspend/resume state.
>
> In other words the firmware needs to be in the exact same location before and
> after resume. That's why we don't unpin the firmware BOs, but rather save
> their content and restore it. See function amdgpu_vcn_save_vcpu_bo() for
> reference.
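
The save-and-restore-in-place idea described above can be sketched in plain userspace C. This is only a toy model: `struct fw_bo`, `fw_bo_save()` and `fw_bo_restore()` are hypothetical names invented for illustration and are not the amdgpu implementation. The point it shows is that the buffer's GPU address stays fixed while only its contents travel through system memory.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for a pinned firmware buffer (hypothetical, not an amdgpu type). */
struct fw_bo {
    uint64_t gpu_addr;  /* location the firmware points at; must never change */
    uint8_t *cpu_ptr;   /* CPU mapping of the buffer contents */
    size_t   size;
    void    *saved;     /* system-memory copy taken at suspend, or NULL */
};

/* Suspend path: copy the contents out, but leave the buffer where it is. */
int fw_bo_save(struct fw_bo *bo)
{
    assert(bo && bo->cpu_ptr);
    free(bo->saved);                 /* drop any stale copy; free(NULL) is fine */
    bo->saved = malloc(bo->size);
    if (!bo->saved)
        return -1;
    memcpy(bo->saved, bo->cpu_ptr, bo->size);
    return 0;
}

/* Resume path: write the saved contents back to the same location. */
int fw_bo_restore(struct fw_bo *bo)
{
    assert(bo && bo->cpu_ptr);
    if (!bo->saved)
        return -1;                   /* nothing was saved */
    memcpy(bo->cpu_ptr, bo->saved, bo->size);
    free(bo->saved);
    bo->saved = NULL;
    return 0;
}
```

Because `gpu_addr` is never touched, any pointer the firmware kept into the buffer remains valid after restore. That is exactly the property that would be lost if the buffer came back at a different address on a different device.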
>
> In addition to that, the serial numbers, IDs, etc. are used for things like
> TMZ. So anything which uses HW encryption won't work any more.
>
> Then even two identical boards can have different harvest and memory channel
> configurations. It could be that we are able to abstract that with SR-IOV,
> but I wouldn't rely on that.
>
> To summarize, that looks like a completely futile effort which most likely
> won't work reliably in a production environment.
> Hi Christian,
> Thanks for the information. Previously I assumed that we could reset the ASIC
> and reload all firmware on resume, but I missed the VCN IP block, which saves
> and restores firmware VRAM content during suspend/resume. Are there any other
> IP blocks which save and restore firmware RAM content?
>
> Not that I know of offhand, but neither the hypervisor nor the driver
> stack was designed with something like this in mind. So it could be that
> there are other dependencies I don't know about.
>
> I do remember that this idea of resuming on different HW than suspending came
> up a while ago and was rejected by multiple parties as too complicated and
> error-prone.
>
> So we never looked more deeply into the possibility of doing that.
>
>
>
> Our usage scenario targets GPGPU workloads (amdkfd) with an AMD GPU in single
> SR-IOV vGPU mode. Is it possible to resume on a different vGPU device in such
> a case?
>
> If I'm not completely mistaken you can use checkpoint/restore for that. It's
> still under development, but as far as I can see it should solve your problem
> quite nicely.
>
> Regards,
> Christian.
>
>
>
> Regards,
> Gerry
>
>
>
> Regards,
> Christian.
>
> On 14.01.25 at 10:54, Jiang Liu wrote:
> For virtual machines with AMD SR-IOV vGPUs, the following workflow may be
> used to support virtual machine hibernation (suspend):
> 1) Suspend a virtual machine with AMD vGPU A.
> 2) The hypervisor dumps the guest RAM content to a disk image.
> 3) The hypervisor loads the guest system image from disk.
> 4) Resume the guest OS with a different AMD vGPU B.
>
> Step 4 above is special because we are resuming with a different
> AMD vGPU device and the amdgpu driver may observe changed device
> properties. To support the above workflow, we need to fix those changed
> device properties cached by the amdgpu driver.
>
> From the amdgpu driver source code (we haven't read the corresponding
> hardware specs yet), we have identified the following changed device
> properties:
> 1) PCI MMIO address. This can be fixed by the hypervisor.
> 2) serial_number, unique_id, xgmi_device_id, fru_id in sysfs. These seem
> to be informational only.
> 3) xgmi_physical_id if xgmi is enabled, which affects the VRAM MC address.
> 4) mc_fb_offset, which affects the VRAM physical address.
>
> We will focus on the VRAM address related changes here, because GPU
> functionality is sensitive to them. The original data sources include
> .get_mc_fb_offset(), .get_fb_location() and xgmi hardware registers.
> The main data cached by the amdgpu driver are adev->gmc.vram_start and
> adev->vm_manager.vram_base_offset. And the major consumers of the
> cached information are ip_block.hw_init() and GMU page table builder.
>
> After code analysis, we found that most consumers of adev->gmc.vram_start
> and adev->vm_manager.vram_base_offset read the values directly from these
> two variables on demand instead of caching them. So if we fix these
> two cached fields on resume, everything should work as expected.
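
A minimal sketch of this reasoning, as a userspace toy model rather than driver code: `struct toy_gmc`, `struct toy_hw`, `toy_refresh_vram_base()` and `toy_bo_mc_addr()` are hypothetical names, and the two `toy_hw` fields merely stand in for whatever .get_fb_location() and .get_mc_fb_offset() would report on the new device. It illustrates why consumers that compute addresses on demand pick up the new base, while a stored absolute address goes stale.

```c
#include <assert.h>
#include <stdint.h>

/* Toy copy of the two cached fields discussed above (hypothetical layout). */
struct toy_gmc {
    uint64_t vram_start;        /* MC address where VRAM begins */
    uint64_t vram_base_offset;  /* physical base offset of VRAM */
};

/* Values a device would report; assumed inputs for this sketch. */
struct toy_hw {
    uint64_t fb_location;
    uint64_t mc_fb_offset;
};

/* Resume hook: re-read the device values and overwrite the cached fields. */
void toy_refresh_vram_base(struct toy_gmc *gmc, const struct toy_hw *hw)
{
    assert(gmc && hw);
    gmc->vram_start = hw->fb_location;
    gmc->vram_base_offset = hw->mc_fb_offset;
}

/* On-demand consumer: derives the address from the current cached base. */
uint64_t toy_bo_mc_addr(const struct toy_gmc *gmc, uint64_t offset_in_vram)
{
    return gmc->vram_start + offset_in_vram;
}
```

A caller that recomputes `toy_bo_mc_addr()` after the refresh sees the new device's base. A caller that stored the earlier return value keeps pointing into the old device's address range, which is the caching problem raised next.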
>
> But there's an exception, and a very important one: callers
> of amdgpu_bo_create_kernel()/amdgpu_bo_create_reserved() may cache
> VRAM addresses. On further analysis, the callers of these interfaces
> follow three different patterns:
> 1) This pattern is safe.
> - call amdgpu_bo_create_reserved() in ip_block.hw_init()
> - call amdgpu_bo_free_kernel() in ip_block.suspend()
> - call amdgpu_bo_create_reserved() in ip_block.resume()
> 2) This pattern works with the current implementation of
> amdgpu_bo_create_reserved(), but bo.pin_count becomes incorrect.
> - call amdgpu_bo_create_reserved() in ip_block.hw_init()
> - call amdgpu_bo_create_reserved() in ip_block.resume()
> 3) This pattern needs to be enhanced.
> - call amdgpu_bo_create_reserved() in ip_block.sw_init()
>
> So my question is which pattern should we use here? Personally I prefer
> pattern 2 with enhancement to fix the bo.pin_count.
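
The pin-count concern in pattern 2 can be illustrated with a toy model. Everything here is an assumption made for illustration: `struct toy_bo` and the `toy_*` functions are hypothetical, and the "reuse the object if it already exists, but pin it again" behaviour is a simplified reading of what the text above describes, not the kernel code. `toy_bo_refresh_gpu_addr()` shows one possible shape of an enhancement and is not the helper from the patch series.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Toy buffer object (hypothetical, not struct amdgpu_bo). */
struct toy_bo {
    int      pin_count;
    uint64_t offset;    /* fixed offset inside VRAM */
    uint64_t gpu_addr;  /* vram_start + offset, as last computed */
};

/* Assumed behaviour: allocate only if *bo_ptr is NULL, but always pin. */
int toy_bo_create_reserved(struct toy_bo **bo_ptr, uint64_t vram_start,
                           uint64_t offset, uint64_t *gpu_addr)
{
    assert(bo_ptr && gpu_addr);
    if (!*bo_ptr) {
        *bo_ptr = calloc(1, sizeof(**bo_ptr));
        if (!*bo_ptr)
            return -1;
        (*bo_ptr)->offset = offset;
    }
    (*bo_ptr)->pin_count++;  /* a second call on the same object pins again */
    (*bo_ptr)->gpu_addr = vram_start + (*bo_ptr)->offset;
    *gpu_addr = (*bo_ptr)->gpu_addr;
    return 0;
}

/* Free the object and clear the caller's pointer. */
void toy_bo_free_kernel(struct toy_bo **bo_ptr)
{
    free(*bo_ptr);
    *bo_ptr = NULL;
}

/* Possible enhancement: recompute the address without touching pin_count. */
uint64_t toy_bo_refresh_gpu_addr(struct toy_bo *bo, uint64_t vram_start)
{
    assert(bo && bo->pin_count > 0);
    bo->gpu_addr = vram_start + bo->offset;
    return bo->gpu_addr;
}
```

In this model, pattern 1 (create, free, create) ends with a pin count of 1, while pattern 2 (create, create) ends with a pin count of 2 even though the address is refreshed correctly. A refresh-only helper keeps the count at 1.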
>
> Currently there are still bugs in SR-IOV suspend/resume, so we can't test
> our hypothesis. And we are not sure whether there are other blockers
> to enabling resume with different AMD SR-IOV vGPUs.
>
> Help is needed to identify more task items to enable resume with
> different AMD SR-IOV vGPUs:)
>
> Jiang Liu (2):
> drm/amdgpu: update cached vram base addresses on resume
> drm/amdgpu: introduce helper amdgpu_bo_get_pinned_gpu_addr()
>
> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 15 +++++++++++++++
> drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h | 6 ++++--
> drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 9 +++++++++
> drivers/gpu/drm/amd/amdgpu/amdgpu_object.h | 1 +
> drivers/gpu/drm/amd/amdgpu/amdgpu_umsch_mm.c | 9 +++++++++
> drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c | 7 +++++++
> drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 6 ++++++
> 7 files changed, 51 insertions(+), 2 deletions(-)