Re: [RFC v1 0/2] Enable resume with different AMD SRIOV vGPUs

Gerry Liu Tue, 14 Jan 2025 21:24:19 -0800

> On Jan 15, 2025, at 12:03, Liu, Shaoyun <shaoyun....@amd.com> wrote:
> 
> [AMD Official Use Only - AMD Internal Distribution Only]
> 
> I might have misunderstood your requirement. Live migration is transparent 
> to the guest. The guest can stay in the running state (for example, running 
> compute work) while the hypervisor and the gim driver together handle the GPU 
> HW state migration from the source vGPU to another identical vGPU. It doesn't 
> require the guest to suspend/resume. You can contact other engineers 
> who work on SRIOV for more information on live migration support.
Yeah, there are different usage scenarios:
1) live migration
2) hibernate/suspend/resume
3) snapshot and clone
Currently we are focusing on live migration and hibernation, and hope that we 
can base both on common underlying technologies.

> 
> Regards
> Shaoyun.liu
> 
> -----Original Message-----
> From: Gerry Liu <ge...@linux.alibaba.com>
> Sent: Tuesday, January 14, 2025 8:48 PM
> To: Liu, Shaoyun <shaoyun....@amd.com>
> Cc: Koenig, Christian <christian.koe...@amd.com>; Deucher, Alexander 
> <alexander.deuc...@amd.com>; Pan, Xinhui <xinhui....@amd.com>; 
> airl...@gmail.com; sim...@ffwll.ch; Khatri, Sunil <sunil.kha...@amd.com>; 
> Lazar, Lijo <lijo.la...@amd.com>; Zhang, Hawking <hawking.zh...@amd.com>; 
> Limonciello, Mario <mario.limoncie...@amd.com>; Chen, Xiaogang 
> <xiaogang.c...@amd.com>; Russell, Kent <kent.russ...@amd.com>; 
> shuox....@linux.alibaba.com; amd-gfx@lists.freedesktop.org
> Subject: Re: [RFC v1 0/2] Enable resume with different AMD SRIOV vGPUs
> 
> 
> 
>> On Jan 15, 2025, at 02:00, Liu, Shaoyun <shaoyun....@amd.com> wrote:
>> 
>> [AMD Official Use Only - AMD Internal Distribution Only]
>> 
>> I think resuming with different SRIOV vGPUs depends on the hypervisor having 
>> live migration support. Different hypervisors have different 
>> implementations; basically, the hypervisor calls into the host GPU driver at 
>> different stages and the host side does the HW-related migration, including 
>> the FW state.
> Hi Shaoyun,
>        Great news! That sounds like what I’m looking for:)
>        Is there any documentation about how to enable this with an in-house 
> implemented hypervisor? Will the hypervisor need to cooperate with the gim 
> driver to enable resume with different vGPUs?
> Regards
> Gerry
> 
>> 
>> Regards
>> Shaoyun.liu
>> 
>> From: amd-gfx <amd-gfx-boun...@lists.freedesktop.org> On Behalf Of
>> Christian König
>> Sent: Tuesday, January 14, 2025 7:44 AM
>> To: Gerry Liu <ge...@linux.alibaba.com>
>> Cc: Deucher, Alexander <alexander.deuc...@amd.com>; Pan, Xinhui
>> <xinhui....@amd.com>; airl...@gmail.com; sim...@ffwll.ch; Khatri,
>> Sunil <sunil.kha...@amd.com>; Lazar, Lijo <lijo.la...@amd.com>; Zhang,
>> Hawking <hawking.zh...@amd.com>; Limonciello, Mario
>> <mario.limoncie...@amd.com>; Chen, Xiaogang <xiaogang.c...@amd.com>;
>> Russell, Kent <kent.russ...@amd.com>; shuox....@linux.alibaba.com;
>> amd-gfx@lists.freedesktop.org
>> Subject: Re: [RFC v1 0/2] Enable resume with different AMD SRIOV vGPUs
>> 
>> Hi Gerry,
>> 
>> On 14.01.25 at 12:03, Gerry Liu wrote:
>> On Jan 14, 2025, at 18:46, Christian König <christian.koe...@amd.com> wrote:
>> 
>> Hi Jiang,
>> 
>> Some of the firmware, especially the multimedia firmware, keeps FW pointers 
>> to buffers in the suspend/resume state.
>> 
>> In other words the firmware needs to be in the exact same location before 
>> and after resume. That's why we don't unpin the firmware BOs, but rather 
>> save their content and restore it. See function amdgpu_vcn_save_vcpu_bo() 
>> for reference.
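
A minimal sketch of the save-and-restore idea described above, written as a toy user-space model rather than kernel code. The names toy_fw_bo, toy_fw_save(), toy_fw_restore() and toy_fw_roundtrip_ok() are hypothetical and are not taken from amdgpu; the only point illustrated is that the buffer keeps its location while its content is copied out on suspend and copied back on resume.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for a pinned firmware buffer; not the real amdgpu BO. */
struct toy_fw_bo {
    unsigned char *vram;  /* fixed location the firmware points into */
    size_t size;
    void *saved;          /* system-memory copy taken on suspend */
};

/* Suspend side: copy the content out, leave the buffer where it is. */
int toy_fw_save(struct toy_fw_bo *bo)
{
    bo->saved = malloc(bo->size);
    if (!bo->saved)
        return -1;
    memcpy(bo->saved, bo->vram, bo->size);
    return 0;
}

/* Resume side: copy the content back to the same location. */
void toy_fw_restore(struct toy_fw_bo *bo)
{
    if (!bo->saved)
        return;
    memcpy(bo->vram, bo->saved, bo->size);
    free(bo->saved);
    bo->saved = NULL;
}

/* Returns 1 if content and location both survive a simulated suspend. */
int toy_fw_roundtrip_ok(void)
{
    unsigned char vram[16];
    unsigned char expect[16];
    struct toy_fw_bo bo = { .vram = vram, .size = sizeof(vram), .saved = NULL };

    memset(vram, 0xAB, sizeof(vram));
    memcpy(expect, vram, sizeof(vram));
    if (toy_fw_save(&bo))
        return 0;
    memset(vram, 0, sizeof(vram));  /* assume VRAM content is lost */
    toy_fw_restore(&bo);
    assert(bo.saved == NULL);
    return bo.vram == vram && memcmp(vram, expect, sizeof(vram)) == 0;
}
```

In this model the address in bo.vram never changes, which is the property the firmware pointers rely on. Resuming on a different vGPU could break that property even if the content is restored correctly.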
>> 
>> In addition, the serial numbers, IDs etc. are used for things like TMZ. 
>> So anything which uses HW encryption won't work any more.
>> 
>> Then even two identical boards can have different harvest and memory channel 
>> configurations. Could be that we might be able to abstract that with SR-IOV 
>> but I won't rely on that.
>> 
>> To summarize, that looks like a completely futile effort which most likely 
>> won't work reliably in a production environment.
>> Hi Christian,
>>  Thanks for the information. Previously I assumed that we could reset the ASIC 
>> and reload all firmware on resume, but I missed the VCN IP block, which saves 
>> and restores firmware VRAM content during suspend/resume. Are there any other 
>> IP blocks which save and restore firmware RAM content?
>> 
>> Not that I know of offhand, but neither the hypervisor nor the driver 
>> stack was designed with something like this in mind. So could be that there 
>> are other dependencies I don't know about.
>> 
>> I do remember that this idea of resuming on different HW than suspending 
>> came up a while ago and was rejected by multiple parties as too complicated 
>> and error prone.
>> 
>> So we never looked more deeply into the possibility of doing that.
>> 
>> 
>> 
>>  Our usage scenario targets GPGPU workload (amdkfd) with AMD GPU in single 
>> SR-IOV vGPU mode. Is it possible to resume on a different vGPU device in 
>> such a case?
>> 
>> If I'm not completely mistaken you can use checkpoint/restore for that. It's 
>> still under development, but as far as I can see it should solve your 
>> problem quite nicely.
>> 
>> Regards,
>> Christian.
>> 
>> 
>> 
>> Regards,
>> Gerry
>> 
>> 
>> 
>> Regards,
>> Christian.
>> 
>> On 14.01.25 at 10:54, Jiang Liu wrote:
>> For virtual machines with AMD SR-IOV vGPUs, the following workflow may be
>> used to support virtual machine hibernation (suspend):
>> 1) suspend a virtual machine with AMD vGPU A.
>> 2) the hypervisor dumps guest RAM content to a disk image.
>> 3) the hypervisor loads the guest system image from disk.
>> 4) resume the guest OS with a different AMD vGPU B.
>> 
>> Step 4 above is special because we are resuming with a different
>> AMD vGPU device and the amdgpu driver may observe changed device
>> properties. To support the above workflow, we need to fix up those changed
>> device properties cached by the amdgpu driver.
>> 
>> Based on the amdgpu driver source code (we haven't read the
>> corresponding hardware specs yet), we have identified the following
>> changed device properties:
>> 1) PCI MMIO address. This can be fixed by the hypervisor.
>> 2) serial_number, unique_id, xgmi_device_id, fru_id in sysfs. They seem
>>   to be informational only.
>> 3) xgmi_physical_id if xgmi is enabled, which affects VRAM MC address.
>> 4) mc_fb_offset, which affects VRAM physical address.
>> 
>> We will focus on the VRAM address related changes here, because GPU
>> functionality is sensitive to them. The original data sources
>> include .get_mc_fb_offset(), .get_fb_location() and xgmi hardware registers.
>> The main data cached by the amdgpu driver are adev->gmc.vram_start and
>> adev->vm_manager.vram_base_offset. The major consumers of the
>> cached information are ip_block.hw_init() and the GMU page table builder.
>> 
>> After code analysis, we found that most consumers of
>> adev->gmc.vram_start and adev->vm_manager.vram_base_offset read the
>> values directly from these two variables on demand instead of caching them.
>> So if we fix up these two cached fields on resume, everything should work as 
>> expected.
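
A rough sketch of the reasoning above, as a toy user-space model and not the actual patch. The structs toy_gmc and toy_vgpu and all toy_* functions are hypothetical; the two fields only stand in for adev->gmc.vram_start and adev->vm_manager.vram_base_offset, and the addresses are made up. It shows that a consumer which recomputes an address on demand follows the refreshed base, while a caller that kept an address computed before the vGPU swap ends up with a stale value.

```c
#include <assert.h>
#include <stdint.h>

/* Toy copy of the two cached fields; not the real amdgpu structs. */
struct toy_gmc {
    uint64_t vram_start;        /* models adev->gmc.vram_start */
    uint64_t vram_base_offset;  /* models vm_manager.vram_base_offset */
};

/* Hypothetical view of what a given vGPU reports. */
struct toy_vgpu {
    uint64_t fb_location;   /* stands in for .get_fb_location() */
    uint64_t mc_fb_offset;  /* stands in for .get_mc_fb_offset() */
};

/* Re-read the device values into the cache (init and resume). */
void toy_refresh_vram_base(struct toy_gmc *gmc, const struct toy_vgpu *hw)
{
    gmc->vram_start = hw->fb_location;
    gmc->vram_base_offset = hw->mc_fb_offset;
}

/* On-demand consumer: derives the address from the cache every time. */
uint64_t toy_mc_addr(const struct toy_gmc *gmc, uint64_t vram_offset)
{
    return gmc->vram_start + vram_offset;
}

/* Returns 1 if an address kept across the vGPU swap is stale. */
int toy_cached_addr_is_stale(void)
{
    struct toy_vgpu a = { .fb_location = 0x8000000000ULL,
                          .mc_fb_offset = 0x0 };
    struct toy_vgpu b = { .fb_location = 0x9000000000ULL,
                          .mc_fb_offset = 0x1000000000ULL };
    struct toy_gmc gmc;
    uint64_t cached;

    toy_refresh_vram_base(&gmc, &a);
    cached = toy_mc_addr(&gmc, 0x1000);  /* a caller keeps this value */

    toy_refresh_vram_base(&gmc, &b);     /* resume on vGPU B */
    /* The on-demand consumer sees the new base. */
    assert(toy_mc_addr(&gmc, 0x1000) == 0x9000001000ULL);
    return cached != toy_mc_addr(&gmc, 0x1000);
}
```

The stale case is the one that matters for buffer objects whose GPU address is stored by the caller.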
>> 
>> But there's an exception, and a very important one: callers
>> of amdgpu_bo_create_kernel()/amdgpu_bo_create_reserved() may cache
>> VRAM addresses. With further analysis, the callers of these interfaces
>> follow three different patterns:
>> 1) This pattern is safe.
>>   - call amdgpu_bo_create_reserved() in ip_block.hw_init()
>>   - call amdgpu_bo_free_kernel() in ip_block.suspend()
>>   - call amdgpu_bo_create_reserved() in ip_block.resume()
>> 2) This pattern works with the current implementation of 
>> amdgpu_bo_create_reserved()
>>   but bo.pin_count becomes incorrect.
>>   - call amdgpu_bo_create_reserved() in ip_block.hw_init()
>>   - call amdgpu_bo_create_reserved() in ip_block.resume()
>> 3) This pattern needs to be enhanced.
>>   - call amdgpu_bo_create_reserved() in ip_block.sw_init()
>> 
>> So my question is: which pattern should we use here? Personally I
>> prefer pattern 2, with an enhancement to fix bo.pin_count.
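
To illustrate the pin_count concern in pattern 2, here is a toy user-space model. It assumes, as a simplification, that the create call allocates only when the pointer is NULL but pins on every call; toy_bo, toy_bo_create_reserved(), toy_bo_free_kernel() and toy_pin_count_after_cycles() are hypothetical names and only mimic the shape of the amdgpu helpers mentioned above.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy buffer object that only tracks its pin count. */
struct toy_bo {
    int pin_count;
};

/* Assumed behaviour: allocate if *bo is NULL, pin on every call. */
int toy_bo_create_reserved(struct toy_bo **bo)
{
    if (!*bo) {
        *bo = calloc(1, sizeof(**bo));
        if (!*bo)
            return -1;
    }
    (*bo)->pin_count++;
    return 0;
}

/* Drop the object entirely, as pattern 1 does in suspend(). */
void toy_bo_free_kernel(struct toy_bo **bo)
{
    free(*bo);
    *bo = NULL;
}

/* Run hw_init() once, then the given number of suspend/resume cycles. */
int toy_pin_count_after_cycles(int cycles, int free_on_suspend)
{
    struct toy_bo *bo = NULL;
    int i, pins;

    if (toy_bo_create_reserved(&bo))      /* hw_init() */
        return -1;
    for (i = 0; i < cycles; i++) {
        if (free_on_suspend)
            toy_bo_free_kernel(&bo);      /* suspend(), pattern 1 only */
        if (toy_bo_create_reserved(&bo))  /* resume() */
            return -1;
    }
    assert(bo != NULL);
    pins = bo->pin_count;
    toy_bo_free_kernel(&bo);
    return pins;
}
```

Under this assumption, toy_pin_count_after_cycles(3, 1) models pattern 1 and leaves the count at 1, while toy_pin_count_after_cycles(3, 0) models pattern 2 and ends at 4, one extra pin per resume.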
>> 
>> Currently there are still bugs in SRIOV suspend/resume, so we can't
>> test our hypothesis. We are also not sure whether there are other
>> blockers to enabling resume with different AMD SR-IOV vGPUs.
>> 
>> Help is needed to identify more task items to enable resume with
>> different AMD SR-IOV vGPUs:)
>> 
>> Jiang Liu (2):
>>  drm/amdgpu: update cached vram base addresses on resume
>>  drm/amdgpu: introduce helper amdgpu_bo_get_pinned_gpu_addr()
>> 
>> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c   | 15 +++++++++++++++
>> drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h      |  6 ++++--
>> drivers/gpu/drm/amd/amdgpu/amdgpu_object.c   |  9 +++++++++
>> drivers/gpu/drm/amd/amdgpu/amdgpu_object.h   |  1 +
>> drivers/gpu/drm/amd/amdgpu/amdgpu_umsch_mm.c |  9 +++++++++
>> drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c       |  7 +++++++
>> drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c        |  6 ++++++
>> 7 files changed, 51 insertions(+), 2 deletions(-)
>