Re: [PATCH v5 5/5] drm/amdgpu: do not resume device in thaw for normal hibernation

2025-07-09 Thread Zhang, GuoQing (Sam)
On 2025/7/10 12:18, Lazar, Lijo wrote: On 7/10/2025 1:20 AM, Mario Limonciello wrote: On 7/9/2025 6:05 AM, Samuel Zhang wrote: For normal hibernation, the GPU does not need to be resumed in thaw since it is not involved in writing the hibernation image. Skipping resume in this case can reduce the hibe

Re: [PATCH v3 5/5] drm/amdgpu: do not resume device in thaw for normal hibernation

2025-07-08 Thread Zhang, GuoQing (Sam)
On 2025/7/8 22:40, Mario Limonciello wrote: On 7/8/2025 3:42 AM, Samuel Zhang wrote: For normal hibernation, the GPU does not need to be resumed in thaw since it is not involved in writing the hibernation image. Skipping resume in this case can reduce the hibernation time. On a VM with 8 * 192GB VRAM dG

Re: [PATCH v3 4/5] PM: hibernate: add new api pm_transition_event()

2025-07-08 Thread Zhang, GuoQing (Sam)
On 2025/7/8 22:36, Mario Limonciello wrote: On 7/8/2025 3:42 AM, Samuel Zhang wrote: dev_pm_ops.thaw() is called in the following cases: * normal case: after hibernation image has been created. * error case 1: creation of a hibernation image has failed. * error case 2: restoration from a hibernati

Re: [PATCH 3/3] drm/amdgpu: skip kfd resume_process for dev_pm_ops.thaw()

2025-07-04 Thread Zhang, GuoQing (Sam)
On 2025/7/2 22:07, Lazar, Lijo wrote: On 7/2/2025 7:24 PM, Alex Deucher wrote: On Wed, Jul 2, 2025 at 3:24 AM Sam wrote: On 2025/7/2 00:07, Alex Deucher wrote: On Tue, Jul 1, 2025 at 4:32 AM Christian König wrote: On 01.07.25 10:03, Zhang, GuoQing (Sam) wrote: thaw() is called before

Re: [PATCH 1/3] drm/amdgpu: move GTT to SHM after eviction for hibernation

2025-07-01 Thread Zhang, GuoQing (Sam)
From: Koenig, Christian Date: Monday, June 30, 2025 at 19:54 To: Zhang, GuoQing (Sam) , raf...@kernel.org , len.br...@intel.com , pa...@kernel.org , Deucher, Alexander , Limonciello, Mario , Lazar, Lijo Cc: Zhao, Victor , Chang, HaiJun , Ma, Qing (Mark) , amd-gfx@lists.freedesktop.org ,

Re: [PATCH 3/3] drm/amdgpu: skip kfd resume_process for dev_pm_ops.thaw()

2025-07-01 Thread Zhang, GuoQing (Sam)
thaw() is called before writing the hibernation image to swap disk. See the doc here. https://github.com/torvalds/linux/blob/v6.14/Documentation/driver-api/pm/devices.rst?plain=1#L552 And amdgpu implemented thaw() callback by calling amdgpu_device_resume(). https://github.com/torvalds/linux/blob/v

Re: [PATCH v6 3/4] drm/amdgpu: enable pdb0 for hibernation on SRIOV

2025-05-21 Thread Zhang, GuoQing (Sam)
On 2025/5/21 20:00, Christian König wrote: > On 5/21/25 13:55, Zhang, GuoQing (Sam) wrote: >> On 2025/5/21 16:06, Christian König wrote: >>> On 5/20/25 07:10, Zhang, GuoQing (Sam) wrote: >>>>>> +if (amdgpu_virt_xgmi_migrate_enabled(adev)) { >>>

Re: [PATCH v6 3/4] drm/amdgpu: enable pdb0 for hibernation on SRIOV

2025-05-21 Thread Zhang, GuoQing (Sam)
On 2025/5/21 16:06, Christian König wrote: > On 5/20/25 07:10, Zhang, GuoQing (Sam) wrote: >>>> +if (amdgpu_virt_xgmi_migrate_enabled(adev)) { >>>> +/* set mc->vram_start to 0 to switch the returned GPU address >>>> of >>>

Re: [PATCH v6 3/4] drm/amdgpu: enable pdb0 for hibernation on SRIOV

2025-05-19 Thread Zhang, GuoQing (Sam)
On 2025/5/19 21:57, Christian König wrote: > On 5/19/25 10:20, Samuel Zhang wrote: >> When switching to new GPU index after hibernation and then resume, >> VRAM offset of each VRAM BO will be changed, and the cached gpu >> addresses need to be updated. >> >> This is to enable pdb0 and switch to use

Re: [PATCH v5 4/4] drm/amdgpu: fix fence fallback timer expired error

2025-05-19 Thread Zhang, GuoQing (Sam)
4] drm/amdgpu: fix fence fallback timer expired error Regards Sam From: Lazar, Lijo Date: Friday, May 16, 2025 at 18:22 To: Zhang, GuoQing (Sam) , amd-gfx@lists.freedesktop.org Cc: Zhao, Victor , Chang, HaiJun , Koenig, Christian , Deucher, Alexander , Zhang, Owen(SRDC) , Ma, Qing (Mark) Subje

Re: [PATCH v5 0/4] enable xgmi node migration support for hibernate on SRIOV.

2025-05-16 Thread Zhang, GuoQing (Sam)
[AMD Official Use Only - AMD Internal Distribution Only] Hi @Koenig, Christian<mailto:christian.koe...@amd.com> and @Lazar, Lijo<mailto:lijo.la...@amd.com>, Ping… Thanks Sam From: Zhang, Owen(SRDC) Date: Wednesday, May 14, 2025 at 18:07 To: Koenig, Christian , Zhang, GuoQing (Sam

Re: [PATCH v4 1/7] drm/amdgpu: update XGMI info on resume

2025-05-08 Thread Zhang, GuoQing (Sam)
On 2025/5/8 18:56, Christian König wrote: > On 5/8/25 10:12, Lazar, Lijo wrote: >> >> On 5/8/2025 10:39 AM, Samuel Zhang wrote: >>> For virtual machine with vGPUs in SRIOV single device mode and XGMI >>> is enabled, XGMI physical node ids may change when waking up from >>> hibernation with differen

Re: [PATCH 6/6] drm/amdgpu: fix fence fallback timer expired error

2025-05-08 Thread Zhang, GuoQing (Sam)
hanged() or just amdgpu_sriov_vf()? Please advise. Thank you! Regards Sam From: amd-gfx on behalf of Zhang, GuoQing (Sam) Date: Thursday, May 8, 2025 at 14:53 To: Chang, HaiJun , Koenig, Christian , Christian König , amd-gfx@lists.freedesktop.org , Deucher, Alexander Cc: Zhao, Victor , Deng,

Re: [PATCH v4 1/7] drm/amdgpu: update XGMI info on resume

2025-05-08 Thread Zhang, GuoQing (Sam)
On 2025/5/8 17:27, Christian König wrote: > > On 5/8/25 07:09, Samuel Zhang wrote: >> For virtual machine with vGPUs in SRIOV single device mode and XGMI >> is enabled, XGMI physical node ids may change when waking up from >> hibernation with different vGPU devices. So update XGMI info on resume. >

Re: [PATCH v4 1/7] drm/amdgpu: update XGMI info on resume

2025-05-08 Thread Zhang, GuoQing (Sam)
On 2025/5/8 16:12, Lazar, Lijo wrote: > On 5/8/2025 10:39 AM, Samuel Zhang wrote: >> For virtual machine with vGPUs in SRIOV single device mode and XGMI >> is enabled, XGMI physical node ids may change when waking up from >> hibernation with different vGPU devices. So update XGMI info on resume. >>

Re: [PATCH 6/6] drm/amdgpu: fix fence fallback timer expired error

2025-05-07 Thread Zhang, GuoQing (Sam)
29, 2025 at 10:43 To: Koenig, Christian , Zhang, GuoQing (Sam) , Christian König , amd-gfx@lists.freedesktop.org , Deucher, Alexander Cc: Zhao, Victor , Deng, Emily , Zhang, Owen(SRDC) Subject: RE: [PATCH 6/6] drm/amdgpu: fix fence fallback timer expired error [AMD Official Use Only - AMD

Re: [PATCH v3 1/7] drm/amdgpu: update XGMI physical node id and GMC configs on resume

2025-05-07 Thread Zhang, GuoQing (Sam)
On 2025/5/7 20:56, Christian König wrote: > On 5/7/25 14:49, Sam wrote: >> On 2025/5/7 20:21, Christian König wrote: >>> On 5/7/25 13:03, Sam wrote: >>>> On 2025/5/7 18:03, Lazar, Lijo wrote: >>>>> On 5/7/2025 11:52 AM, Zhang, GuoQing (Sam) wrote:

Re: [PATCH v3 1/7] drm/amdgpu: update XGMI physical node id and GMC configs on resume

2025-05-06 Thread Zhang, GuoQing (Sam)
acceptable? If not, can you suggest a better approach? @Lazar, Lijo<mailto:lijo.la...@amd.com> @Koenig, Christian<mailto:christian.koe...@amd.com> Thank you! Regards Sam From: Lazar, Lijo Date: Tuesday, May 6, 2025 at 19:55 To: Zhang, GuoQing (Sam) , amd-gfx@lists.freedesktop

Re: [PATCH v2 2/3] drm/amdgpu: update GPU addresses for SMU and PSP

2025-05-06 Thread Zhang, GuoQing (Sam)
e environment to test the changes. Should I remove them as well? @Koenig, Christian<mailto:christian.koe...@amd.com> - psp->fw_pri_mc_addr - psp->fence_buf_mc_addr - psp->km_ring.ring_mem_mc_addr - driver_table->mc_address - pm_status_table->mc_address Thanks Sam From: Koenig,

Re: [PATCH 4/6] drm/amdgpu: enable pdb0 for hibernation on SRIOV

2025-04-30 Thread Zhang, GuoQing (Sam)
From: Koenig, Christian Date: Monday, April 28, 2025 at 19:30 To: Zhang, GuoQing (Sam) , Christian König , amd-gfx@lists.freedesktop.org , Deucher, Alexander Cc: Zhao, Victor , Chang, HaiJun , Deng, Emily , Zhang, Owen(SRDC) Subject: Re: [PATCH 4/6] drm/amdgpu: enable pdb0 for hibernation on SR

Re: [PATCH 6/6] drm/amdgpu: fix fence fallback timer expired error

2025-04-23 Thread Zhang, GuoQing (Sam)
[AMD Official Use Only - AMD Internal Distribution Only] Ping… @Koenig, Christian<mailto:christian.koe...@amd.com> Thanks Sam From: amd-gfx on behalf of Zhang, GuoQing (Sam) Date: Wednesday, April 23, 2025 at 14:59 To: Christian König , amd-gfx@lists.freedesktop.org Cc: Zhao,

Re: [PATCH 4/6] drm/amdgpu: enable pdb0 for hibernation on SRIOV

2025-04-23 Thread Zhang, GuoQing (Sam)
[AMD Official Use Only - AMD Internal Distribution Only] Ping… @Koenig, Christian<mailto:christian.koe...@amd.com> Thanks Sam From: amd-gfx on behalf of Zhang, GuoQing (Sam) Date: Wednesday, April 23, 2025 at 15:25 To: Christian König , amd-gfx@lists.freedesktop.org Cc: Zhao,

Re: [PATCH 4/6] drm/amdgpu: enable pdb0 for hibernation on SRIOV

2025-04-23 Thread Zhang, GuoQing (Sam)
2 > To: Zhang, GuoQing (Sam) , > amd-gfx@lists.freedesktop.org > Cc: Zhao, Victor , Chang, HaiJun , > Deng, Emily > Subject: Re: [PATCH 4/6] drm/amdgpu: enable pdb0 for hibernation on SRIOV > Am 14.04.25 um 12:46 schrieb Samuel Zhang: > > When switching to new GPU index aft

Re: [PATCH 6/6] drm/amdgpu: fix fence fallback timer expired error

2025-04-22 Thread Zhang, GuoQing (Sam)
at 21:54 To: Zhang, GuoQing (Sam) , amd-gfx@lists.freedesktop.org Cc: Zhao, Victor , Chang, HaiJun , Deng, Emily Subject: Re: [PATCH 6/6] drm/amdgpu: fix fence fallback timer expired error Am 14.04.25 um 12:46 schrieb Samuel Zhang: > IH is not working after switching a new gpu index for the f

Re: [PATCH 4/6] drm/amdgpu: enable pdb0 for hibernation on SRIOV

2025-04-22 Thread Zhang, GuoQing (Sam)
[AMD Official Use Only - AMD Internal Distribution Only] Ping… Thanks Sam From: Zhang, GuoQing (Sam) Date: Friday, April 18, 2025 at 14:26 To: Christian König , amd-gfx@lists.freedesktop.org Cc: Zhao, Victor , Chang, HaiJun , Deng, Emily Subject: Re: [PATCH 4/6] drm/amdgpu: enable pdb0 for

Re: [PATCH 4/6] drm/amdgpu: enable pdb0 for hibernation on SRIOV

2025-04-17 Thread Zhang, GuoQing (Sam)
nge is to change to the default GPU address from FB aperture type to pdb0 type in this centralized place so that I don’t need to change every callsite of amdgpu_bo_create_reserved(). Could you suggest a better approach if this approach is not acceptable? Thanks Sam From: Christian König Date: Wednesday, April

Re: [PATCH 0/6] enable switching to new gpu index for hibernate on SRIOV.

2025-04-16 Thread Zhang, GuoQing (Sam)
[AMD Official Use Only - AMD Internal Distribution Only] Ping… Regards Sam From: Samuel Zhang Date: Monday, April 14, 2025 at 18:47 To: amd-gfx@lists.freedesktop.org Cc: Zhao, Victor , Chang, HaiJun , Deng, Emily , Zhang, GuoQing (Sam) Subject: [PATCH 0/6] enable switching to new gpu index

Re: [PATCH] drm/amdgpu: fix KFDMemoryTest.PtraceAccessInvisibleVram fail on SRIOV

2024-08-12 Thread Zhang, GuoQing (Sam)
To: amd-gfx@lists.freedesktop.org , Zhang, GuoQing (Sam) , Kim, Jonathan Subject: Re: [PATCH] drm/amdgpu: fix KFDMemoryTest.PtraceAccessInvisibleVram fail on SRIOV On 2024-08-07 04:36, Samuel Zhang wrote: > Ptrace access VRAM bo will first try sdma access in > amdgpu_ttm_access_memory_sdma(),

Re: [PATCH] drm/amdgpu: fix KFDMemoryTest.PtraceAccessInvisibleVram fail on SRIOV

2024-08-08 Thread Zhang, GuoQing (Sam)
To: amd-gfx@lists.freedesktop.org Cc: Zhang, GuoQing (Sam) Subject: [PATCH] drm/amdgpu: fix KFDMemoryTest.PtraceAccessInvisibleVram fail on SRIOV Ptrace access to a VRAM bo will first try sdma access in amdgpu_ttm_access_memory_sdma(); if that fails, it will fall back to mmio access. Since ptrace only accesses 8 bytes a

Re: [PATCH 2/2] drm/amd/amdgpu: use the default reset for ras recovery

2024-05-06 Thread Zhang, GuoQing (Sam)
eng, Kenneth Date: Monday, April 29, 2024 at 16:15 To: Feng, Kenneth , amd-gfx@lists.freedesktop.org , Zhang, GuoQing (Sam) Cc: Zhang, Owen(SRDC) , Aldabagh, Maad , Ma, Qing (Mark) Subject: RE: [PATCH 2/2] drm/amd/amdgpu: use the default reset for ras recovery [AMD Official Use Only - Gener

Re: [PATCH 1/2] drm/amd/amdgpu: customized the reset to skip soft recovery

2024-05-06 Thread Zhang, GuoQing (Sam)
also needed when we test mode2 reset using quark tool. Thanks Sam From: Feng, Kenneth Date: Monday, April 29, 2024 at 16:14 To: Feng, Kenneth , amd-gfx@lists.freedesktop.org , Zhang, GuoQing (Sam) Cc: Zhang, Owen(SRDC) , Aldabagh, Maad , Ma, Qing (Mark) Subject: RE: [PATCH 1/2] dr