[PATCH] drm/amdgpu: set CP_HQD_PQ_DOORBELL_CONTROL.DOORBELL_MODE to 1

2024-07-04 Thread Zhigang Luo
To avoid reading a wrong WPTR from the doorbell in an SRIOV VF, set CP_HQD_PQ_DOORBELL_CONTROL.DOORBELL_MODE to 1 so that WPTR is read from the MQD. Signed-off-by: Zhigang Luo --- drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c | 3 +++ drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c | 3 +++ 2 files changed, 6

[PATCH] drm/amdgpu: avoid reading vf2pf info size from FB

2024-04-30 Thread Zhigang Luo
The VF can't access FB while the host is doing a mode1 reset. Use sizeof to get the vf2pf info size instead of reading it from the vf2pf header stored in FB. Signed-off-by: Zhigang Luo --- drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/driver
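
The idea in this patch summary can be sketched in a few lines of C. This is only an illustration under stated assumptions: the struct layout and the function names below are hypothetical and are not taken from amdgpu_virt.c. It contrasts a size read from a header in shared memory with a size taken from the type itself.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout for illustration; the real amdgpu structures differ. */
struct vf2pf_header {
    uint32_t size;     /* size field stored in the shared region */
    uint32_t version;
};

struct vf2pf_info {
    struct vf2pf_header header;
    uint32_t payload[62];
};

/* Fragile: trusts a field that lives in FB, which may be unreadable
 * (or return garbage) while the host performs a reset. */
size_t info_size_from_fb(const volatile struct vf2pf_info *fb)
{
    return fb->header.size;
}

/* Robust: the size is a compile-time constant, so no FB access is needed. */
size_t info_size_from_type(void)
{
    return sizeof(struct vf2pf_info);
}
```

With this layout, info_size_from_type() stays correct even when the header in the shared region holds a bogus value, which is the failure mode the summary describes.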

[PATCH] drm/amdgpu: update vf to pf message retry from 2 to 5

2024-04-30 Thread Zhigang Luo
Increase the retry count so the host has enough time to complete the reset. Signed-off-by: Zhigang Luo --- drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c b/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c index

[PATCH] drm/amdgpu: remove virt_init_data_exchange from poison consumption handler

2024-04-17 Thread Zhigang Luo
Host will initiate an FLR for all poison consumption. Guest should wait for FLR message to re-init data exchange. Signed-off-by: Zhigang Luo --- drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c b/drivers/gpu/drm

[PATCH 2/2] amd/amdgpu: improve VF recover time

2024-04-03 Thread Zhigang Luo
1. change AMDGPU_VF2PF_UPDATE_MAX_RETRY_LIMIT from 30 to 5. 2. set fatal error detected flag. Signed-off-by: Zhigang Luo --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 1 + drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 1 + drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h | 2 +- 3 files changed, 3

[PATCH 1/2] amd/amdkfd: sync all devices to wait all processes being evicted

2024-04-03 Thread Zhigang Luo
recover, it will be restored, then caused page fault. Signed-off-by: Zhigang Luo --- drivers/gpu/drm/amd/amdkfd/kfd_device.c | 17 ++--- 1 file changed, 6 insertions(+), 11 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c index

[PATCH 2/2] amd/amdgpu: improve VF recover time

2024-04-03 Thread Zhigang Luo
1. change AMDGPU_VF2PF_UPDATE_MAX_RETRY_LIMIT from 30 to 5. 2. set fatal error detected flag. Signed-off-by: Zhigang Luo --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 1 + drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 1 + drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h | 2 +- 3 files changed, 3

[PATCH 1/2] amd/amdkfd: sync all devices to wait all processes being evicted

2024-04-03 Thread Zhigang Luo
recover, it will be restored, then caused page fault. Signed-off-by: Zhigang Luo --- drivers/gpu/drm/amd/amdkfd/kfd_device.c | 15 +-- 1 file changed, 5 insertions(+), 10 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c index

[PATCH 2/2] amd/amdgpu: improve VF recover time

2024-04-01 Thread Zhigang Luo
1. change AMDGPU_VF2PF_UPDATE_MAX_RETRY_LIMIT from 30 to 5. 2. set fatal error detected flag. Change-Id: If1e0357deffa4549d4e83e925c8d764f7f8c9f42 Signed-off-by: Zhigang Luo --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 1 + drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 1 + drivers/gpu/drm

[PATCH 1/2] amd/amdkfd: sync all devices to wait all processes being evicted

2024-04-01 Thread Zhigang Luo
recover, it will be restored, then caused page fault. Signed-off-by: Zhigang Luo Change-Id: Ib1eddb56b69ecd41fe703abd169944154f48b0cd --- drivers/gpu/drm/amd/amdkfd/kfd_device.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu

[PATCH 3/3] amd/amdgpu: improve VF recover time

2024-03-25 Thread Zhigang Luo
1. change AMDGPU_VF2PF_UPDATE_MAX_RETRY_LIMIT from 30 to 5. 2. set fatal error detected flag. Change-Id: If1e0357deffa4549d4e83e925c8d764f7f8c9f42 Signed-off-by: Zhigang Luo --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 1 + drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 1 + drivers/gpu/drm

[PATCH 2/3] amd/amdgpu: wait no process running in kfd before resuming device

2024-03-25 Thread Zhigang Luo
it will cause page fault after device recovered if there is a process running. Signed-off-by: Zhigang Luo Change-Id: Ib1eddb56b69ecd41fe703abd169944154f48b0cd --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu

[PATCH 1/3] amd/amdkfd: add a function to wait no process running in kfd

2024-03-25 Thread Zhigang Luo
Signed-off-by: Zhigang Luo Change-Id: I2a98d513c26107ac76ecf20e951c188afbc7ede6 --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 20 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h | 10 +- drivers/gpu/drm/amd/amdkfd/kfd_device.c| 11 +++ 3 files changed, 40

[PATCH 3/3] amd/amdgpu: improve VF recover time

2024-03-22 Thread Zhigang Luo
1. change AMDGPU_VF2PF_UPDATE_MAX_RETRY_LIMIT from 30 to 5. 2. set fatal error detected flag. Change-Id: If1e0357deffa4549d4e83e925c8d764f7f8c9f42 Signed-off-by: Zhigang Luo --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 1 + drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 1 + drivers/gpu/drm

[PATCH 2/3] amd/amdgpu: wait no process running in kfd before resuming device

2024-03-22 Thread Zhigang Luo
it will cause page fault after device recovered if there is a process running. Signed-off-by: Zhigang Luo Change-Id: Ib1eddb56b69ecd41fe703abd169944154f48b0cd --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu

[PATCH 1/3] amd/amdkfd: add a function to wait no process running in kfd

2024-03-22 Thread Zhigang Luo
Signed-off-by: Zhigang Luo Change-Id: I2a98d513c26107ac76ecf20e951c188afbc7ede6 --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 20 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h | 5 - drivers/gpu/drm/amd/amdkfd/kfd_device.c| 11 +++ 3 files changed, 35

[PATCH] drm/amdgpu: trigger flr_work if reading pf2vf data failed

2024-03-17 Thread Zhigang Luo
If reading pf2vf data fails 30 times in a row, something is wrong, so trigger flr_work to recover. Also use dev_err to print the error message so the failing device can be identified, and add a warning message if waiting for IDH_FLR_NOTIFICATION_CMPL times out. Signed-off-by: Zhigang

[PATCH] drm/amdgpu: trigger flr_work if reading pf2vf data failed

2024-03-14 Thread Zhigang Luo
If reading pf2vf data fails 5 times in a row, something is wrong, so trigger flr_work to recover. Also use dev_err to print the error message so the failing device can be identified, and add a warning message if waiting for IDH_FLR_NOTIFICATION_CMPL times out. Signed-off-by: Zhigang
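
The "N consecutive failures triggers recovery" pattern described in the two postings of this patch (one with a limit of 30, one with 5) can be sketched as below. This is a minimal, hypothetical model: the struct, the field names and monitor_record() are assumptions made for illustration and do not reproduce the driver's code, which schedules flr_work instead of returning a flag.

```c
#include <stdbool.h>

/* Hypothetical tracker; names are illustrative, not from amdgpu. */
struct exchange_monitor {
    unsigned int consecutive_failures;
    unsigned int limit;       /* e.g. 5 or 30, depending on the posting */
    unsigned int recoveries;  /* how many times recovery was requested */
};

/* Record one read attempt. Returns true when recovery should be scheduled. */
bool monitor_record(struct exchange_monitor *m, bool read_ok)
{
    if (read_ok) {
        m->consecutive_failures = 0;  /* any success resets the streak */
        return false;
    }
    if (++m->consecutive_failures < m->limit)
        return false;
    m->consecutive_failures = 0;      /* start a fresh streak after triggering */
    m->recoveries++;
    return true;
}
```

Resetting the counter on any successful read is what makes the check "consecutive": isolated failures never accumulate toward the limit.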

[PATCH 2/3] drm/amdgpu: Add RAS_POISON_READY host response message

2024-01-24 Thread Zhigang Luo
From: Victor Skvortsov In a non-FLR page avoidance scenario, the host driver will provide the bad pages in the pf2vf exchange region. Adding a new host response message to indicate when the pf2vf exchange region has been updated. Signed-off-by: Victor Skvortsov Change-Id: I58d5d11d959d91ad5723

[PATCH 3/3] amdgpu/drm: Use vram manager for virtualization page retirement

2024-01-24 Thread Zhigang Luo
From: Victor Skvortsov In runtime, use vram manager for virtualization page retirement. Signed-off-by: Victor Skvortsov Change-Id: Ia8fe6c7d4e4acae9d3a953b3ba4567e8fc6de0fa --- drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 30 1 file changed, 20 insertions(+), 10 deletion

[PATCH 1/3] drm/amdgpu: Support passing poison consumption ras block to SRIOV

2024-01-24 Thread Zhigang Luo
From: YiPeng Chai Support passing poison consumption ras blocks to SRIOV. Signed-off-by: YiPeng Chai --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c| 5 ++-- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h| 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 2 +- drivers/gpu/drm/amd/am

[PATCH 2/2] drm/amdgpu: init TA microcode for SRIOV VF when MP0 IP is 13.0.6

2023-07-06 Thread Zhigang Luo
Signed-off-by: Zhigang Luo Change-Id: I71524c69c7137c6db4968b95e480c910aba24703 --- drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c index 21438ff61c6e..de9a2a7f5459

[PATCH 1/2] drm/amdgpu: remove SRIOV VF FB location programming

2023-07-06 Thread Zhigang Luo
For SRIOV VF, FB location is programmed by host driver, no need to program it in guest driver. Signed-off-by: Zhigang Luo Change-Id: I2a4838f6703e94bb0bcf3a8e923c69466e37803f --- drivers/gpu/drm/amd/amdgpu/gfxhub_v1_2.c | 15 +-- drivers/gpu/drm/amd/amdgpu/mmhub_v1_8.c | 12

[PATCH 2/2] drm/amdgpu: port SRIOV VF missed changes

2023-06-15 Thread Zhigang Luo
port SRIOV VF missed changes from gfx_v9_0 to gfx_v9_4_3. Signed-off-by: Zhigang Luo Change-Id: Id580820376c8d653e9ec5ebf5a8b950cd0a67e1a --- drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c | 15 ++- 1 file changed, 14 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu

[PATCH 1/2] drm/amdgpu: Skip TMR for MP0_HWIP 13.0.6

2023-06-15 Thread Zhigang Luo
For SRIOV VF, no TMR needed. Signed-off-by: Zhigang Luo Change-Id: If9556cf60dfcbd95e102b1387cf233e902d9490e --- drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c index

[PATCH 4/4] drm/amdgpu: extended waiting SRIOV VF reset completion timeout to 10s

2021-12-07 Thread Zhigang Luo
For an ASIC with a large FB, more time is needed to clear the FB during reset. This change extends the SRIOV VF reset-completion wait timeout from 5s to 10s. Signed-off-by: Zhigang Luo --- drivers/gpu/drm/amd/amdgpu/mxgpu_ai.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers

[PATCH 3/4] drm/amdgpu: recover XGMI topology for SRIOV VF after reset

2021-12-07 Thread Zhigang Luo
For SRIOV VF, the XGMI topology was not recovered after reset. This change added code to SRIOV VF reset function to update XGMI topology for SRIOV VF after reset. Signed-off-by: Zhigang Luo --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 17 ++--- 1 file changed, 14 insertions

[PATCH 2/4] drm/amdgpu: initialize XGMI for SRIOV VF during recover

2021-12-07 Thread Zhigang Luo
For SRIOV VF, XGMI was not initialized during recovery. This change adds XGMI initialization for SRIOV VF during recovery. Signed-off-by: Zhigang Luo --- drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 12 1 file changed, 12 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu

[PATCH 1/4] drm/amdgpu: skip reset other device in the same hive if it's SRIOV VF

2021-12-07 Thread Zhigang Luo
notification before the real hive reset has been executed. The VF device can handle the reset request individually in its reset work handler. This change updates the GPU recovery sequence to skip resetting other devices in the same hive for SRIOV VF. Signed-off-by: Zhigang Luo --- drivers/gpu/drm/amd/a

[PATCH] drm/amdgpu: skip reset other device in the same hive if it's sriov vf

2021-12-03 Thread Zhigang Luo
For sriov vf hang, vf flr will be triggered. Hive reset is not needed. Signed-off-by: Zhigang Luo --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 7 --- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu

[PATCH] drm/amdgpu: correct MMSCH 1.0 version

2021-08-16 Thread Zhigang Luo
MMSCH 1.0 doesn't have major/minor version, only version. Signed-off-by: Zhigang Luo --- drivers/gpu/drm/amd/amdgpu/mmsch_v1_0.h | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/mmsch_v1_0.h b/drivers/gpu/drm/amd/amdgpu/mmsch_v1_0.h

[PATCH] drm/amdgpu: correct MMSCH version

2021-08-12 Thread Zhigang Luo
MMSCH doesn't have major/minor version, only version. Signed-off-by: Zhigang Luo --- drivers/gpu/drm/amd/amdgpu/mmsch_v1_0.h | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/mmsch_v1_0.h b/drivers/gpu/drm/amd/amdgpu/mmsch_v1_0.h

[PATCH 4/5] drm/amdgpu: add psp ta microcode init for aldebaran sriov vf

2021-06-07 Thread Zhigang Luo
need to load xgmi ta for aldebaran sriov vf. Signed-off-by: Zhigang Luo --- drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c index 47ceb783e2a5..29c365160043

[PATCH 3/5] drm/amdgpu: remove sriov vf mmhub system aperture and fb location programming

2021-06-03 Thread Zhigang Luo
host driver programmed mmhub system aperture and fb location for vf, no need to program in guest side. Signed-off-by: Zhigang Luo --- drivers/gpu/drm/amd/amdgpu/mmhub_v1_7.c | 17 +++-- 1 file changed, 3 insertions(+), 14 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu

[PATCH 1/5] drm/amdgpu: remove sriov vf checking from getting fb location

2021-06-03 Thread Zhigang Luo
host driver programmed fb location registers for vf, no need to check anymore. Signed-off-by: Zhigang Luo --- drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 5 + 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/amdgpu

[PATCH 5/5] drm/amdgpu: allocate psp fw private buffer from VRAM for sriov vf

2021-06-03 Thread Zhigang Luo
psp added new feature to check fw buffer address for sriov vf. the address range must be in vf fb. Signed-off-by: Zhigang Luo --- drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 19 ++- 1 file changed, 14 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu

[PATCH 4/5] drm/amdgpu: add psp microcode init for arcturus and aldebaran sriov vf

2021-06-03 Thread Zhigang Luo
need to load xgmi ta for arcturus and aldebaran sriov vf. Signed-off-by: Zhigang Luo --- drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c index

[PATCH 2/5] drm/amdgpu: remove sriov vf gfxhub fb location programming

2021-06-03 Thread Zhigang Luo
host driver programmed the gfxhub fb location for vf, no need to program in guest side. Signed-off-by: Zhigang Luo --- drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c | 12 1 file changed, 12 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c b/drivers/gpu/drm/amd/amdgpu

[PATCH 2/2] drm/amdgpu: Add Aldebaran virtualization support

2021-04-29 Thread Zhigang Luo
1. add Aldebaran in virtualization detection list. 2. disable Aldebaran virtual display support as there is no GFX engine in Aldebaran. 3. skip TMR loading if Aldebaran is in virtualization mode as it shares the one host loaded. Signed-off-by: Zhigang Luo --- drivers/gpu/drm/amd/amdgpu

[PATCH 1/2] drm/amdkfd: Add Aldebaran virtualization support

2021-04-29 Thread Zhigang Luo
update kfd_supported_devices to enable Aldebaran virtualization support Signed-off-by: Zhigang Luo --- drivers/gpu/drm/amd/amdkfd/kfd_device.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c

[PATCH 1/1] drm/amdgpu: Add a new device ID for Aldebaran

2021-04-29 Thread Zhigang Luo
It is Aldebaran VF device ID, for virtualization support. Signed-off-by: Zhigang Luo --- drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c index 0369d3532bf0

[PATCH] drm/amdgpu: add SOS FW version checking for CAP

2020-03-24 Thread Zhigang Luo
To make sure the CAP feature is supported by the SOS, add SOS FW version checking before loading the CAP FW. Change-Id: I7aa1c09f9c117f67ede0db6cd5911d56c8568495 Signed-off-by: Zhigang Luo --- drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 5 + 1 file changed, 5 insertions(+) diff --git a
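
A firmware version gate of the kind this summary describes might look like the sketch below. The threshold value and the function name are placeholders chosen for illustration: the snippet does not state the real minimum SOS version, and the comparison used in the actual patch may differ.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder threshold; the real minimum version is not given in the snippet. */
#define EXAMPLE_MIN_SOS_VERSION 0x00010000u

/* Returns true when the reported SOS firmware version is new enough
 * for the optional feature (here: loading the CAP firmware). */
bool cap_supported(uint32_t sos_fw_version, uint32_t min_version)
{
    return sos_fw_version >= min_version;
}
```

A caller would check cap_supported(version, EXAMPLE_MIN_SOS_VERSION) first and skip the CAP load when it returns false, rather than attempting the load and handling an error afterwards.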

[PATCH] drm/amdgpu: add CAP fw loading

2020-02-26 Thread Zhigang Luo
The CAP fw is for enabling driver compatibility. Currently, it only enabled for vega10 VF. Signed-off-by: Zhigang Luo --- drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 9 +++- drivers/gpu/drm/amd/amdgpu/amdgpu_psp.h | 3 +++ drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.h | 3 ++- drivers/gpu