[PATCH] drm/amd/amdkfd: add/remove kfd queues through on stop/start KFD scheduling

2024-10-16 Thread shaoyunl
Add back the kfd queues in start scheduling that were originally removed on stop scheduling. Signed-off-by: shaoyunl --- .../drm/amd/amdkfd/kfd_device_queue_manager.c | 40 +-- 1 file changed, 37 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd

[PATCH] drm/amdgpu: Increase MES log buffer to dump mes scratch data

2024-10-10 Thread shaoyunl
MES internal scratch data is useful for MES debug. It can only be located in VRAM, so change the allocation type and increase the size for MES 11 Signed-off-by: shaoyunl --- drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h | 1 + drivers/gpu/drm/amd/amdgpu

[PATCH] drm/amdkfd: add/remove kfd queues through on stop/start KFD scheduling

2024-10-04 Thread shaoyunl
Add back the kfd queues in start scheduling that were originally removed on stop scheduling. Signed-off-by: shaoyunl --- .../drm/amd/amdkfd/kfd_device_queue_manager.c | 36 +-- 1 file changed, 33 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd

[PATCH] drm/amdgpu: enable unmapped doorbell handling basic mode on mes 12

2024-05-08 Thread shaoyunl
This reverts commit 9606c08e178f953d22e50b05c64b4b1a48051f3e. Signed-off-by: shaoyunl --- drivers/gpu/drm/amd/amdgpu/mes_v12_0.c| 14 ++ drivers/gpu/drm/amd/include/mes_v12_api_def.h | 3 ++- 2 files changed, 16 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd

[PATCH] drm/amdgpu : Add mes_log_enable to control mes log feature

2024-03-22 Thread shaoyunl
The MES log might slow down performance because of the extra step of logging the data. Disable it by default and introduce a parameter that can enable it when necessary Signed-off-by: shaoyunl --- drivers/gpu/drm/amd/amdgpu/amdgpu.h | 1 + drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 10 ++ drivers

[PATCH] drm/amdgpu : Increase the mes log buffer size as per new MES FW version

2024-03-22 Thread shaoyunl
From MES version 0x54, the log entry increased and requires the log buffer size to be increased. 16k is the maximum size agreed Signed-off-by: shaoyunl --- drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c | 5 ++--- drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h | 1 + 2 files changed, 3 insertions(+)

[PATCH] drm/amdgpu : Add mes_log_enable to control mes log feature

2024-03-22 Thread shaoyunl
The MES log might slow down performance because of the extra step of logging the data. Disable it by default and introduce a parameter that can enable it when necessary Signed-off-by: shaoyunl --- drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 10 ++ drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c | 5

[PATCH] drm/amdgpu: Only create mes event log debugfs when mes is enabled

2024-01-31 Thread shaoyunl
Skip the debugfs file creation for the MES event log if the GPU doesn't use MES. This is to prevent a potential kernel oops when the user tries to read the event log in debugfs on a GPU without MES Signed-off-by: shaoyunl --- drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c | 6 +++--- 1 file changed, 3 inser

[PATCH] drm/amdgpu: Enable event log on MES 11

2023-11-23 Thread shaoyunl
Enable event log through the HW specific FW API Signed-off-by: shaoyunl --- drivers/gpu/drm/amd/amdgpu/mes_v11_0.c| 2 ++ drivers/gpu/drm/amd/include/mes_v11_api_def.h | 1 + 2 files changed, 3 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c b/drivers/gpu/drm/amd

[PATCH] drm/amdgpu: SW part of MES event log enablement

2023-11-23 Thread shaoyunl
This is the generic SW part, prepare the event log buffer and dump it through debugfs Signed-off-by: shaoyunl --- drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 2 + drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.h | 2 + drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c | 61

[PATCH] drm/amdgpu: Enable MES to handle doorbell ring on unmapped queue

2023-11-02 Thread shaoyunl
On navi4x and up, HW can monitor up to 2048 doorbells that are not currently mapped and trigger an interrupt to MES when these unmapped doorbells are rung. Signed-off-by: shaoyunl --- drivers/gpu/drm/amd/amdgpu/mes_v12_0.c | 24 1 file changed, 24 insertions(+) diff

[PATCH] drm/amdgpu: Use per device reset_domain for XGMI on sriov configuration

2022-09-07 Thread shaoyunl
ice reset_domain Signed-off-by: shaoyunl --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 20 ++-- drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 36 +- 2 files changed, 33 insertions(+), 23 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/driv

[PATCH] drm/amdgpu: Remove the additional kfd pre reset call for sriov

2022-08-18 Thread shaoyunl
The additional call is caused by merge conflict Signed-off-by: shaoyunl --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index 4cd87dbb108c..d7eb23b8d692

[PATCH] drm/amdgpu: use sjt mec fw on aldebaran for sriov

2022-08-05 Thread shaoyunl
different version of MEC as long as they support sjt Signed-off-by: shaoyunl --- drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 14 -- 1 file changed, 12 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c index

[PATCH] drm/amdgpu: Disable FRU EEPROM access for SRIOV

2022-01-20 Thread shaoyunl
VF access to the EEPROM is blocked by security policy; we might need another way to get SKU info for the VF Signed-off-by: shaoyunl --- drivers/gpu/drm/amd/amdgpu/amdgpu_fru_eeprom.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fru_eeprom.c b/drivers/gpu

[PATCH] drm/amdgpu: adjust the kfd reset sequence in reset sriov function

2021-11-29 Thread shaoyunl
inside reset_sriov function. Signed-off-by: shaoyunl --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 12 1 file changed, 8 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index 1989f9e9379e

[PATCH] drm/amd/amdgpu: move kfd post_reset out of reset_sriov function

2021-11-18 Thread shaoyunl
reset_sriov function to make them balance . Signed-off-by: shaoyunl --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 7 +++ 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index 10c8008d1da0

[PATCH] drm/amd/amdkfd: Fix kernel panic when reset failed and been triggered again

2021-11-15 Thread shaoyunl
In an SRIOV configuration, the reset may fail to bring the ASIC back to normal after stop cpsch has already been called; start_cpsch will not be called since there is no resume in this case. When the reset is triggered again, the driver should avoid doing the uninitialization again. Signed-off-by: shaoyunl

[PATCH] drm/amd/amdkfd: Fix kernel panic when reset failed and been triggered again

2021-11-14 Thread shaoyunl
In an SRIOV configuration, the reset may fail to bring the ASIC back to normal after stop cpsch has already been called; start_cpsch will not be called since there is no resume in this case. When the reset is triggered again, the driver should avoid doing the uninitialization again. Signed-off-by: shaoyunl

[PATCH] drm/amd/amdgpu: fix the kfd pre_reset sequence in sriov

2021-11-05 Thread shaoyunl
The KFD pre_reset should be called before the reset is executed. It holds the lock to prevent other ROCm processes from sending the package to the HIQ while the host executes the real reset on the HW Signed-off-by: shaoyunl --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 5 + 1 file changed, 1

[PATCH] drm/amd/amdkfd: Don't sent command to HWS on kfd reset

2021-11-04 Thread shaoyunl
termination. Signed-off-by: shaoyunl --- drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 2 +- drivers/gpu/drm/amd/amdkfd/kfd_process.c | 6 +- 2 files changed, 6 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm

[PATCH] drm/amd/amdkfd: Don't sent command to HWS on kfd reset

2021-11-03 Thread shaoyunl
termination. Signed-off-by: shaoyunl --- drivers/gpu/drm/amd/amdkfd/kfd_device.c | 6 +- drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 2 +- drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 2 ++ drivers/gpu/drm/amd/amdkfd/kfd_process.c | 6 +- 4 files

[PATCH] drm/amdgpu: Get atomicOps info from Host for sriov setup

2021-09-10 Thread shaoyunl
Signed-off-by: shaoyunl --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 24 +++-- drivers/gpu/drm/amd/amdgpu/amdgv_sriovmsg.h | 4 +++- 2 files changed, 16 insertions(+), 12 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu

[PATCH] drm/amdgpu: Get atomicOps info from Host for sriov setup

2021-09-10 Thread shaoyunl
Signed-off-by: shaoyunl Change-Id: Ifdbcb4396d64e3f3cbf6bcbf7ab9c7b2cb061052 --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 24 +++-- drivers/gpu/drm/amd/amdgpu/amdgv_sriovmsg.h | 4 +++- 2 files changed, 16 insertions(+), 12 deletions(-) mode change 100644 => 100755 drivers/

[PATCH] drm/amdgpu: Get atomicOps info from Host for sriov setup

2021-09-10 Thread shaoyunl
Signed-off-by: shaoyunl Change-Id: Ifdbcb4396d64e3f3cbf6bcbf7ab9c7b2cb061052 --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 25 - drivers/gpu/drm/amd/amdgpu/amdgv_sriovmsg.h | 4 +++- 2 files changed, 17 insertions(+), 12 deletions(-) mode change 100644 => 100755 drivers/

[PATCH] drm/amdgpu: Get atomicOps info from Host for sriov setup

2021-09-09 Thread shaoyunl
Signed-off-by: shaoyunl Change-Id: Ifdbcb4396d64e3f3cbf6bcbf7ab9c7b2cb061052 --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 20 ++-- drivers/gpu/drm/amd/amdgpu/amdgv_sriovmsg.h | 4 +++- 2 files changed, 21 insertions(+), 3 deletions(-) mode change 100644 => 100755 drivers/gpu/

[PATCH] drm/amdgpu: soc15 register access through RLC should only apply to sriov runtime

2021-06-01 Thread shaoyunl
On SRIOV, driver should only access register through RLC in runtime Signed-off-by: shaoyunl Change-Id: Iecaa52436a2985a18ede9c86cb00cc197a717bd6 --- drivers/gpu/drm/amd/amdgpu/soc15_common.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu

[PATCH 2/2] drm/amd/pm: Use BACO reset arg 0 on XGMI configuration

2021-03-15 Thread shaoyunl
With an arg 1 BACO reset, it will try to reload the SMU FW after reset. This might fail if the driver is already in a pending reset status during the probe period. An arg 0 reset will bring the ASIC back to a clean state and the driver will re-init everything including SMU FW Signed-off-by: shaoyunl Change-Id

[PATCH 1/2] drm/amdgpu: Keep pending_reset valid during smu reset the ASIC

2021-03-15 Thread shaoyunl
SMU internal might need to check this pending_reset setting to decide the reset method Signed-off-by: shaoyunl Change-Id: I8d88abf56d481e7443ac31baa2929826aec9e576 --- drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm

[PATCH] drm/amdgpu: Enable light SBR in XGMI+passthrough configuration

2021-03-12 Thread shaoyunl
This is to fix commit dda9bbb26c7, which only enables the light SMU on normal device init. This feature actually needs to be enabled after the ASIC has been reset as well. Signed-off-by: shaoyunl Change-Id: Ie7ee02cd3ccdab3522aad9a02f681963e211ed44 --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 9

[PATCH] drm/amdgpu: Enable light SBR in XGMI+passthrough configuration

2021-03-11 Thread shaoyunl
This is to fix commit dda9bbb26c7, which only enables the light SMU on normal device init. This feature actually needs to be enabled after the ASIC has been reset as well. Signed-off-by: shaoyunl Change-Id: Ie7ee02cd3ccdab3522aad9a02f681963e211ed44 --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 7

[PATCH 1/2] drm/amd/pm: Add LightSBR SMU MSG support

2021-03-10 Thread shaoyunl
request to PSP for a HW reset Signed-off-by: shaoyunl Change-Id: I5f0e48730d2b4b48fed8137aa57c683d5b3d1b9f --- drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h | 7 +++ drivers/gpu/drm/amd/pm/inc/arcturus_ppsmc.h | 7 +++ drivers/gpu/drm/amd/pm/inc/smu_types.h| 1

[PATCH 2/2] drm/amdgpu: Enable light SBR for SMU on passthrough and XGMI configuration

2021-03-10 Thread shaoyunl
SMU introduces a new interface to enable light Secondary Bus Reset mode; the driver enables it on passthrough + XGMI configuration Signed-off-by: shaoyunl Change-Id: I59aef0559aba418b764e7cf716b0d98aca14fec5 --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 1 file changed, 4 insertions

[PATCH 1/2] drm/amd/pm: Add LightSBR SMU MSG support

2021-03-10 Thread shaoyunl
request to PSP for a HW reset Signed-off-by: shaoyunl Change-Id: I5f0e48730d2b4b48fed8137aa57c683d5b3d1b9f --- drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h | 7 +++ drivers/gpu/drm/amd/pm/inc/arcturus_ppsmc.h | 7 +++ drivers/gpu/drm/amd/pm/inc/smu_types.h| 1

[PATCH 2/2] drm/amdgpu: Enable lightSBR for SMU on passthrough and XGMI configuration

2021-03-10 Thread shaoyunl
SMU introduces a new interface to enable lightSBR mode; the driver enables it on passthrough + XGMI configuration Signed-off-by: shaoyunl Change-Id: I59aef0559aba418b764e7cf716b0d98aca14fec5 --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 1 file changed, 4 insertions(+) diff --git a

[PATCH 1/2] drm/amd/pm: Add LightSBR SMU MSG support

2021-03-10 Thread shaoyunl
request to PSP for a HW reset Signed-off-by: shaoyunl Change-Id: I5f0e48730d2b4b48fed8137aa57c683d5b3d1b9f --- drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h | 7 +++ drivers/gpu/drm/amd/pm/inc/arcturus_ppsmc.h | 7 +++ drivers/gpu/drm/amd/pm/inc/smu_types.h| 1

[PATCH] drm/amdgpu: skip read eeprom for device that pending on XGMI reset

2021-03-09 Thread shaoyunl
Reading the EEPROM through SMU doesn't work reliably on XGMI reset during testing. Skip it for now Signed-off-by: shaoyunl Change-Id: Id864b96a9da5b0d4dd5ffef9858997dd9f52de25 --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/drivers/gpu/drm/amd/a

[PATCH] drm/amdgpu : Fix asic reset regression issue introduce by 3f61aa92b88c

2021-03-09 Thread shaoyunl
This recent change introduces SDMA interrupt info printing with the irq->process function. These functions do not require a set function to enable/disable the irq Signed-off-by: shaoyunl Change-Id: I595998b107f48865f47820ba2e7f758cc263dc64 --- drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c | 2 +- 1 f

[PATCH 5/5] drm/amdgpu: Reset the devices in the XGMI hive duirng probe

2021-03-06 Thread shaoyunl
the same time Signed-off-by: shaoyunl Change-Id: I34e838e611b7623c7ad824704c7ce350808014fc --- drivers/gpu/drm/amd/amdgpu/amdgpu.h| 13 +++ drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 102 +++-- drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c| 69 ++ driver

[PATCH 4/5] drm/amdgpu: Add reset_list for device list used for reset

2021-03-06 Thread shaoyunl
The gmc.xgmi.head list was originally designed for the device list in the XGMI hive. Also using it for reset purposes prevents the reset function from adjusting the XGMI device list, which is required in the next change Signed-off-by: shaoyunl Change-Id: Ibbdf75c02836151adf5bb44186e6ced97dbf8c1d --- drivers/gpu

[PATCH 5/5] drm/amdgpu: Reset the devices in the XGMI hive duirng probe

2021-03-05 Thread shaoyunl
the same time Signed-off-by: shaoyunl Change-Id: I34e838e611b7623c7ad824704c7ce350808014fc --- drivers/gpu/drm/amd/amdgpu/amdgpu.h| 13 +++ drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 102 +++-- drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c| 71 ++ driver

[PATCH 4/5] drm/amdgpu: Add reset_list for device list used for reset

2021-03-05 Thread shaoyunl
The gmc.xgmi.head list was originally designed for the device list in the XGMI hive. Also using it for reset purposes prevents the reset function from adjusting the XGMI device list, which is required in the next change Signed-off-by: shaoyunl Change-Id: Ibbdf75c02836151adf5bb44186e6ced97dbf8c1d --- drivers/gpu

[PATCH 3/5] drm/amdgpu: Init the cp MQD if it's not be initialized before

2021-03-05 Thread shaoyunl
The MQD might not be initialized during the first init period if the device needs to be reset during probe. The driver needs to properly init them in the gpu recovery period Signed-off-by: shaoyunl Acked-by: Alex Deucher Change-Id: Iad58a050939af2afa46d1c74a90866c47ba9efd2 --- drivers/gpu/drm/amd/amdgpu

[PATCH 2/5] drm/amdgpu: Add kfd init_complete flag to check from amdgpu side

2021-03-05 Thread shaoyunl
The amdgpu driver may be in reset state during init, in which case it will not initialize the KFD; the driver needs to initialize the KFD after reset by checking the flag Signed-off-by: shaoyunl Acked-by: Alex Deucher Change-Id: Ic1684b55b27e0afd42bee8b9b431c4fb0afcec15 --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c

[PATCH 1/5] drm/amdgpu: get xgmi info at eary_init

2021-03-05 Thread shaoyunl
The driver needs to call the get XGMI info function earlier, before ip_init, since the driver needs to check the XGMI setting to determine how to perform reset during init Signed-off-by: shaoyunl Acked-by: Alex Deucher Change-Id: Ic37276bbb6640bb4e9360220fed99494cedd3ef5 --- drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c

[PATCH 4/4] drm/amdgpu: Reset the devices in the XGMI hive duirng probe

2021-02-23 Thread shaoyunl
the same time with existing gpu_recovery routine. Signed-off-by: shaoyunl Change-Id: I34e838e611b7623c7ad824704c7ce350808014fc --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 96 +- drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h| 1 + drivers/gpu/drm/amd/amdgpu/amdgpu_

[PATCH 3/4] drm/amdgpu: Init the cp MQD if it's not be initialized before

2021-02-23 Thread shaoyunl
The MQD might not be initialized during the first init period if the device needs to be reset during probe. The driver needs to properly init them in the gpu recovery period Signed-off-by: shaoyunl Acked-by: Alex Deucher Change-Id: Iad58a050939af2afa46d1c74a90866c47ba9efd2 --- drivers/gpu/drm/amd/amdgpu

[PATCH 2/4] drm/amdgpu: Add kfd init_complete flag to check from amdgpu side

2021-02-23 Thread shaoyunl
The amdgpu driver may be in reset state during init, in which case it will not initialize the KFD; the driver needs to initialize the KFD after reset by checking the flag Signed-off-by: shaoyunl Acked-by: Alex Deucher Change-Id: Ic1684b55b27e0afd42bee8b9b431c4fb0afcec15 --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c

[PATCH 1/4] drm/amdgpu: get xgmi info at eary_init

2021-02-23 Thread shaoyunl
The driver needs to call the get XGMI info function earlier, before ip_init, since the driver needs to check the XGMI setting to determine how to perform reset during init Signed-off-by: shaoyunl Acked-by: Alex Deucher Change-Id: Ic37276bbb6640bb4e9360220fed99494cedd3ef5 --- drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c

[PATCH 4/4] drm/amdgpu: Init the cp MQD if it's not be initialized before

2021-02-18 Thread shaoyunl
The MQD might not be initialized during the first init period if the device needs to be reset during probe. The driver needs to properly init them in the gpu recovery period Signed-off-by: shaoyunl Change-Id: Iad58a050939af2afa46d1c74a90866c47ba9efd2 --- drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 20

[PATCH 3/4] drm/amdgpu: Add kfd init_complete flag to check from amdgpu side

2021-02-18 Thread shaoyunl
The amdgpu driver may be in reset state during init, in which case it will not initialize the KFD; the driver needs to initialize the KFD after reset by checking the flag Signed-off-by: shaoyunl Change-Id: Ic1684b55b27e0afd42bee8b9b431c4fb0afcec15 --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 3 ++- drivers/gpu/drm

[PATCH 2/4] drm/amdgpu: get xgmi info at eary_init

2021-02-18 Thread shaoyunl
The driver needs to call the get XGMI info function earlier, before ip_init, since the driver needs to check the XGMI setting to determine how to perform reset during init Signed-off-by: shaoyunl Change-Id: Ic37276bbb6640bb4e9360220fed99494cedd3ef5 --- drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 10 -- 1 file

[PATCH 1/4] drm/amdgpu: Reset the devices in the XGMI hive duirng probe

2021-02-18 Thread shaoyunl
the same time with existing gpu_recovery routine. Signed-off-by: shaoyunl Change-Id: I34e838e611b7623c7ad824704c7ce350808014fc --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 96 +- drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h| 1 + drivers/gpu/drm/amd/amdgpu/amdgpu_

[PATCH] drm/amdgpu/dce_virtual: Enable vBlank control for vf

2020-11-23 Thread shaoyunl
This function actually controls the vblank on/off. It shouldn't be bypassed for VF; otherwise all the vblank-based features on VF will not work. Signed-off-by: shaoyunl Change-Id: I77c6f57bb0af390b61f0049c12bf425b10d70d91 --- drivers/gpu/drm/amd/amdgpu/dce_virtual.c | 3 --- 1 file ch

[PATCH] drm/amdgpu/dce_virtual: Enable DPM for vf

2020-11-23 Thread shaoyunl
This function actually controls the vblank on/off. It shouldn't be bypassed for VF; otherwise all the vblank-based features on VF will not work. Signed-off-by: shaoyunl Change-Id: I77c6f57bb0af390b61f0049c12bf425b10d70d91 --- drivers/gpu/drm/amd/amdgpu/dce_virtual.c | 3 --- 1 file ch

[PATCH] drm/amdgpu/sriov : Don't resume RLCG for SRIOV guest

2020-03-17 Thread shaoyunl
RLCG is enabled by the host driver, so there is no need to enable it in the guest for the non-PSP load path Change-Id: I2f313743bf3d492f06aaef07224da6eda3878a28 Signed-off-by: shaoyunl --- drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 5 + 1 file changed, 5 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu

Re: [PATCH 3/3] drm/amdgpu: Improve Vega20 XGMI TLB flush workaround

2020-01-20 Thread shaoyunl
I see.  So this change Reviewed-by: shaoyun liu On 2020-01-20 1:40 p.m., Felix Kuehling wrote: On 2020-01-20 1:28 p.m., shaoyunl wrote: On 2020-01-20 12:58 p.m., Felix Kuehling wrote: On 2020-01-20 12:47 p.m., shaoyunl wrote: comments in line . On 2020-01-17 8:37 p.m., Felix Kuehling

Re: [PATCH 3/3] drm/amdgpu: Improve Vega20 XGMI TLB flush workaround

2020-01-20 Thread shaoyunl
On 2020-01-20 12:58 p.m., Felix Kuehling wrote: On 2020-01-20 12:47 p.m., shaoyunl wrote: comments in line . On 2020-01-17 8:37 p.m., Felix Kuehling wrote: Using a heavy-weight TLB flush once is not sufficient. Concurrent memory accesses in the same TLB cache line can re-populate TLB entries

Re: [PATCH 3/3] drm/amdgpu: Improve Vega20 XGMI TLB flush workaround

2020-01-20 Thread shaoyunl
* still need a second TLB flush after this. +*/ + inv_req = gmc_v9_0_get_invalidate_req(vmid, 2); + inv_req2 = gmc_v9_0_get_invalidate_req(vmid, flush_type); [shaoyunl]  For the send invalidation in this situation ,can we use 0  for the flush type d

Re: [PATCH] drm/amdgpu: check rlc_g firmware pointer is valid before using it

2020-01-13 Thread shaoyunl
ping. On 2020-01-10 1:33 p.m., shaoyunl wrote: In SRIOV, rlc_g firmware is loaded by the host; the guest driver won't load it, which leaves the rlc_fw pointer null Change-Id: Id16f65171dd427d623af4c5bc75f674019e63dec Signed-off-by: shaoyunl --- drivers/gpu/drm/amd/amdgpu/gfx_v10_0.

[PATCH] drm/amdgpu: check rlc_g firmware pointer is valid before using it

2020-01-10 Thread shaoyunl
In SRIOV, rlc_g firmware is loaded by the host; the guest driver won't load it, which leaves the rlc_fw pointer null Change-Id: Id16f65171dd427d623af4c5bc75f674019e63dec Signed-off-by: shaoyunl --- drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 9 + 1 file changed, 5 insertions(+), 4 dele

Re: [PATCH 4/4] drm/amdkfd: Avoid hanging hardware in stop_cpsch

2019-12-20 Thread shaoyunl
019-12-20 2:46 p.m., Felix Kuehling wrote: On 2019-12-20 14:31, shaoyunl wrote: Can we use the  dqm_lock when we try to get the dqm->is_hw_hang and dqm->is_resetting inside function kq_uninitialize ? Spreading the DQM lock around is probably not a good idea. Then I'd rather do more re

Re: [PATCH 4/4] drm/amdkfd: Avoid hanging hardware in stop_cpsch

2019-12-20 Thread shaoyunl
ate the kernel queue in the DQM initialize function because dev->dqm isn't initialized at that time yet. Regards,   Felix On 2019-12-20 10:56, shaoyunl wrote: Looks like patch 2 is not related to this serial , but anyway . Patch 1,2,3 are reviewed by shaoyunl For patch 4 ,  is it poss

Re: [PATCH 4/4] drm/amdkfd: Avoid hanging hardware in stop_cpsch

2019-12-20 Thread shaoyunl
Looks like patch 2 is not related to this serial , but anyway . Patch 1,2,3 are reviewed by shaoyunl  For patch 4 ,  is it possible we directly check dqm->is_hws_hang || dqm->is_resetting  inside function kq_uninitialize.  so we don't need other interface change . I think even

Re: [PATCH 2/2] drm/amdgpu: fix KIQ ring test fail in TDR of SRIOV

2019-12-19 Thread shaoyunl
s on things that aren't expected to complete anyway. Regards,   Felix On 2019-12-19 11:59 a.m., shaoyunl wrote: After check the code , in KFD side , should be simple just add the check in stop_cpsch code . For kiq, there is no return for WREG32 , so no easy way to check the return

Re: [PATCH 2/2] drm/amdgpu: fix KIQ ring test fail in TDR of SRIOV

2019-12-19 Thread shaoyunl
don't have this issue But the reason we cannot call it before VF FLR on SRIOV case was already stated in this thread Thanks _ Monk Liu|GPU Virtualization Team |AMD -Original Message- From: Liu, Monk Sent: Thursday, December 19, 2019 11:4

Re: [PATCH 2/2] drm/amdgpu: fix KIQ ring test fail in TDR of SRIOV

2019-12-17 Thread shaoyunl
I think amdkfd side depends on this call to stop the user queue, without this call, the user queue can submit to HW during the reset which could cause hang again ... Do we know the root cause why this function would ruin MEC ? From the logic, I think this function should be called before FLR si

[PATCH] drm/amdgpu: Init correct fb region for none XGMI configuration

2018-09-10 Thread shaoyunl
Fix : 5c777a5 'Adjust GART and AGP location with xgmi offset' Change-Id: I2d78024fbe44a37f46a35d34c1e64dbd3937fdf1 Signed-off-by: shaoyunl --- drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 5 + 1 file changed, 5 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c b/d

[PATCH] drm/amdgpu: Init correct fb region for none XGMI configuration

2018-09-10 Thread shaoyunl
Change-Id: I2d78024fbe44a37f46a35d34c1e64dbd3937fdf1 Signed-off-by: shaoyunl --- drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 5 + 1 file changed, 5 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c index cf97c1c..ae44671 100644 --- a

[PATCH] drm/amdkfd: Only add bi-directional iolink on GPU with XGMI or largebar

2018-09-07 Thread shaoyunl
Change-Id: Ibb6a89ed878fffccb9a8bb4032b07a10ee298a99 Signed-off-by: shaoyunl --- drivers/gpu/drm/amd/amdkfd/kfd_crat.c | 15 +-- drivers/gpu/drm/amd/amdkfd/kfd_crat.h | 3 ++- drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 1 + 3 files changed, 12 insertions(+), 7 deletions(-) diff --git

[PATCH 06/12] drm/amdgpu: Add place holder functions for xgmi topology interface with psp

2018-09-07 Thread shaoyunl
From: Shaoyun Liu Add dummy function for xgmi function interface with psp Change-Id: I01f35baf5a4b96e9654d448c9892be3cd72c05b7 Signed-off-by: Shaoyun Liu Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/psp_v11_0.c | 29 + 1 file changed, 29 insertions(+)

[PATCH 04/12] drm/amdgpu/gmc9: Adjust GART and AGP location with xgmi offset

2018-09-07 Thread shaoyunl
From: Alex Deucher On hives with xgmi enabled, the fb_location aperture is a size which defines the total framebuffer size of all nodes in the hive. Each GPU in the hive has the same view via the fb_location aperture. GPU0 starts at offset (0 * segment size), GPU1 starts at offset (1 * segment

[PATCH 13/13] drm/amdkfd: Generate xGMI direct iolink

2018-09-05 Thread shaoyunl
From: Shaoyun Liu Generate xGMI iolink for upper level usage Change-Id: I37bc29fee45cb10d1da849956055c59d823f6f5d Signed-off-by: Shaoyun Liu Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_crat.c | 78 ++- 1 file changed, 68 insertions(+), 10 del

[PATCH 12/13] drm/amdkfd: Add new iolink type defines

2018-09-05 Thread shaoyunl
From: Shaoyun Liu Update the iolink type defines according to the new thunk spec Change-Id: Ie155641b6bfbe005ae0e12c5c31c68157247ea26 Signed-off-by: Shaoyun Liu Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_crat.h | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) d

[PATCH 11/13] drm/amdkfd: kfd expose the hive_id of the device through its node properties

2018-09-05 Thread shaoyunl
From: Shaoyun Liu Thunk will generate the XGMI topology information when necessary with the hive_id for each specified device Change-Id: I3bbc37bd2af4295e24357ce82f2c760162aff9ca Signed-off-by: Shaoyun Liu Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_device.c | 3 +++ dri

[PATCH 10/13] drm/amdgpu: get_hive_id from amdgpu side

2018-09-05 Thread shaoyunl
From: Shaoyun Liu Retrieve hive_id from amdgpu device Change-Id: I9bb4d87870edf638b477a9088f14bc84b70e71e2 Signed-off-by: Shaoyun Liu Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c| 7 +++ drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h| 1 + drivers

[PATCH 09/13] drm/amd/include: Add get_hive_id interface in kfd2kgd

2018-09-05 Thread shaoyunl
From: Shaoyun Liu KFD needs to get the hive id from amdgpu to build up the XGMI topology Change-Id: If68ea8fd7fb17b7ffb581f45d8406925578d96b8 Signed-off-by: Shaoyun Liu Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/include/kgd_kfd_interface.h | 5 + 1 file changed, 5 insertions(+) diff

[PATCH 08/13] drm/amdgpu : Generate XGMI topology info from driver level

2018-09-05 Thread shaoyunl
From: Shaoyun Liu Driver will save an array of XGMI hive info, each hive will have a list of devices that have the same hive ID. Change-Id: Ia2934d5b624cffa3283bc0a37679eddbd387cbdd Signed-off-by: Shaoyun Liu Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/Makefile| 2 +-

[PATCH 07/13] drm/amdgpu: Add place holder functions for xgmi topology interface with psp

2018-09-05 Thread shaoyunl
From: Shaoyun Liu Add dummy function for xgmi function interface with psp Change-Id: I01f35baf5a4b96e9654d448c9892be3cd72c05b7 Signed-off-by: Shaoyun Liu Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/psp_v11_0.c | 26 ++ 1 file changed, 26 insertions(+) d

[PATCH 06/13] drm/amdgpu : Add psp function interfaces for XGMI support

2018-09-05 Thread shaoyunl
From: Shaoyun Liu Place holder for XGMI support Change-Id: I924fa3693366409de0218009c7f709cb464854cc Signed-off-by: Shaoyun Liu Reviewed-by: Huang Rui --- drivers/gpu/drm/amd/amdgpu/amdgpu_psp.h | 34 + 1 file changed, 34 insertions(+) diff --git a/drivers/gpu

[PATCH 05/13] drm/amdgpu/gmc9: populate xgmi info for vega20

2018-09-05 Thread shaoyunl
From: Alex Deucher Call the new gfxhub 1.1 function to get the xgmi info. Acked-by: Huang Rui Acked-by: Slava Abramov Reviewed-by :Shaoyun liu Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/drivers/gpu/drm/amd

[PATCH 04/13] drm/amdgpu/gmc9: Adjust xgmi offset

2018-09-05 Thread shaoyunl
From: Alex Deucher On hives with xgmi enabled, the fb_location aperture is a size which defines the total framebuffer size of all nodes in the hive. Each GPU in the hive has the same view via the fb_location aperture. GPU0 starts at offset (0 * segment size), GPU1 starts at offset (1 * segment

[PATCH 03/13] drm/amdgpu/gmc9: add a new gfxhub 1.1 helper for xgmi

2018-09-05 Thread shaoyunl
From: Alex Deucher Used to populate the xgmi info on vega20. v2: PF_MAX_REGION is val - 1 (Ray) Acked-by: Huang Rui Acked-by: Slava Abramov Reviewed-by :Shaoyun liu Signed-off-by: Alex Deucher Change-Id: Ia7b7f112880e69cdbcf73a8abf04cd6ef303940c --- drivers/gpu/drm/amd/amdgpu/Makefile

[PATCH 02/13] drm/amdgpu/gmc: add initial xgmi structure to amdgpu_gmc structure

2018-09-05 Thread shaoyunl
From: Alex Deucher Initial pass at a structure to store xgmi info. xgmi is a high speed cross gpu interconnect. Acked-by: Huang Rui Acked-by: Slava Abramov Reviewed-by :Shaoyun liu Signed-off-by: Alex Deucher Change-Id: I8b373bd847c857dd7cbefa55d1ede2a8785deb06 --- drivers/gpu/drm/amd/amd

[PATCH 01/13] drm/amd/include: update the bitfield define for PF_MAX_REGION

2018-09-05 Thread shaoyunl
From: Shaoyun Liu Correct the definition based on vega20 register spec Change-Id: Ifde296134d00423cdf1078c8249d044f5b5cf5a5 Signed-off-by: Shaoyun Liu Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/include/asic_reg/gc/gc_9_2_1_sh_mask.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deleti