[PATCH Review 1/1] drm/amdgpu: Fix shared buff copy to user

2024-02-05 Thread Stanley . Yang
The TA interface (ta if) invoke node buffer is laid out as follows:
|-- ta type --|-- ta id --|-- cmd id --|-- shared buf len --|-- shared buffer --|
Given this layout, copy the shared buffer data to the correct location. Signed-off-by: Stanley.Yang --- drive
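The layout above can be read as four fixed-size header fields followed by the variable-length shared buffer. The following is a minimal C sketch of the copy, assuming each header field is a 32-bit unsigned integer in host byte order (the excerpt does not state field widths); `struct ta_invoke_hdr` and `ta_copy_shared_buf` are hypothetical names, not the driver's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical header layout: all four fields assumed to be uint32_t. */
struct ta_invoke_hdr {
	uint32_t ta_type;
	uint32_t ta_id;
	uint32_t cmd_id;
	uint32_t shared_buf_len;
};

/*
 * Copy the shared buffer that follows the header in node_buf into dst.
 * Returns the number of bytes copied, or -1 if the node buffer is too
 * short for the advertised length or dst is too small.
 */
long ta_copy_shared_buf(const uint8_t *node_buf, size_t node_len,
			uint8_t *dst, size_t dst_len)
{
	struct ta_invoke_hdr hdr;

	assert(node_buf != NULL && dst != NULL);
	if (node_len < sizeof(hdr))
		return -1;
	memcpy(&hdr, node_buf, sizeof(hdr)); /* avoids unaligned access */

	if (hdr.shared_buf_len > node_len - sizeof(hdr) ||
	    hdr.shared_buf_len > dst_len)
		return -1;

	/* The payload starts right after the header, not at offset 0. */
	memcpy(dst, node_buf + sizeof(hdr), hdr.shared_buf_len);
	return (long)hdr.shared_buf_len;
}
```

The point the sketch illustrates is the `node_buf + sizeof(hdr)` offset: the payload begins after the header, and its length is taken from the header rather than from the total buffer size.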

[PATCH Review 1/1] drm/amdgpu: Fix ineffective ras_mask settings

2024-02-21 Thread Stanley . Yang
Check amdgpu_ras_mask to fix the ras_mask setting being ineffective on special ASICs that have SRAM ECC disabled but support poison mode. Signed-off-by: Stanley.Yang --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/

[PATCH Review 1/1] drm/amdgpu: Support setting recover method

2024-04-11 Thread Stanley . Yang
Leave the amdgpu gpu recover get operation unchanged and add a gpu recover set operation to select the reset method; only mode1 and mode2 are currently supported. Signed-off-by: Stanley.Yang --- drivers/gpu/drm/amd/amdgpu/amdgpu.h| 3 ++ drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 1 + drivers/gpu/d

[PATCH Review 1/1] drm/amdgpu: Support setting reset_method at runtime

2024-04-11 Thread Stanley . Yang
Signed-off-by: Stanley.Yang --- drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c index 80b9642f2bc4..5f5bf0c26b1f 100644 --- a/drivers/gpu/drm/amd/amdgpu/a

[PATCH Review 1/1] drm/amdkfd: Use mode1 reset for GFX v9.4.4

2024-07-07 Thread Stanley . Yang
GFX v9.4.4 uses mode1 reset to handle poison consumption. Signed-off-by: Stanley.Yang --- drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c b/drivers/gpu/drm/amd/amdkfd/kfd_in

[PATCH Review 1/1] drm/amdgpu: Fix eeprom max record count

2024-07-17 Thread Stanley . Yang
The EEPROM table is empty before initialization, so add a function that gets the EEPROM table version according to the UMC HWIP version before the EEPROM table is initialized. Signed-off-by: Stanley.Yang --- .../gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c| 19 ++- 1 file changed, 18 insertions(+), 1 deletion(-

[PATCH Review V2 1/1] drm/amdgpu: Fix eeprom max record count

2024-07-17 Thread Stanley . Yang
The EEPROM table is empty before initialization, so set the EEPROM table version before initializing it. Changed from V1: reuse the amdgpu_ras_set_eeprom_table_version function. Signed-off-by: Stanley.Yang --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c | 3 +++ 1 file changed, 3 insertions(+)

[PATCH Review 1/1] drm/amdgpu: Adjust XGMI WAFL ras enable bit

2024-04-25 Thread Stanley . Yang
The way RAS capability is obtained has changed for some ASICs; both methods need to check the number of XGMI physical nodes to set the XGMI WAFL RAS enable bit. Signed-off-by: Stanley.Yang --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 14 +++--- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a

[PATCH Review 1/1] drm/amdgpu: Fix ecc irq enable/disable unpaired

2023-12-15 Thread Stanley . Yang
The ecc_irq is disabled during the GPU mode2 reset suspend process but is not re-enabled during the mode2 reset resume process. Signed-off-by: Stanley.Yang --- drivers/gpu/drm/amd/amdgpu/aldebaran.c | 6 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 37 + drivers/gpu/drm/a

[PATCH Review V2 1/1] drm/amdgpu: Fix ecc irq enable/disable unpaired

2023-12-19 Thread Stanley . Yang
The ecc_irq is disabled during the GPU mode2 reset suspend process but is not re-enabled during the mode2 reset resume process. Changed from V1: only do sdma/gfx ras_late_init in aldebaran_mode2_restore_ip and delete the amdgpu_ras_late_resume function. Signed-off-by: Stanley.Yang --- driv

[PATCH Review V3 1/1] drm/amdgpu: Fix ecc irq enable/disable unpaired

2023-12-20 Thread Stanley . Yang
The ecc_irq is disabled during the GPU mode2 reset suspend process but is not re-enabled during the mode2 reset resume process. Changed from V1: only do sdma/gfx ras_late_init in aldebaran_mode2_restore_ip and delete the amdgpu_ras_late_resume function. Changed from V2: check umc ras s

[PATCH Review 1/1] drm/amdgpu: Fix ineffective ras_mask settings

2023-12-21 Thread Stanley . Yang
For special ASICs with memory ECC enabled but SRAM ECC disabled, even if a RAS block is not set in .ras_enabled, the block can be considered to support the RAS function when the ASIC supports poison mode and the block has a RAS configuration, but only while SRAM ECC is not enabled, othe
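The support condition described above, combined with the amdgpu_ras_mask check mentioned in the 2024-02-21 patch, can be sketched as a small predicate. This is a minimal illustration only: `struct ras_caps`, its fields and `ras_block_supported` are hypothetical stand-ins, and applying the user mask on both paths is an assumption, since the excerpt is truncated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical inputs; names do not match the driver's fields. */
struct ras_caps {
	uint32_t ras_enabled;   /* bit per block reported as RAS-enabled */
	uint32_t ras_mask;      /* user-supplied mask (module parameter) */
	bool sram_ecc_enabled;
	bool poison_supported;
};

static bool ras_block_supported(const struct ras_caps *c, unsigned int block,
				bool block_has_ras_config)
{
	bool supported;

	assert(c != NULL && block < 32);
	supported = c->ras_enabled & (1u << block);

	/*
	 * Special case: SRAM ECC off, poison mode supported and the block
	 * has a RAS configuration, so treat the block as supported.
	 */
	if (!supported && !c->sram_ecc_enabled && c->poison_supported &&
	    block_has_ras_config)
		supported = true;

	/* Assumed: the user mask is honoured in both paths. */
	return supported && (c->ras_mask & (1u << block));
}
```

Keeping the mask check as the last step means a user can still switch off a block that was enabled only through the poison-mode special case.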

[PATCH Review 1/1] drm/amdgpu: Show deferred error count for UMC

2024-01-17 Thread Stanley . Yang
Show the deferred error count in the UMC sysfs node. Signed-off-by: Stanley.Yang Reviewed-by: Tao Zhou Reviewed-by: Hawking Zhang --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 8 ++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gp

[PATCH Review 1/1] drm/amdgpu: Skip do PCI error slot reset during RAS recovery

2024-01-17 Thread Stanley . Yang
Why: the PCI error slot reset may be triggered after injecting UEs into UMC multiple times, which causes a system hang. [ 557.371857] amdgpu :af:00.0: amdgpu: GPU reset succeeded, trying to resume [ 557.373718] [drm] PCIE GART of 512M enabled. [ 557.373722] [drm] PTB located at 0x00

[PATCH Review 1/1] drm/amdgpu: Fix ras features value calltrace

2024-01-17 Thread Stanley . Yang
The high three bits of the RAS features mask indicate the socket ID, so skip checking those bits before disabling all RAS features. Signed-off-by: Stanley.Yang Reviewed-by: Hawking Zhang --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 11 ++- drivers/gpu/drm/amd/amd
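The masking described above can be sketched as follows, assuming a 32-bit features word whose top three bits (31..29) hold the socket ID; the macro and function names are hypothetical, and the real field width is not shown in the excerpt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 32-bit features word, socket ID in bits 31..29. */
#define RAS_SOCKET_ID_SHIFT 29u
#define RAS_SOCKET_ID_MASK  (0x7u << RAS_SOCKET_ID_SHIFT)
#define RAS_FEATURE_BITS    (~RAS_SOCKET_ID_MASK)

/* Pack a socket ID into the high bits, keeping the feature bits. */
static uint32_t ras_features_with_socket(uint32_t features, unsigned int socket)
{
	assert(socket <= 7);
	return (features & RAS_FEATURE_BITS) |
	       ((uint32_t)socket << RAS_SOCKET_ID_SHIFT);
}

static unsigned int ras_features_socket_id(uint32_t features)
{
	return (features & RAS_SOCKET_ID_MASK) >> RAS_SOCKET_ID_SHIFT;
}

/*
 * True only if a real feature bit is set; the socket ID bits are ignored,
 * so a word holding nothing but a socket ID counts as "all disabled".
 */
static bool ras_any_feature_enabled(uint32_t features)
{
	return (features & RAS_FEATURE_BITS) != 0;
}
```

Without the `RAS_FEATURE_BITS` mask, a nonzero socket ID alone would make the "all features disabled" check fail, which matches the call trace the patch fixes.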

[PATCH Review 1/1] drm/ttm: fix debugfs node create failed

2021-10-12 Thread Stanley . Yang
Test scenario: modprobe amdgpu -> rmmod amdgpu -> modprobe amdgpu Error log: [ 54.396807] debugfs: File 'page_pool' in directory 'amdttm' already present! [ 54.396833] debugfs: File 'page_pool_shrink' in directory 'amdttm' already present! [ 54.396848] debugfs: File 'buffer_

[PATCH Review 1/1] drm/amdgpu: fix smu not match warning

2021-11-15 Thread Stanley . Yang
Update the SMU driver interface (if) version to avoid the mismatch log. Change-Id: I97f2bc4ed9a9cba313b744e2ff6812c90b244935 Signed-off-by: Stanley.Yang --- drivers/gpu/drm/amd/pm/inc/smu_v13_0.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/pm/inc/smu_v13_0.h b/drivers/gpu

[PATCH Review 1/1] drm/amdgpu: fix smu not match warning

2021-11-16 Thread Stanley . Yang
Update the SMU driver interface (if) version to avoid the mismatch log. Change-Id: I97f2bc4ed9a9cba313b744e2ff6812c90b244935 Signed-off-by: Stanley.Yang --- drivers/gpu/drm/amd/pm/inc/smu_v13_0.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/pm/inc/smu_v13_0.h b/drivers/gpu

[PATCH Review 1/1] drm/amdgpu: fix smu not match warning

2021-11-16 Thread Stanley . Yang
Update the SMU driver interface and version to avoid the mismatch log. v2: update smu driver interface Change-Id: I97f2bc4ed9a9cba313b744e2ff6812c90b244935 Signed-off-by: Stanley.Yang --- .../drm/amd/pm/inc/smu13_driver_if_aldebaran.h | 18 +- drivers/gpu/drm/amd/pm/inc/smu_v13_0.h

[PATCH Review 1/4] drm/amdgpu: Update smu driver interface for aldebaran

2021-11-17 Thread Stanley . Yang
Update the SMU driver interface version to 0x08 to avoid the mismatch log. A version mismatch can still occur with older firmware. Change-Id: I97f2bc4ed9a9cba313b744e2ff6812c90b244935 Signed-off-by: Stanley.Yang --- .../drm/amd/pm/inc/smu13_driver_if_aldebaran.h | 18 +- drivers/gpu/drm/amd/pm/inc/

[PATCH Review 2/4] drm/amdgpu: add new query interface for umc block

2021-11-17 Thread Stanley . Yang
Add support for messaging the SMU to query error information. Signed-off-by: Stanley.Yang --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 16 +++ drivers/gpu/drm/amd/amdgpu/amdgpu_umc.h | 4 + drivers/gpu/drm/amd/amdgpu/umc_v6_7.c | 161 3 files changed, 181 insertions(+) diff --git a/

[PATCH Review 3/4] drm/amdgpu: add message smu to get ecc_table

2021-11-17 Thread Stanley . Yang
Support the ECC TABLE message; this table includes the UMC RAS error count and error address. Signed-off-by: Stanley.Yang --- drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h | 7 .../drm/amd/pm/swsmu/smu13/aldebaran_ppt.c| 38 +++ .../gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c| 2

[PATCH Review 4/4] query umc error info from ecc_table

2021-11-17 Thread Stanley . Yang
If the SMU supports ECCTABLE, the driver can message the SMU to get ecc_table and then query UMC error info from it. Apply a PMFW version check to ensure backward compatibility. Signed-off-by: Stanley.Yang --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 42 --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras

[PATCH Review 1/4] drm/amdgpu: Update smu driver interface for aldebaran

2021-11-18 Thread Stanley . Yang
Update the SMU driver interface version to 0x08 to avoid the mismatch log. A version mismatch can still occur with older firmware. Change-Id: I97f2bc4ed9a9cba313b744e2ff6812c90b244935 Signed-off-by: Stanley.Yang --- .../drm/amd/pm/inc/smu13_driver_if_aldebaran.h | 18 +- drivers/gpu/drm/amd/pm/inc/

[PATCH Review 2/4] drm/amdgpu: add new query interface for umc block v2

2021-11-18 Thread Stanley . Yang
Add support for messaging the SMU to query error information. v2: rename message_smu to ecc_info Signed-off-by: Stanley.Yang --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 16 +++ drivers/gpu/drm/amd/amdgpu/amdgpu_umc.h | 4 + drivers/gpu/drm/amd/amdgpu/umc_v6_7.c | 161 3 files c

[PATCH Review 3/4] drm/amdgpu: add message smu to get ecc_table v2

2021-11-18 Thread Stanley . Yang
Support the ECC TABLE message; this table includes the UMC RAS error count and error address. v2: add an SMU version check to query whether ECCTABLE is supported; call smu_cmn_update_table to get ecctable directly. Signed-off-by: Stanley.Yang --- drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h | 8 +++ driv

[PATCH Review 4/4] query umc error info from ecc_table v2

2021-11-18 Thread Stanley . Yang
If the SMU supports ECCTABLE, the driver can message the SMU to get ecc_table and then query UMC error info from it. v2: optimize the source code to make the logic more reasonable. Signed-off-by: Stanley.Yang --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 42 +++ drivers/gpu/drm/amd/amdgpu/amdgpu_umc

[PATCH Review 1/1] drm/amdgpu: fix disable ras feature failed when unload drvier

2021-11-26 Thread Stanley . Yang
amdgpu_device_fini_hw is called before amdgpu_device_fini_sw, so the RAS TA is unloaded before the RAS disable command is sent; the RAS disable operation must happen before hw fini. Signed-off-by: Stanley.Yang --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 5 +++-- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c

[PATCH Review 1/1] drm/amdgpu: fix disable ras feature failed when unload drvier v2

2021-11-26 Thread Stanley . Yang
v2: still call ras_disable_all_features to handle the RAS initialization failure case. amdgpu_device_fini_hw is called before amdgpu_device_fini_sw, so the RAS TA is unloaded before the RAS disable command is sent; the RAS disable operation must happen before hw fini. Signed-off-by: Stanley.Yang ---

[PATCH Review 1/1] drm/amdgpu: adjust ip block suspend sequence on aldebaran to fix disable smu feature failure

2021-11-28 Thread Stanley . Yang
{ [ 578.019986] amdgpu :23:00.0: amdgpu: GPU reset begin! [ 583.245566] amdgpu :23:00.0: amdgpu: Failed to disable smu features. [ 583.245621] amdgpu :23:00.0: amdgpu: Fail to disable dpm features! [ 583.245639] [drm:amdgpu_device_ip_suspend_phase2 [amdgpu]] *ERROR*

[PATCH Review 1/1] drm/amdgpu: adjust ip block add sequence on aldebaran

2021-11-28 Thread Stanley . Yang
Reason: { [ 578.019986] amdgpu :23:00.0: amdgpu: GPU reset begin! [ 583.245566] amdgpu :23:00.0: amdgpu: Failed to disable smu features. [ 583.245621] amdgpu :23:00.0: amdgpu: Fail to disable dpm features! [ 583.245639] [drm:amdgpu_device_ip_suspend_phase2 [amdgpu]]

[PATCH Review 1/2] drm/amdgpu: skip query ecc info in gpu recovery

2021-12-02 Thread Stanley . Yang
This is a workaround for the failure to get ECC info during GPU recovery. [ 700.236122] amdgpu :09:00.0: amdgpu: Failed to export SMU ecc table! [ 700.236128] amdgpu :09:00.0: amdgpu: GPU reset begin! [ 704.331171] amdgpu: qcm fence wait loop timeout expired [ 704.331194] amdgpu: The cp migh

[PATCH Review 1/1] drm/amdgpu: only skip get ecc info for aldebaran

2021-12-02 Thread Stanley . Yang
Skip getting ECC info for aldebaran by checking the IP version, so other ASIC types are not affected. Signed-off-by: Stanley.Yang --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgp

[PATCH Review 1/1] drm/amd/pm: print errorno if get ecc info failed

2021-12-06 Thread Stanley . Yang
Signed-off-by: Stanley.Yang --- drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c b/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c index 6e781cee8bb6..e0a8224e466f 100644 -

[PATCH Review 1/1] drm/amdgpu: skip umc ras error count harvest

2021-12-06 Thread Stanley . Yang
Remove the in-recovery state check and skip UMC RAS error count harvesting in amdgpu_ras_log_on_err_counter. Signed-off-by: Stanley.Yang --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 15 ++- 1 file changed, 10 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/dri

[PATCH Review 1/1] drm/amdgpu: support sdma error injection

2021-04-01 Thread Stanley . Yang
Signed-off-by: Stanley.Yang --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c index 0e16683876aa..d9d292c79cfa 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c +++

[PATCH Review 1/1] drm/amdgpu: optimize gfx ras features flag clean

2021-04-19 Thread Stanley . Yang
Signed-off-by: Stanley.Yang --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c index ec3ebc33ee03..8fdf355d7de8 100644 --- a/drivers/gpu/drm/a

[PATCH Review 1/1] drm/amdgpu: force enable gfx ras for vega20 ws

2021-04-29 Thread Stanley . Yang
Signed-off-by: Stanley.Yang --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 22 ++ 1 file changed, 22 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c index daf63a4c1fff..dfeaa57dd7ea 100644 --- a/drivers/gpu/drm/amd/

[PATCH Review 1/1] drm/amdgpu: handle denied inject error into critical regions

2022-01-11 Thread Stanley . Yang
Signed-off-by: Stanley.Yang --- drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 10 +- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 2 +- drivers/gpu/drm/amd/amdgpu/ta_ras_if.h | 3 ++- 3 files changed, 12 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c b/dr

[PATCH Review 1/1] drm/amdgpu: handle denied inject error into critical regions v2

2022-01-12 Thread Stanley . Yang
Changed from v1: remove unused brace Signed-off-by: Stanley.Yang --- drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 9 - drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 2 +- drivers/gpu/drm/amd/amdgpu/ta_ras_if.h | 3 ++- 3 files changed, 11 insertions(+), 3 deletions(-) diff --git a/drivers

[PATCH Review 1/1] drm/amdgpu: remove unused variable warning

2022-01-19 Thread Stanley . Yang
Change-Id: Ic2a488ee253a913d806bd33ee9c90e31a71af320 Signed-off-by: Stanley.Yang --- drivers/gpu/drm/amd/amdgpu/umc_v6_7.c | 23 --- drivers/gpu/drm/amd/amdgpu/umc_v8_7.c | 6 -- 2 files changed, 29 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/umc_v6_7.c b/drive

[PATCH Review 1/1] drm/amdgpu: fix convert bad page retiremt

2022-01-19 Thread Stanley . Yang
PMFW reads the ECC info registers and stores the values in eccinfo_table in the following order:
umc0: ch_inst 0, 1, 2 ... 7
umc1: ch_inst 0, 1, 2 ... 7
...
umc3: ch_inst 0, 1, 2 ... 7
The driver should convert eccinfo_table_idx into channel_index according to channel_idx_tbl. Change-Id: Icafe93e458912b729d2e30d6

[PATCH Review 1/1] drm/amdgpu: fix channel index mapping for SIENNA_CICHLID

2022-01-21 Thread Stanley . Yang
PMFW reads the ECC info registers in the following order:
umc0: ch_inst 0, 1, 2 ... 7
umc1: ch_inst 0, 1, 2 ... 7
The position of each register value stored in the eccinfo table is calculated with the formula below:
channel_index = umc_inst * channel_in_umc + ch_inst
Driver directly us
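The ordering and formula above can be sketched in C: the eccinfo table position is `umc_inst * channel_in_umc + ch_inst`, and the physical channel index is then looked up in a per-ASIC table. This is a minimal sketch assuming the four-UMC by eight-channel geometry listed in the 2022-01-19 patch; the table contents are placeholders (each row simply reversed), not real per-ASIC values, and the function names are hypothetical. SIENNA_CICHLID may use a different geometry.

```c
#include <assert.h>
#include <stddef.h>

/* Geometry assumed from the commit text: UMC0..UMC3, ch_inst 0..7. */
#define UMC_INST_NUM   4
#define CHANNEL_IN_UMC 8

/* Placeholder mapping table; real values are ASIC-specific. */
static const unsigned int channel_idx_tbl[UMC_INST_NUM][CHANNEL_IN_UMC] = {
	{  7,  6,  5,  4,  3,  2,  1,  0 },
	{ 15, 14, 13, 12, 11, 10,  9,  8 },
	{ 23, 22, 21, 20, 19, 18, 17, 16 },
	{ 31, 30, 29, 28, 27, 26, 25, 24 },
};

/* Position of (umc_inst, ch_inst) in the PMFW eccinfo table. */
static size_t eccinfo_table_idx(unsigned int umc_inst, unsigned int ch_inst)
{
	assert(umc_inst < UMC_INST_NUM && ch_inst < CHANNEL_IN_UMC);
	return (size_t)umc_inst * CHANNEL_IN_UMC + ch_inst;
}

/* Recover (umc_inst, ch_inst) from a table index, then map to a channel. */
static unsigned int eccinfo_idx_to_channel_index(size_t idx)
{
	unsigned int umc_inst = (unsigned int)(idx / CHANNEL_IN_UMC);
	unsigned int ch_inst = (unsigned int)(idx % CHANNEL_IN_UMC);

	assert(umc_inst < UMC_INST_NUM);
	return channel_idx_tbl[umc_inst][ch_inst];
}
```

Using the table position directly as a channel index is only correct when the mapping table is the identity, which is the assumption the two fixes remove.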

[PATCH Review 1/1] drm/amdgpu: Reset OOB table error count info

2022-02-10 Thread Stanley . Yang
The OOB table error count info should be reset after the EEPROM table is reset. Change-Id: I2a39e0e44b7b1a5ab7d6b4d4b73ebe48264396b7 Signed-off-by: Stanley.Yang --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras

[PATCH Review 1/1] drm/amdgpu: adjust register address calculation

2022-02-11 Thread Stanley . Yang
The UMC_STATUS register address is not linear; adjust the offset calculation formula to get the correct address. Change-Id: Ic8926078301848330babf289c4238dc8cbcf313d Signed-off-by: Stanley.Yang --- drivers/gpu/drm/amd/amdgpu/umc_v6_7.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/drivers/gpu/drm/amd

[PATCH Review 1/1] drm/amdgpu: print more error info

2022-02-14 Thread Stanley . Yang
Print more error info when a deferred uncorrectable RAS error occurs. Changed from V1: move the deferred error message into the query uncorrectable error count function. Signed-off-by: Stanley.Yang --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 7 +- drivers/gpu/drm/amd/amdgpu/umc_v6_7.c | 72

[PATCH Review 1/1] drm/amdgpu: fix bad address translation for sienna_cichlid

2021-06-16 Thread Stanley . Yang
Signed-off-by: Stanley.Yang --- drivers/gpu/drm/amd/amdgpu/amdgpu_umc.h | 5 + drivers/gpu/drm/amd/amdgpu/umc_v8_7.c | 2 +- 2 files changed, 6 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_umc.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_umc.h index bbcccf53080d..e

[PATCH Review 1/1] drm/amdgpu: force enable vega20 gaming sku gfx ras

2021-06-16 Thread Stanley . Yang
Signed-off-by: Stanley.Yang --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c index f404c2321a6a..ca5a32944242 100644 --- a/drivers/gpu/drm/amd/amdgp

[PATCH Review 1/1] drm/amdgpu: initialize umc ras function

2021-07-08 Thread Stanley . Yang
From: John Clements support umc ras function initialization for aldebaran Change-Id: I84155d4d3eaae86a8c1bd2331b1964946c47f6da Signed-off-by: John Clements Signed-off-by: Stanley.Yang --- drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 13 + drivers/gpu/drm/amd/amdgpu/umc_v6_7.c | 15

[PATCH V4 1/1] drm/amdgpu: update athub interrupt harvesting handle

2020-09-21 Thread Stanley . Yang
A GCEA/MMHUB EA error should not result in a DF freeze. This is fixed in the next generation, but for some reasons a GCEA/MMHUB EA error does result in a DF freeze in the previous generation, so the driver should avoid reporting a GCEA/MMHUB EA error as a hw fatal error in the kernel message by reading the GCEA/MMHUB err status re

[PATCH 1/1] drm/amdgpu: fix hdp register access error

2020-09-22 Thread Stanley . Yang
The mmHDP_READ_CACHE_INVALIDATE register is in HDP, not in NBIO. Signed-off-by: Stanley.Yang Change-Id: I4375a8a67d3a13f9605479e169169e22dd5833d1 --- drivers/gpu/drm/amd/amdgpu/nv.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/nv.c b/drivers/gpu/drm/am

[PATCH Review 1/1] drm/amdgpu: fix send ras disable cmd when asic not support ras

2021-03-12 Thread Stanley . Yang
Cause: it is necessary to send the RAS disable command to the RAS TA to program GB_EDC_MODE to "BYPASS" mode during gfx block RAS late init, because the RAS capability read from the VBIOS is disabled for vega20 gaming, but the RAS context is released during the RAS init process,

[PATCH Review v3 1/1] drm/amdgpu: fix send ras disable cmd when asic not support ras

2021-03-14 Thread Stanley . Yang
Cause: it is necessary to send the RAS disable command to the RAS TA during gfx block RAS late init, because the RAS capability read from the VBIOS is disabled for vega20 gaming, but the RAS context is released during the RAS init process, this will cause send ras disable comman

[PATCH] drm/amdgpu: support reserve bad page for virt

2020-06-03 Thread Stanley . Yang
Signed-off-by: Stanley.Yang Change-Id: Ia0ad9453ac3ac929f95c73cbee5b7a8fc42a9816 --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 7 +- drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 164 + drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h | 30 +++- 3 files changed, 196 insertions(

[PATCH V2] drm/amdgpu: support reserve bad page for virt

2020-06-04 Thread Stanley . Yang
Changed from V1: rename some function names and only init RAS error handler data for supported ASICs. Signed-off-by: Stanley.Yang Change-Id: Ia0ad9453ac3ac929f95c73cbee5b7a8fc42a9816 --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 + drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c |

[PATCH V3] drm/amdgpu: support reserve bad page for virt

2020-06-04 Thread Stanley . Yang
Changed from V1: rename some function names and only init RAS error handler data for supported ASICs. Changed from V2: fix a potential memory leak. Signed-off-by: Stanley.Yang Change-Id: Ia0ad9453ac3ac929f95c73cbee5b7a8fc42a9816 --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |

[PATCH V3] drm/amdgpu: support reserve bad page for virt

2020-06-04 Thread Stanley . Yang
Changed from V1: rename some function names and only init RAS error handler data for supported ASICs. Changed from V2: fix a potential memory leak. Signed-off-by: Stanley.Yang Change-Id: Ia0ad9453ac3ac929f95c73cbee5b7a8fc42a9816 --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |

[PATCH 1/1] drm/amdgpu: set default value of noretry to 1 for specified asic

2020-11-23 Thread Stanley . Yang
noretry = 0 causes the KFDGraphicsInterop test to fail on the SRIOV platform for vega10, so set noretry to 1 for vega10. Signed-off-by: Stanley.Yang Change-Id: I241da5c20970ea889909997ff044d6e61642da81 --- drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/

[PATCH 1/1] drm/amdgpu: only skip smc sdma sos ta and asd fw in SRIOV for navi12

2020-11-23 Thread Stanley . Yang
KFDTopologyTest.BasicTest fails if the smc, sdma, sos, ta and asd fw are skipped in SRIOV for vega10, so adjust the above fw and skip loading them in SRIOV only for navi12. Signed-off-by: Stanley.Yang Change-Id: Id354be93723d7b5d769d73dc67c596af300305af --- drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c

[PATCH V2 1/1] drm/amdgpu: only skip smc sdma sos ta and asd fw in SRIOV for navi12

2020-11-24 Thread Stanley . Yang
KFDTopologyTest.BasicTest fails if the smc, sdma, sos, ta and asd fw are skipped in SRIOV for vega10, so adjust the above fw and skip loading them in SRIOV only for navi12. v2: remove an unnecessary ASIC type check. Signed-off-by: Stanley.Yang --- drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 3 -

[PATCH 1/1] drm/amdgpu: fix sdma instance fw version and feature version init

2020-12-06 Thread Stanley . Yang
Each SDMA instance's fw_version and feature_version should be set to the right value when the ASIC type isn't between SIENNA_CICHLID and CHIP_DIMGREY_CAVEFISH. Signed-off-by: Stanley.Yang Change-Id: I1edbf3e0557d771eb4c0b686fa5299a3b5f26e35 --- drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c | 2 +- 1 file changed, 1

[PATCH 1/1] drm/amdgpu: skip load smu and sdma microcode on sriov for SIENNA_CICHLID

2020-12-13 Thread Stanley . Yang
Skip loading smu and sdma fw on SRIOV because the smc, sos, ta and asd fw have already been skipped for SIENNA_CICHLID. Signed-off-by: Stanley.Yang --- drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c| 3 +++ drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 4 +++- 2 files changed, 6 insertions(+), 1 deletion(-) diff --g

[PATCH V2 1/1] drm/amdgpu: skip load smu and sdma microcode on sriov for SIENNA_CICHLID

2020-12-14 Thread Stanley . Yang
Skip loading smu and sdma fw on SRIOV because the sos, ta and asd fw have already been skipped for SIENNA_CICHLID. V2: move the ASIC check into smu11. Signed-off-by: Stanley.Yang --- drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c | 3 +++ drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 10 -- drivers/

[PATCH Review 1/1] drm/amdgpu: Fix false positive error log

2023-09-15 Thread Stanley . Yang
First check whether the block RAS object is set, and return directly if it is not. Signed-off-by: Stanley.Yang --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/dr

[PATCH Review V2 1/1] drm/amdgpu: Fix false positive error log

2023-09-15 Thread Stanley . Yang
First check whether the block RAS object is set, and return 0 directly if the block RAS object or hw_ops is not set. Changed from V1: return 0 directly if the block RAS object or hw_ops is not set. Signed-off-by: Stanley.Yang --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 10 +- 1 file

[PATCH Review 1/1] drm/amdgpu: Skip ring test during ras in recovery

2023-09-27 Thread Stanley . Yang
This is a workaround for the ring test failing when RAS performs GPU recovery on aqua vanjaram. Signed-off-by: Stanley.Yang --- drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_

[PATCH Review 1/1] drm/amdgpu: Fix potential null pointer derefernce

2023-09-27 Thread Stanley . Yang
amdgpu_ras_get_context may return NULL if the device does not support the RAS feature, so add a check before using the result. Signed-off-by: Stanley.Yang --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/
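The pattern behind this fix is a NULL check on the returned context before any field is read. The sketch below uses hypothetical stand-in types (`struct gpu_dev`, `struct ras_ctx`) and functions; it does not reproduce the driver's structures or the field the patch actually checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the device and its RAS context. */
struct ras_ctx { bool in_recovery; };
struct gpu_dev { bool ras_supported; struct ras_ctx ras; };

/* Mirrors the described behaviour: NULL when RAS is not supported. */
static struct ras_ctx *get_ras_context(struct gpu_dev *d)
{
	assert(d != NULL);
	return d->ras_supported ? &d->ras : NULL;
}

static bool ras_in_recovery(struct gpu_dev *d)
{
	struct ras_ctx *con = get_ras_context(d);

	/* Check before dereferencing; without RAS there is no recovery state. */
	return con && con->in_recovery;
}
```

Returning `false` for a device without RAS keeps callers simple: they do not need to know whether RAS is supported to ask whether a RAS recovery is in progress.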

[PATCH Review 1/1] drm/amdgpu: Workaround to skip kiq ring test during ras gpu recovery

2023-10-17 Thread Stanley . Yang
This is a workaround: the KIQ ring test fails in the suspend stage during RAS recovery for gfx v9_4_3. Change-Id: I8de9900aa76706f59bc029d4e9e8438c6e1db8e0 Signed-off-by: Stanley.Yang --- drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 21 + 1 file changed, 21 insertions(+) diff --git a/d

[PATCH Review 1/1] drm/amdgpu: Enable mca debug mode mode for apu

2023-10-18 Thread Stanley . Yang
Enable smu_v13_0_6 mca debug mode when GFX RAS feature is enabled on APU. Signed-off-by: Stanley.Yang --- drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c b/drivers/gpu

[PATCH Review V2 1/1] drm/amdgpu: Enable mca debug mode mode when ras enabled

2023-10-18 Thread Stanley . Yang
Enable smu_v13_0_6 mca debug mode if ras is enabled. Changed from V1: enable mca debug mode if ras enabled. Signed-off-by: Stanley.Yang --- drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/pm/sws

[PATCH Review 1/1] drm/amdgpu: Fix delete nodes that have been relesed

2023-10-19 Thread Stanley . Yang
Fix deleting nodes that have already been freed. Signed-off-by: Stanley.Yang --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c index 8831859a2c49..867afbf84

[PATCH Review 1/1] drm/amdgpu: Enable RAS feature by default for APU

2023-10-19 Thread Stanley . Yang
Enable RAS feature by default for aqua vanjaram on apu platform. Change-Id: I02105d07d169d1356251c994249a134ca5dd2a7a Signed-off-by: Stanley.Yang --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 14 ++ 1 file changed, 2 insertions(+), 12 deletions(-) diff --git a/drivers/gpu/drm/amd/am

[PATCH Review 1/1] drm/amdgpu: Reset vram error data info

2023-11-01 Thread Stanley . Yang
Reset the error data info stored in VRAM when the user clears the EEPROM table. Signed-off-by: Stanley.Yang --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 97 ++- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 2 + .../gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c| 4 + 3 files changed,

[PATCH Review 1/1] drm/amdgpu: support send bad channel info to smu

2022-03-01 Thread Stanley . Yang
Send the bad channel information bitmap to the SMU to update the OOB table. Change-Id: I49a79af64d5263c28db059ecb8b8405a471431b4 Signed-off-by: Stanley.Yang --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 7 +++ drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 3 ++ .../gpu/drm/amd/amdgpu/amdgpu_ras_eep

[PATCH Review 1/2] drm/amd/pm: add send bad channel info function

2022-03-03 Thread Stanley . Yang
Support messaging the SMU with bad channel info to update the HBM bad channel info in the OOB table. Change-Id: I1e50ed8118f4c1aaefb04c040e59ae4918cdc295 Signed-off-by: Stanley.Yang --- drivers/gpu/drm/amd/pm/amdgpu_dpm.c | 12 ++ drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h | 1 + drivers/gp

[PATCH Review 2/2] drm/amdgpu: message smu to update bad channel info

2022-03-03 Thread Stanley . Yang
The driver should notify the SMU to update bad channel info when an uncorrectable error is detected in the UMC block. Change-Id: I2dc8848affdb53e52891013953ae9383fff5f20f Signed-off-by: Stanley.Yang --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 7 ++ drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 3 +++ .

[PATCH Review 1/1] drm/amd/pm: use pm mutex to protect ecc info table

2022-03-10 Thread Stanley . Yang
Change-Id: I6afe0332cbb20528648c38665264930d6b091c2f Signed-off-by: Stanley.Yang --- drivers/gpu/drm/amd/pm/amdgpu_dpm.c | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/pm/amdgpu_dpm.c b/drivers/gpu/drm/amd/pm/amdgpu_dpm.c index 9a892d6d1d7a..89fbee5

[PATCH Review 1/1] drm/amdgpu/pm: add asic smu support check

2022-03-20 Thread Stanley . Yang
Check whether the ASIC supports SMU before calling SMU powerplay functions; otherwise a NULL pointer dereference may occur on ASICs without SMU support. Change-Id: Ib86f3d4c88317b23eb1040b9ce1c5c8dcae42488 Signed-off-by: Stanley.Yang --- drivers/gpu/drm/amd/pm/amdgpu_dpm.c | 6 ++ 1 file changed, 6 insertions(+

[PATCH Review 1/1] drm/amdgpu: print more correctable error info

2022-04-07 Thread Stanley . Yang
Change-Id: I09a2aae85cde3ab2cb6b042b973da6839ad024ec Signed-off-by: Stanley.Yang --- drivers/gpu/drm/amd/amdgpu/umc_v6_7.c | 62 ++- 1 file changed, 60 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/umc_v6_7.c b/drivers/gpu/drm/amd/amdgpu/umc_v6_7.

[PATCH Review 1/1] drm/amdgpu: add umc query error status function

2022-04-08 Thread Stanley . Yang
To debug RAS errors, the driver prints the IPID/SYND/MISC0 register values when a correctable or uncorrectable error is detected. Provide a umc_query_error_status_helper function to reduce code redundancy. Change-Id: I09a2aae85cde3ab2cb6b042b973da6839ad024ec Signed-off-by: Stanley.Yang --- drivers/gpu/d

[PATCH Review 1/1] drm/amdgpu: print ras drv fw debug info

2023-03-23 Thread Stanley . Yang
Signed-off-by: Stanley.Yang --- drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c index 6d2879ac585b..f76b1cb8baf8 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgp

[PATCH Review 2/2] drm/amdgpu: correct ras enabled flag

2023-04-10 Thread Stanley . Yang
XGMI RAS should be set according to the gmc xgmi supported flag and the number of xgmi physical nodes. Change-Id: Idf3600b30584b10b528e7237d103d84d5097b7e0 Signed-off-by: Stanley.Yang --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 8 1 file changed, 8 insertions(+) diff --git a/drivers/gpu/drm/amd

[PATCH Review 1/2] drm/admgpu: fix unexpected block id

2023-04-10 Thread Stanley . Yang
Change-Id: Icceb43556eec802f11c2077c1c58a1e92c9df599 Signed-off-by: Stanley.Yang --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 4 drivers/gpu/drm/amd/amdgpu/ta_ras_if.h | 2 ++ 2 files changed, 6 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h b/drivers/gpu/drm/amd/amdg

[PATCH Review V2 1/2] drm/amdgpu: fix unexpected block id

2023-04-11 Thread Stanley . Yang
Aldebaran supports VCN and JPEG RAS, so an "unexpected block id" message is reported during VCN and JPEG RAS initialization if the VCN and JPEG block IDs are not defined. Change-Id: Icceb43556eec802f11c2077c1c58a1e92c9df599 Signed-off-by: Stanley.Yang --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 4 drivers/
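A rough illustration of why an undefined ID produces that message: if block names are resolved through a table indexed by block ID, an ID with no entry falls through to the "unexpected" path. The enum values, names and ras_block_name below are hypothetical and do not reproduce the real TA_RAS_BLOCK numbering.

```c
#include <stddef.h>

/* Hypothetical block IDs; VCN and JPEG are the kind of entries the
 * patch adds so that their initialization stops hitting the error path. */
enum ras_block_id {
	RAS_BLOCK_UMC,
	RAS_BLOCK_SDMA,
	RAS_BLOCK_GFX,
	RAS_BLOCK_VCN,
	RAS_BLOCK_JPEG,
	RAS_BLOCK_COUNT,
};

static const char *const ras_block_names[RAS_BLOCK_COUNT] = {
	[RAS_BLOCK_UMC] = "umc",
	[RAS_BLOCK_SDMA] = "sdma",
	[RAS_BLOCK_GFX] = "gfx",
	[RAS_BLOCK_VCN] = "vcn",
	[RAS_BLOCK_JPEG] = "jpeg",
};

/* Return the block name, or NULL for an ID the table does not know;
 * a caller would log "unexpected block id" in the NULL case. */
static const char *ras_block_name(int id)
{
	if (id < 0 || id >= RAS_BLOCK_COUNT)
		return NULL;
	return ras_block_names[id];
}
```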

[PATCH Review V2 2/2] drm/amdgpu: correct ras enabled flag

2023-04-11 Thread Stanley . Yang
XGMI RAS should be enabled according to the number of GMC XGMI physical nodes; it should not be enabled if xgmi num_physical_nodes is zero. Change-Id: Idf3600b30584b10b528e7237d103d84d5097b7e0 Signed-off-by: Stanley.Yang --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 7 +++ 1 file changed, 7 inser
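The V2 rule (no XGMI RAS without XGMI physical nodes) amounts to clearing one bit in an enabled-blocks mask. This is a sketch under assumptions: the bit position RAS_BLOCK_XGMI_WAFL and the function ras_fixup_xgmi are hypothetical, not the driver's actual enum or code path.

```c
#include <stdint.h>

/* Hypothetical bit index; the driver has its own RAS block enum. */
enum { RAS_BLOCK_XGMI_WAFL = 5 };

/* Drop XGMI from the enabled RAS mask when no XGMI physical nodes are
 * reported; every other block bit is left untouched. */
static uint32_t ras_fixup_xgmi(uint32_t ras_enabled,
			       unsigned int num_physical_nodes)
{
	if (num_physical_nodes == 0)
		ras_enabled &= ~(1u << RAS_BLOCK_XGMI_WAFL);
	return ras_enabled;
}
```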

[PATCH Review 1/1] drm/amdgpu: Add SDMA_UTCL1_WR_FIFO_SED field for sdma_v4_4_ras_field

2023-04-27 Thread Stanley . Yang
Signed-off-by: Stanley.Yang --- drivers/gpu/drm/amd/amdgpu/sdma_v4_4.c | 4 1 file changed, 4 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_4.c b/drivers/gpu/drm/amd/amdgpu/sdma_v4_4.c index 6f9895cdddb1..0ddb6955a6d3 100644 --- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_4.c +++

[PATCH Review V2 2/3] drm/amdgpu: support check vcn jpeg block mask

2023-06-05 Thread Stanley . Yang
Support VCN/JPEG instance mask checking; pass the logical mask directly for all blocks except GFX/SDMA/VCN/JPEG. Changed from V1: correct a typo Signed-off-by: Stanley.Yang Reviewed-by: Tao Zhou Reviewed-by: Hawking Zhang --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 6 +- 1 file changed, 5 i

[PATCH Review V2 1/3] drm/amdgpu: pass xcc mask to ras ta

2023-06-05 Thread Stanley . Yang
Pass the XCC mask to the RAS TA; the RAS TA compares it with the mask from the chiplet topology. Changed from V1: Remove IP version checking. Set ras_cmd->ras_init_message.init_flags.xcc_mask directly, since xcc_mask is a common structure for all devices. Signed-off-by:

[PATCH Review V2 3/3] drm/amdgpu: convert vcn/jpeg logical mask to physical mask

2023-06-05 Thread Stanley . Yang
Changed from V1: Remove amdgpu_ras_logical_mask_to_physical_mask, since GET_MASK provides the same feature. Support converting the VCN/JPEG logical mask to a physical mask. Signed-off-by: Stanley.Yang Reviewed-by: Tao Zhou Reviewed-by: Hawking Zhang --- drivers/gpu/drm/amd/
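As a hedged illustration of what converting a logical instance mask to a physical one involves, the sketch maps each set logical bit through a per-device table. The table and logical_to_physical_mask are hypothetical; according to the V2 notes the driver relies on GET_MASK for this, and its definition is not shown here.

```c
#include <stdint.h>

/* map[i] is the physical instance backing logical instance i (for
 * example, when some physical instances are harvested). Every map[i]
 * must be below 32 so the shift stays defined. */
static uint32_t logical_to_physical_mask(uint32_t logical_mask,
					 const unsigned int *map,
					 unsigned int num_inst)
{
	uint32_t phys = 0;

	for (unsigned int i = 0; i < num_inst; i++)
		if (logical_mask & (1u << i))
			phys |= 1u << map[i];
	return phys;
}
```

With physical instances 0 and 2 harvested, the map would be {1, 3}, so logical mask 0x3 becomes physical mask 0xA.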

[PATCH Review 1/6] drm/amdgpu: Rename ras table version

2023-06-06 Thread Stanley . Yang
Rename RAS_TABLE_VER to RAS_TABLE_VER_V1, move RAS_TABLE_VER_V1 from amdgpu_ras_eeprom.c to amdgpu_ras_eeprom.h. Signed-off-by: Stanley.Yang --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c | 5 ++--- drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.h | 2 ++ 2 files changed, 4 insertions(+), 3 de

[PATCH Review 3/6] drm/amdgpu: Support setting EEPROM table version

2023-06-06 Thread Stanley . Yang
Add an interface for setting the EEPROM table version for UMC v8.10, and add EEPROM table v2.1 to UMC v8.10. Signed-off-by: Stanley.Yang --- drivers/gpu/drm/amd/amdgpu/amdgpu_umc.h | 2 ++ drivers/gpu/drm/amd/amdgpu/umc_v8_10.c | 6 ++ 2 files changed, 8 insertions(+) diff --git a/drivers/gpu/drm/amd/amdg

[PATCH Review 4/6] drm/amdgpu: Add support EEPROM table v2.1

2023-06-06 Thread Stanley . Yang
Add RAS info to the EEPROM table so that an application can analyse the device ECC status from the EEPROM table RAS info without the GPU driver. Signed-off-by: Stanley.Yang --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 2 +- .../gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c| 204 -- .../gpu/drm/amd/amdgpu

[PATCH Review 6/6] drm/amdgpu: Set EEPROM ras info

2023-06-06 Thread Stanley . Yang
Set EEPROM ras info: rma status, health percent and bad page threshold. Signed-off-by: Stanley.Yang --- .../gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c| 24 +++ .../gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.h| 5 2 files changed, 29 insertions(+) diff --git a/drivers/gpu/drm

[PATCH Review 5/6] drm/amdgpu: Calculate EEPROM table ras info bytes sum

2023-06-06 Thread Stanley . Yang
It is more reasonable to also check the EEPROM table RAS info bytes. Signed-off-by: Stanley.Yang --- .../gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c| 19 +++ 1 file changed, 19 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ra
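The snippet does not spell out the checksum scheme. As an assumption, the sketch below uses an 8-bit byte sum where the stored checksum makes the whole covered area sum to zero modulo 256, and extends the covered area to include the RAS info bytes. byte_sum, table_checksum and table_checksum_ok are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* 8-bit wrap-around sum of a byte buffer. */
static uint8_t byte_sum(const void *buf, size_t len)
{
	const uint8_t *p = buf;
	uint8_t sum = 0;

	for (size_t i = 0; i < len; i++)
		sum = (uint8_t)(sum + p[i]);
	return sum;
}

/* Checksum over the header and the RAS info area, chosen (by
 * assumption) so that header + RAS info + checksum sum to 0 mod 256. */
static uint8_t table_checksum(const void *hdr, size_t hdr_len,
			      const void *ras_info, size_t info_len)
{
	uint8_t sum = (uint8_t)(byte_sum(hdr, hdr_len) +
				byte_sum(ras_info, info_len));

	return (uint8_t)(0u - sum);
}

/* Verify a stored checksum; a change in the RAS info bytes is caught
 * unless it happens to leave the 8-bit sum unchanged. */
static int table_checksum_ok(const void *hdr, size_t hdr_len,
			     const void *ras_info, size_t info_len,
			     uint8_t csum)
{
	return (uint8_t)(byte_sum(hdr, hdr_len) +
			 byte_sum(ras_info, info_len) + csum) == 0;
}
```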

[PATCH Review 2/6] drm/amdgpu: Add RAS table v2.1 macro definition

2023-06-06 Thread Stanley . Yang
Add RAS EEPROM table version 2.1 macro definition. Signed-off-by: Stanley.Yang --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c | 13 + drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.h | 1 + 2 files changed, 14 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eepro

[PATCH Review 1/2] drm/amdgpu: Optimize checking ras supported

2023-06-12 Thread Stanley . Yang
Use "is_app_apu" to identify whether the device is in native APU mode or carveout mode. Signed-off-by: Stanley.Yang --- drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 8 +++--- drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c | 34 ++--- 3 files cha

[PATCH Review 2/2] drm/amdgpu: Add checking mc_vram_size

2023-06-12 Thread Stanley . Yang
Do not compare the injection address with mc_vram_size if mc_vram_size is zero. Signed-off-by: Stanley.Yang --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_r
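Without this check a zero mc_vram_size would reject every injection, because any address is greater than or equal to zero. A minimal sketch of the validation, with a hypothetical function name ras_inject_addr_valid:

```c
#include <stdbool.h>
#include <stdint.h>

/* Reject addresses beyond VRAM only when a VRAM size is known; with
 * mc_vram_size == 0 the upper-bound comparison is skipped entirely. */
static bool ras_inject_addr_valid(uint64_t address, uint64_t mc_vram_size)
{
	if (mc_vram_size && address >= mc_vram_size)
		return false;
	return true;
}
```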

[PATCH Review 1/1] drm/amdgpu: Remove redundant poison consumption handler function

2023-06-19 Thread Stanley . Yang
The callbacks handle_poison_consumption and poison_consumption_handler do almost the same thing when handling poison consumption, so remove poison_consumption_handler. Signed-off-by: Stanley.Yang --- drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 9 - drivers/gpu/drm/amd/amdgpu/am

[PATCH Review V2 1/1] drm/amdgpu: Remove redundant poison consumption handler function

2023-06-19 Thread Stanley . Yang
The callbacks handle_poison_consumption and poison_consumption_handler do almost the same thing when handling poison consumption, so remove poison_consumption_handler. Changed from V1: Add a poison consumption handler for VCN2.6, VCN4.0, JPEG2.6 and JPEG4.0, return

[PATCH Review 1/1] drm/amdgpu: fix use-after-free during gpu recovery

2022-11-16 Thread Stanley . Yang
[Why] [ 754.862560] refcount_t: underflow; use-after-free. [ 754.862898] Call Trace: [ 754.862903] [ 754.862913] amdgpu_job_free_cb+0xc2/0xe1 [amdgpu] [ 754.863543] drm_sched_main.cold+0x34/0x39 [amd_sched] [How] The fw_fence may not be initialized; check whether dma_fenc
