Add support for DPC to the product
Signed-off-by: Ce Sun
---
drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c
b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c
Add support for DPC to a series of products
Signed-off-by: Ce Sun
---
drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c
b/drivers/gpu/drm/amd/pm/swsmu/smu13
creation interruption occurs at this time,
bank reg info will be lost. (Thomas)
v5: each cycle is delayed by 5ms. (Tao)
Signed-off-by: Ce Sun
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 74 ++---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 5 +-
drivers/gpu/drm/amd/amdgpu
When poison is triggered multiple times, a race condition occurs.
Add a mutex to protect poison injection.
Signed-off-by: Ce Sun
Reviewed-by: Yang Wang
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 3 +++
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 3 +++
2 files changed, 6 insertions
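
The locking idea in the message above can be sketched in userspace C. This is only an illustration, assuming a pthread mutex in place of the kernel mutex; struct ras_ctx, ras_ctx_init() and inject_poison() are hypothetical names and are not taken from the patch.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for the RAS context; field names are illustrative. */
struct ras_ctx {
	pthread_mutex_t poison_lock;
	int in_flight;     /* injections currently inside the critical section */
	int max_in_flight; /* highest value in_flight has reached */
	int injected;      /* completed injections */
};

static void ras_ctx_init(struct ras_ctx *ctx)
{
	assert(ctx != NULL);
	pthread_mutex_init(&ctx->poison_lock, NULL);
	ctx->in_flight = 0;
	ctx->max_in_flight = 0;
	ctx->injected = 0;
}

/* Serialize injections: only one caller at a time runs the body. */
static void inject_poison(struct ras_ctx *ctx)
{
	assert(ctx != NULL);
	pthread_mutex_lock(&ctx->poison_lock);

	ctx->in_flight++;
	if (ctx->in_flight > ctx->max_in_flight)
		ctx->max_in_flight = ctx->in_flight;

	ctx->injected++; /* stand-in for the real injection work */

	ctx->in_flight--;
	pthread_mutex_unlock(&ctx->poison_lock);
}
```

With the lock held around the whole body, max_in_flight should never exceed 1 no matter how many threads call inject_poison() concurrently.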
Add the amdgpu_aca_get_bank_count interface
Signed-off-by: Ce Sun
Signed-off-by: Xiang Liu
Reviewed-by: Yang Wang
---
drivers/gpu/drm/amd/amdgpu/amdgpu_aca.c | 10 ++
drivers/gpu/drm/amd/amdgpu/amdgpu_aca.h | 2 ++
2 files changed, 12 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu
Correct the counts of nr_banks and nr_errors
Signed-off-by: Ce Sun
Reviewed-by: Yang Wang
---
drivers/gpu/drm/amd/amdgpu/amdgpu_aca.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_aca.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_aca.c
index cbc40cad581b
Poll the ACA bank count to ensure that valid
ACA bank reg info can be obtained.
v2: add a corresponding delay before sending the msg to the SMU to query mca bank info.
(Stanley)
v3: the loop cannot exit. (Thomas)
v4: remove amdgpu_aca_clear_bank_count. (Kevin)
Signed-off-by: Ce Sun
---
drivers/gpu
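
A bounded polling loop of the kind this message describes might look like the sketch below. It is a userspace illustration under stated assumptions, not the driver code: poll_bank_count(), the callback types, the retry limit and struct fake_hw are all hypothetical, and the delay between cycles is passed in as a parameter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback types; not part of the amdgpu driver. */
typedef unsigned int (*bank_count_fn)(void *ctx);
typedef void (*delay_fn)(unsigned int ms);

/*
 * Call read_count() until it reports a nonzero bank count or max_tries
 * attempts have been made, calling delay(delay_ms) between attempts.
 * Returns true and stores the count in *out on success, false on timeout.
 * The retry limit guarantees that the loop always exits.
 */
static bool poll_bank_count(bank_count_fn read_count, void *ctx,
			    delay_fn delay, unsigned int delay_ms,
			    unsigned int max_tries, unsigned int *out)
{
	assert(read_count != NULL && out != NULL);

	for (unsigned int i = 0; i < max_tries; i++) {
		unsigned int n = read_count(ctx);

		if (n > 0) {
			*out = n;
			return true;
		}
		/* No need to wait after the final attempt. */
		if (delay != NULL && i + 1 < max_tries)
			delay(delay_ms);
	}
	return false;
}

/* Demo stand-in for hardware: reports 2 banks from the third read on. */
struct fake_hw {
	unsigned int reads;
};

static unsigned int fake_read(void *ctx)
{
	struct fake_hw *hw = ctx;

	hw->reads++;
	return hw->reads >= 3 ? 2 : 0;
}
```

Passing the delay function in keeps the loop testable without real sleeps; a caller would supply its own sleep helper there.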
When poison is triggered multiple times, a race condition occurs.
Add a mutex to protect poison injection.
Signed-off-by: Ce Sun
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 3 +++
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 3 +++
2 files changed, 6 insertions(+)
diff --git a/drivers/gpu
Add the amdgpu_aca_get_bank_count/amdgpu_aca_clear_bank_count interface
Signed-off-by: Ce Sun
Signed-off-by: Xiang Liu
---
drivers/gpu/drm/amd/amdgpu/amdgpu_aca.c | 10 ++
drivers/gpu/drm/amd/amdgpu/amdgpu_aca.h | 2 ++
2 files changed, 12 insertions(+)
diff --git a/drivers/gpu/drm
Correct the counts of nr_banks and nr_errors
Signed-off-by: Ce Sun
---
drivers/gpu/drm/amd/amdgpu/amdgpu_aca.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_aca.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_aca.c
index cbc40cad581b..090bf6cf1b91 100644
--- a
Poll the ACA bank count to ensure that valid
ACA bank reg info can be obtained.
v2: add a corresponding delay before sending the msg to the SMU to query mca bank info.
(Stanley)
v3: the loop cannot exit. (Thomas)
Signed-off-by: Ce Sun
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 65
When poison is triggered multiple times, a race condition occurs.
Add a mutex to protect poison injection.
Signed-off-by: Ce Sun
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 3 +++
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 3 +++
2 files changed, 6 insertions(+)
diff --git a/drivers/gpu
Add the amdgpu_aca_get_bank_count/amdgpu_aca_clear_bank_count interface
Signed-off-by: Ce Sun
Signed-off-by: Xiang Liu
---
drivers/gpu/drm/amd/amdgpu/amdgpu_aca.c | 14 ++
drivers/gpu/drm/amd/amdgpu/amdgpu_aca.h | 3 +++
2 files changed, 17 insertions(+)
diff --git a/drivers/gpu
Correct the counts of nr_banks and nr_errors
Signed-off-by: Ce Sun
---
drivers/gpu/drm/amd/amdgpu/amdgpu_aca.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_aca.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_aca.c
index cbc40cad581b..090bf6cf1b91 100644
--- a
Poll the ACA bank count to ensure that valid
ACA bank reg info can be obtained.
v2: add a corresponding delay before sending the msg to the SMU to query mca bank info.
(Stanley)
Signed-off-by: Ce Sun
---
drivers/gpu/drm/amd/amdgpu/amdgpu_aca.c | 2 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 44
Add the amdgpu_aca_get_bank_count/amdgpu_aca_clear_bank_count interface
Signed-off-by: Ce Sun
Signed-off-by: Xiang Liu
---
drivers/gpu/drm/amd/amdgpu/amdgpu_aca.c | 14 ++
drivers/gpu/drm/amd/amdgpu/amdgpu_aca.h | 3 +++
2 files changed, 17 insertions(+)
diff --git a/drivers/gpu
Correct the counts of nr_banks and nr_errors
Signed-off-by: Ce Sun
---
drivers/gpu/drm/amd/amdgpu/amdgpu_aca.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_aca.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_aca.c
index cbc40cad581b..090bf6cf1b91 100644
--- a
Poll the ACA bank count to ensure that valid
ACA bank reg info can be obtained.
Signed-off-by: Ce Sun
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 46 +++--
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 2 --
drivers/gpu/drm/amd/amdgpu/umc_v12_0.c | 7
3 files
Add the amdgpu_aca_get_bank_count/amdgpu_aca_clear_bank_count interface
Signed-off-by: Ce Sun
Signed-off-by: Xiang Liu
---
drivers/gpu/drm/amd/amdgpu/amdgpu_aca.c | 14 ++
drivers/gpu/drm/amd/amdgpu/amdgpu_aca.h | 3 +++
2 files changed, 17 insertions(+)
diff --git a/drivers/gpu
Correct the counts of nr_banks and nr_errors
Signed-off-by: Ce Sun
---
drivers/gpu/drm/amd/amdgpu/amdgpu_aca.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_aca.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_aca.c
index d1e431818212..d14dee8d6632 100644
--- a
held by the
dpc thread, but the dpc thread has not released the reset domain lock. In the dpc
callback slot_reset, the hive lock must be obtained, but it is held by the
gpu recover thread at this time, so a deadlock occurs.
Signed-off-by: Ce Sun
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 26
Move amdgpu_device_health_check into amdgpu_device_gpu_recover to
ensure that the presence of the device can be checked before reset.
Signed-off-by: Ce Sun
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 25 +++---
1 file changed, 8 insertions(+), 17 deletions(-)
diff --git a
Try to ensure that the poison creation handler completes in time
to set the device rma value.
Signed-off-by: Ce Sun
Signed-off-by: Stanley.Yang
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 17 ++---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 1 +
2 files changed, 11 insertions(+), 7
When the driver is unloaded, the interrupt source of
the rma device is not released, so hw_init fails
when the driver is loaded again using bad_page_threshold.
Signed-off-by: Ce Sun
---
drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff
cocci warnings: (new ones prefixed by >>)
>> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:6088:8-9: Unneeded variable: "r".
>> Return "0" on line 6141
Reported-by: kernel test robot
Closes:
https://lore.kernel.org/oe-kbuild-all/202506281925.hhipxio7
From: Lijo Lazar
Make sure to release reset domain lock in case of failures.
Signed-off-by: Lijo Lazar
Signed-off-by: Ce Sun
Fixes: 0f936e23cf9d ("drm/amdgpu: refactor amdgpu_device_gpu_recover")
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 80 +++---
1 file c
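
The "release the lock on every failure path" rule from this message is commonly written with a single unlock label. The sketch below is a userspace illustration only; struct reset_domain, domain_lock(), domain_unlock() and recover() are hypothetical stand-ins, and the two return-code parameters simulate stages that may fail.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a reset domain; not the driver's structure. */
struct reset_domain {
	bool locked;
	int resumed; /* number of recoveries that reached the resume stage */
};

static void domain_lock(struct reset_domain *d)
{
	assert(!d->locked);
	d->locked = true;
}

static void domain_unlock(struct reset_domain *d)
{
	assert(d->locked);
	d->locked = false;
}

/*
 * halt_rc and reset_rc simulate the results of two stages (0 on success,
 * negative on failure). Every exit path goes through the unlock label, so
 * the lock is released whether or not a stage fails.
 */
static int recover(struct reset_domain *d, int halt_rc, int reset_rc)
{
	int r;

	domain_lock(d);

	r = halt_rc; /* stand-in for the halt stage */
	if (r)
		goto unlock;

	r = reset_rc; /* stand-in for the reset stage */
	if (r)
		goto unlock;

	d->resumed++; /* stand-in for the resume stage */

unlock:
	domain_unlock(d);
	return r;
}
```

An early "return r;" in either failure branch would leave the domain locked, which is the kind of leak the message describes.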
rk+0x2f/0x40
[ 630.636413] ? __sched_group_set_shares+0x160/0x160
[ 630.647232] ret_from_fork_asm+0x11/0x20
Signed-off-by: Ce Sun
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 9 +++--
1 file changed, 3 insertions(+), 6 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
b/
rk+0x2f/0x40
[ 630.636413] ? __sched_group_set_shares+0x160/0x160
[ 630.647232] ret_from_fork_asm+0x11/0x20
Signed-off-by: Ce Sun
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 7 +--
1 file changed, 1 insertion(+), 6 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
b/
rk+0x2f/0x40
[ 630.636413] ? __sched_group_set_shares+0x160/0x160
[ 630.647232] ret_from_fork_asm+0x11/0x20
Signed-off-by: Ce Sun
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 5 -
1 file changed, 5 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
b/drivers/gpu/drm/amd/am
Keep the number of newly added de counts consistent with
the number of newly added error addresses.
Signed-off-by: Ce Sun
---
drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c | 1 +
drivers/gpu/drm/amd/amdgpu/amdgpu_umc.h | 1 +
drivers/gpu/drm/amd/amdgpu/umc_v12_0.c | 8 ++--
3 files changed, 8
0011
[ 5139.304690] R10: 000a R11: 0246 R12: 55ce8b8f9a70
[ 5139.304691] R13: 55ce8b8f2ec0 R14: 55ce8b8f2ab0 R15: 55ce8b8f9aa0
[ 5139.304692]
[ 5139.304693] ---[ end trace 8536b052f7883003 ]---
Signed-off-by: Ce Sun
---
drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c | 5 +++
0011
[ 5139.304690] R10: 000a R11: 0246 R12: 55ce8b8f9a70
[ 5139.304691] R13: 55ce8b8f2ec0 R14: 55ce8b8f2ab0 R15: 55ce8b8f9aa0
[ 5139.304692]
[ 5139.304693] ---[ end trace 8536b052f7883003 ]---
Signed-off-by: Ce Sun
---
drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c |
6 R12: 55ce8b8f9a70
[ 5139.304691] R13: 55ce8b8f2ec0 R14: 55ce8b8f2ab0 R15: 55ce8b8f9aa0
[ 5139.304692]
[ 5139.304693] ---[ end trace 8536b052f7883003 ]---
Signed-off-by: Ce Sun
---
drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 4 +++-
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 2 ++
dri
Keep the number of newly added de counts consistent with
the number of newly added error addresses.
Signed-off-by: Ce Sun
---
drivers/gpu/drm/amd/amdgpu/amdgpu_umc.h | 1 +
drivers/gpu/drm/amd/amdgpu/umc_v12_0.c | 11 +--
2 files changed, 10 insertions(+), 2 deletions(-)
diff --git a
Keep the number of newly added de counts consistent with
the number of newly added error addresses.
---
drivers/gpu/drm/amd/amdgpu/amdgpu_umc.h | 1 +
drivers/gpu/drm/amd/amdgpu/umc_v12_0.c | 11 +--
2 files changed, 10 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdg
Checking hive is more readable.
This fixes the following smatch warning:
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:6820 amdgpu_pci_slot_reset()
warn: iterator used outside loop: 'tmp_adev'
Fixes: 8ba904f54148 ("drm/amdgpu: Multi-GPU DPC recovery support")
Reported-by: Dan Carpenter
S
Checking hive is more readable.
This fixes the following smatch warning:
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:6820 amdgpu_pci_slot_reset()
warn: iterator used outside loop: 'tmp_adev'
Fixes: 8ba904f54148 ("drm/amdgpu: Multi-GPU DPC recovery support")
Reported-by: Dan Carpenter
S
Fixes smatch warning:
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:6820 amdgpu_pci_slot_reset()
warn: iterator used outside loop: 'tmp_adev'
Fixes: 8ba904f54148 ("drm/amdgpu: Multi-GPU DPC recovery support")
Signed-off-by: Ce Sun
---
drivers/gpu/drm/amd/amdgpu/amdgpu_dev
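
The class of warning quoted above ("iterator used outside loop") concerns a loop cursor that is read after the loop ends. With the kernel's list_for_each_entry(), the cursor does not become NULL when the loop runs to completion, so using it afterwards is unsafe. The sketch below shows one common way to avoid the pattern, using a plain singly linked list in userspace; struct dev, find_dev() and check_find_dev() are hypothetical, and this is not necessarily the fix the patch applies.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical device node in a singly linked list. */
struct dev {
	int id;
	struct dev *next;
};

/*
 * Return the device with the given id, or NULL if there is none.
 * The cursor 'cur' is scoped to the loop; only 'found' is used afterwards,
 * and it is set only when a match was actually seen.
 */
static struct dev *find_dev(struct dev *head, int id)
{
	struct dev *found = NULL;

	for (struct dev *cur = head; cur != NULL; cur = cur->next) {
		if (cur->id == id) {
			found = cur;
			break;
		}
	}
	return found;
}

/* Small self-check showing the hit and miss cases. */
static void check_find_dev(void)
{
	struct dev b = { 2, NULL };
	struct dev a = { 1, &b };

	assert(find_dev(&a, 2) == &b);
	assert(find_dev(&a, 3) == NULL);
}
```

Keeping the result in a separate variable makes the "not found" case explicit instead of depending on the state of the cursor after the loop.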
Split amdgpu_device_gpu_recover into the following stages:
halt activities, asic reset, schedule resume and amdgpu resume.
The reason is that the dpc recover code to be added later
will be very similar to gpu reset.
Signed-off-by: Ce Sun
Reviewed-by: Hawking Zhang
---
drivers/gpu
err_event_athub and dpc recovery will corrupt the VCPU buffer,
so we need to restore the fw data and clear the buffer in amdgpu_vcn_resume()
Signed-off-by: Ce Sun
Reviewed-by: Hawking Zhang
---
drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a
Add support for DPC recovery based on the refactored code
Signed-off-by: Ce Sun
Reviewed-by: Hawking Zhang
---
drivers/gpu/drm/amd/amdgpu/amdgpu.h| 5 +
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 172 ++---
drivers/gpu/drm/amd/amdgpu/soc15.c | 5 +
3 files
Add link reset implementation
Signed-off-by: Ce Sun
Reviewed-by: Hawking Zhang
---
drivers/gpu/drm/amd/pm/amdgpu_dpm.c | 28 +++
drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h | 2 ++
drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 26 +
drivers/gpu
:38:00.0: amdgpu: RW: 0x0
Solved by patch-4
Ce Sun (4):
drm/amd/pm: Add link reset for SMU 13.0.6
drm/amdgpu: refactor amdgpu_device_gpu_recover
drm/amdgpu: Multi-GPU DPC recovery support
drm/amdgpu/vcn: during dpc recovery will corrupt VCPU buffer
drivers/gpu/drm/amd/amdgpu
44 matches