Add critical address check for bad page retirement.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 8
1 file changed, 8 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index 0ad3a9eedfd2
Support ras critical address check.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 89 +
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 14
2 files changed, 103 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
b/drivers/gpu
Add command to check address validity and remove
unused command codes.
v2:
The command interface adds new parameters to support
multiple check address strategies.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 63 +
drivers/gpu/drm/amd/amdgpu
Add command to check address validity and remove
unused command codes.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 58 +++--
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 3 ++
2 files changed, 29 insertions(+), 32 deletions(-)
diff --git a/drivers
: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 38
drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.h | 17 +
2 files changed, 55 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
index
The timeout is only used to interrupt polling, so
there is no need to print an error message.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 11 +--
1 file changed, 1 insertion(+), 10 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
b/drivers/gpu/drm
On nbio v7.4, both the ras controller interrupt and the
athub interrupt are generated after injecting a UE to PCIE,
but the gpu reset only needs to be triggered once.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers
In the multi-GPU case, once one GPU has started
resetting all GPUs in the hive, the other GPUs do not
need to trigger a GPU reset again.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 21 -
1 file changed, 20 insertions(+), 1 deletion(-)
diff --git a/drivers
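The "only one GPU triggers the hive reset" idea in the entry above can be illustrated with a small userspace sketch. It assumes a single shared flag per hive and uses C11 atomics in place of whatever the driver really uses; struct sketch_hive and both functions are hypothetical names, not taken from the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for per-hive state shared by all GPUs. */
struct sketch_hive {
    atomic_flag reset_in_progress;
};

/*
 * Returns true only for the first caller; later callers see the flag
 * already set and should skip triggering another reset.
 */
static bool sketch_try_start_hive_reset(struct sketch_hive *hive)
{
    /* test_and_set returns the previous value: false means we are first. */
    return !atomic_flag_test_and_set(&hive->reset_in_progress);
}

/* Called once the reset has finished so that a later error can reset again. */
static void sketch_finish_hive_reset(struct sketch_hive *hive)
{
    atomic_flag_clear(&hive->reset_in_progress);
}
```

Every GPU that detects the error would call sketch_try_start_hive_reset() and schedule the reset only when it returns true.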
The ras command shared memory is allocated from
VRAM and the response status of the command
buffer will not be zero due to gpu being in
fatal error state after ras UE error injection.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 4 +---
1 file changed, 1 insertion
Split into 3 parts:
1. Convert soc physical address via ras ta.
2. Expand bad pages from soc physical address.
3. Dump bad address info.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/umc_v12_0.c | 116 -
1 file changed, 77 insertions(+), 39 deletions(-)
diff
Remove unused code.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 29 -
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 10 ---
drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c | 86 -
drivers/gpu/drm/amd/amdgpu/amdgpu_umc.h | 5 --
4 files changed
1. Use pa_pfn as the radix-tree key index to log
deferred error info.
2. Use local array to store expanded bad pages.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 2 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c | 14 ++
drivers/gpu/drm/amd/amdgpu/umc_v12_0.c
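Item 1 of the entry above keys deferred error records by page frame number. One effect of such a key is that two error addresses inside the same page collapse into a single record. A minimal userspace sketch, assuming 4 KiB pages and a fixed-size table standing in for the kernel radix tree; all names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT  12  /* assumption: 4 KiB pages */
#define SKETCH_MAX_RECORDS 64  /* hypothetical capacity */

/* Hypothetical log that stores one entry per page frame number. */
struct sketch_err_log {
    uint64_t pfn[SKETCH_MAX_RECORDS];
    size_t count;
};

/*
 * Log a physical address under its pfn key.
 * Returns true if a new record was added, false if the pfn was
 * already logged or the table is full.
 */
static bool sketch_log_error(struct sketch_err_log *log, uint64_t pa)
{
    uint64_t pfn = pa >> SKETCH_PAGE_SHIFT;

    for (size_t i = 0; i < log->count; i++)
        if (log->pfn[i] == pfn)
            return false;

    if (log->count == SKETCH_MAX_RECORDS)
        return false;

    log->pfn[log->count++] = pfn;
    return true;
}
```

The linear scan is only for brevity; a radix tree gives the same uniqueness guarantee with a much cheaper lookup.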
Add mutex to protect ras shared memory.
v2:
Add TA_RAS_COMMAND__TRIGGER_ERROR command call
status check.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c| 123 ++---
drivers/gpu/drm/amd/amdgpu/amdgpu_psp.h| 1 +
drivers/gpu/drm/amd/amdgpu
When a gpu in hive is performing ras reset, other
gpus in hive do not need to schedule recovery work
to reset the gpu.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 20 +++-
1 file changed, 19 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm
Before uninstalling the gpu driver, flush all cached ras
bad pages to the eeprom.
v2:
Put the same code into a function and reuse the function.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 35 -
1 file changed, 29 insertions(+), 6 deletions
.
v2:
1. Add the above description to code comments.
2. Reuse existing function.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 6 +-
drivers/gpu/drm/amd/amdgpu/umc_v12_0.c | 18 ++
2 files changed, 23 insertions(+), 1 deletion(-)
diff --
plete.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 14 +-
drivers/gpu/drm/amd/amdgpu/umc_v12_0.c | 6 ++
2 files changed, 19 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
Before uninstalling the gpu driver, flush all cached ras
bad pages to the eeprom.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 17 +
1 file changed, 17 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
b/drivers/gpu/drm/amd/amdgpu
Disable error count queries through the sysfs node during gpu reset.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/aldebaran.c | 2 --
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 ++-
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c| 3 +++
3 files changed, 5 insertions(+), 3 deletions(-)
diff
Disable error count queries through the sysfs node during gpu reset.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 15 +--
1 file changed, 13 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
Disable error count queries through the sysfs node during gpu reset.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 6 +-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index
Add mutex to protect ras shared memory.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c| 124 ++---
drivers/gpu/drm/amd/amdgpu/amdgpu_psp.h| 1 +
drivers/gpu/drm/amd/amdgpu/amdgpu_psp_ta.c | 2 +
3 files changed, 87 insertions(+), 40 deletions
Add gpu reset check and exception handling for
page retirement.
v2:
Clear poison consumption messages cached in fifo after
non mode-1 reset.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 52 +
1 file changed, 52 insertions(+)
diff --git a
: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 37 -
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 1 +
2 files changed, 18 insertions(+), 20 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index
1. The poison fifo is only used for poison consumption
requests.
2. Merge reset requests when the poison fifo caches multiple
poison consumption messages.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 56 -
drivers/gpu/drm/amd/amdgpu
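Item 2 of the entry above, merging reset requests from several cached poison consumption messages, might look roughly like the following. This is a userspace sketch under stated assumptions: the message layout, the bitmask encoding of reset types, and the clearing of per-message flags after merging are all hypothetical and not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical poison consumption message as cached in the fifo. */
struct sketch_poison_msg {
    uint32_t block;        /* hypothetical: reporting IP block id */
    uint32_t reset_flags;  /* hypothetical: bitmask of requested resets */
};

/*
 * Fold the reset requests of all queued messages into one mask so
 * that a single reset can serve every message. The per-message flags
 * are cleared so that no message requests a reset a second time.
 */
static uint32_t sketch_merge_reset(struct sketch_poison_msg *msgs, size_t n)
{
    uint32_t merged = 0;

    for (size_t i = 0; i < n; i++) {
        merged |= msgs[i].reset_flags;
        msgs[i].reset_flags = 0;
    }
    return merged;
}
```

A caller would trigger one reset if the returned mask is non-zero, instead of one reset per message.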
Add variable to record the deferred error
number read by driver.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 62 ++---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 3 +-
drivers/gpu/drm/amd/amdgpu/umc_v12_0.c | 4 +-
3 files changed, 48
Add completion to wait for ras reset to complete.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 11 +++
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 1 +
2 files changed, 12 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
b/drivers/gpu/drm
Add gpu reset check and exception handling for
page retirement.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 43 +
1 file changed, 43 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
b/drivers/gpu/drm/amd/amdgpu
1. The poison fifo is only used for poison consumption
requests.
2. Merge reset requests when the poison fifo caches multiple
poison consumption messages.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 58 +
drivers/gpu/drm/amd/amdgpu
: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 41 -
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 1 +
2 files changed, 21 insertions(+), 21 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index
Add variable to record the deferred error
number read by driver.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 62 ++---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 3 +-
drivers/gpu/drm/amd/amdgpu/umc_v12_0.c | 4 +-
3 files changed, 48
If gpu is recovering, clear all message reset flags
in fifo and wait for gpu to complete recovery.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 12
1 file changed, 12 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
b/drivers/gpu/drm
If the number of messages to be processed in the fifo exceeds
the threshold, stop waiting for the DE data to be ready.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 13 +
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 4 +++-
2 files changed
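The threshold rule described in the entry above can be written as a small predicate. This is only a sketch of the decision as the commit message states it; the function name, its parameters, and the treatment of the exact-threshold case are assumptions.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Decide whether the worker should keep waiting for deferred error
 * (DE) data. Waiting stops as soon as the data is ready, or when the
 * backlog of fifo messages grows past the threshold.
 */
static bool sketch_should_wait_for_de(unsigned int pending_msgs,
                                      unsigned int threshold,
                                      bool de_ready)
{
    if (de_ready)
        return false;
    return pending_msgs <= threshold;
}
```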
Add completion to wait for gpu to complete reset.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 12
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 1 +
2 files changed, 13 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
b/drivers/gpu/drm
To avoid resetting the gpu repeatedly, clear all
message reset flags in the fifo before the first
gpu reset.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 59 -
1 file changed, 58 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd
1. Messages cannot be added to the fifo in gpu reset mode.
2. The thread can be awakened only when the message has
been successfully saved to the fifo.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 16 ++--
drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c | 18
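Both rules in the entry above can be sketched with a small ring buffer: the put operation refuses messages during reset, and the wake-up happens only after a successful put. This is a userspace illustration; the structure, the capacity, and the wakeups counter that stands in for waking a worker thread are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_FIFO_SIZE 4  /* hypothetical capacity */

struct sketch_fifo {
    int slots[SKETCH_FIFO_SIZE];
    size_t tail;
    size_t count;
    bool in_reset;          /* set while the gpu is being reset */
    unsigned int wakeups;   /* stands in for waking the worker thread */
};

/* Rule 1: reject the message while in reset (or when the fifo is full). */
static bool sketch_fifo_put(struct sketch_fifo *f, int msg)
{
    if (f->in_reset || f->count == SKETCH_FIFO_SIZE)
        return false;

    f->slots[f->tail] = msg;
    f->tail = (f->tail + 1) % SKETCH_FIFO_SIZE;
    f->count++;
    return true;
}

/* Rule 2: wake the worker only if the message was actually stored. */
static bool sketch_submit(struct sketch_fifo *f, int msg)
{
    if (!sketch_fifo_put(f, msg))
        return false;

    f->wakeups++;
    return true;
}
```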
Change log level.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 4 ++--
drivers/gpu/drm/amd/amdgpu/umc_v12_0.c | 4 ++--
2 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
Add mutex to protect ras shared memory.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c| 121 ++---
drivers/gpu/drm/amd/amdgpu/amdgpu_psp.h| 1 +
drivers/gpu/drm/amd/amdgpu/amdgpu_psp_ta.c | 2 +
3 files changed, 84 insertions(+), 40 deletions
Remove redundant function call.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 22 ++
1 file changed, 6 insertions(+), 16 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index
Remove unused code.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/umc_v12_0.c | 69 --
1 file changed, 69 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/umc_v12_0.c
b/drivers/gpu/drm/amd/amdgpu/umc_v12_0.c
index 8df84feaf046..12bae67be91c 100644
--- a
Fix ras mode2 reset failure in ras aca mode.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 4
1 file changed, 4 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index edb3cd0cef96..11a70991152c 100644
Fix ras mode2 reset failure in ras aca mode for
sdma v4_4_2 and gfx v9_4_3.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c | 4
drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c | 4
2 files changed, 8 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c
Use new interface to reserve bad page.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index d1a2ab944b7d
retired_page is a page frame and should be expanded
to the full address when querying its status.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
b/drivers/gpu/drm/amd
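The fix described in the entry above comes down to shifting a page frame value up to a byte address before comparing it with address-based records. A minimal sketch, assuming 4 KiB pages and a hypothetical flat array of retired addresses; none of these names come from the driver:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12  /* assumption: 4 KiB pages */

/*
 * Look up a retired page, given as a page frame, in a table that
 * stores full byte addresses. Comparing the raw page frame against
 * the table would never match, so it is expanded first.
 */
static bool sketch_is_retired(const uint64_t *retired_addrs, size_t n,
                              uint64_t retired_pfn)
{
    uint64_t addr = retired_pfn << SKETCH_PAGE_SHIFT;

    for (size_t i = 0; i < n; i++)
        if (retired_addrs[i] == addr)
            return true;
    return false;
}
```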
Support ACA logging ecc errors.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/umc_v12_0.c | 5 +
1 file changed, 5 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/umc_v12_0.c
b/drivers/gpu/drm/amd/amdgpu/umc_v12_0.c
index bd917eb6ea24..8df84feaf046 100644
--- a/drivers
Add poison consumption handler.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 43 ++---
1 file changed, 39 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index
Prepare to handle pasid poison consumption.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c| 9 -
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h| 5 +
drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c | 20 ---
drivers/gpu/drm/amd/amdgpu
Add condition check for amdgpu_umc_fill_error_record.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 1 +
drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c | 20 +---
drivers/gpu/drm/amd/amdgpu/amdgpu_umc.h | 2 +-
3 files changed, 19 insertions(+), 4 deletions
1. umc v12_0 logs ecc errors.
2. Reserve newly detected ecc error pages.
3. Add tag for bad pages, so that they can
be retired later.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c | 67 +
drivers/gpu/drm/amd/amdgpu/amdgpu_umc.h | 7
Retire bad pages for umc v12_0.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/umc_v12_0.c | 57 +-
1 file changed, 55 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/umc_v12_0.c
b/drivers/gpu/drm/amd/amdgpu/umc_v12_0.c
index
Add delay work to retire bad pages.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 36 -
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 1 +
drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c | 2 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_umc.h | 3 +++
4 files
Umc v12_0 converts error address.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/umc_v12_0.c | 94 +-
drivers/gpu/drm/amd/amdgpu/umc_v12_0.h | 12
2 files changed, 105 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/umc_v12_0.c
b
Add interface to update umc v12_0 ecc status.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 2 ++
drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c | 9 +++
drivers/gpu/drm/amd/amdgpu/amdgpu_umc.h | 6 +
drivers/gpu/drm/amd/amdgpu/umc_v12_0.c
Add poison creation handler.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 74 +++--
1 file changed, 69 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index
Add interface to reserve bad page.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 19 +++
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 4
2 files changed, 23 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
b/drivers/gpu/drm/amd
Prepare for logging ecc errors.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 33 +
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 23 +
2 files changed, 56 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
b
Add message fifo to handle RAS poison events.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 32 +
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 18 ++
2 files changed, 50 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu
Add new nodes for the addresses that are not in the
reserved_pages list and reservations_pending list.
V2:
Avoid repeated locking/unlocking.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 25 +---
1 file changed, 16 insertions(+), 9 deletions
Add new nodes for the addresses that are not in the
reserved_pages list and reservations_pending list.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 28 +---
1 file changed, 19 insertions(+), 9 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu
Ras needs to be resumed during gpu reset for
gfx v9_4_3 sriov.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index afc0b4eb7f8e
/0x80
[ 484.496866] ? exc_page_fault+0x87/0x170
[ 484.496868] ? asm_exc_page_fault+0x8/0x30
[ 484.496871] entry_SYSCALL_64_after_hwframe+0x44/0xae
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 6 +-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a
Use asynchronous polling to handle umc_v12_0 poisoning.
v2:
1. Change function name.
2. Change the debugging information content.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 5 +
drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c | 139 ++--
drivers
Add interface to check mca umc status.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_mca.c | 12 ++-
drivers/gpu/drm/amd/amdgpu/amdgpu_mca.h | 1 +
drivers/gpu/drm/amd/amdgpu/amdgpu_umc.h | 4 +++-
drivers/gpu/drm/amd/amdgpu/umc_v12_0.c| 20
Support retiring multiple MCA error address pages in
one in-band query for umc v12_0.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 43 +---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 8 ++-
drivers/gpu/drm/amd/amdgpu/umc_v12_0.c | 66
Preparing for asynchronous processing of umc page retirement.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 34 +
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 5
2 files changed, 39 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu
Add log info for umc_v12_0 and smu_v13_0_6.
v2:
Delete redundant logs.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/umc_v12_0.c | 11 +++
drivers/gpu/drm/amd/amdkfd/kfd_events.c | 6 +-
2 files changed, 16 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm
Add log info for umc_v12_0 and smu_v13_0_6.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/umc_v12_0.c | 11 +++
drivers/gpu/drm/amd/amdkfd/kfd_events.c | 6 +-
.../gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c| 13 +
3 files
Preparing for asynchronous processing of umc page retirement.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 34 +
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 5
2 files changed, 39 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu
Support retiring multiple MCA error address pages in
one in-band query for umc v12_0.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 43 +---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 8 ++-
drivers/gpu/drm/amd/amdgpu/umc_v12_0.c | 66
Add interface to check mca umc status.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_mca.c | 12 ++-
drivers/gpu/drm/amd/amdgpu/amdgpu_mca.h | 1 +
drivers/gpu/drm/amd/amdgpu/amdgpu_umc.h | 4 +++-
drivers/gpu/drm/amd/amdgpu/umc_v12_0.c| 20
Use asynchronous polling to handle umc_v12_0 poisoning.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 5 +
drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c | 143 +++-
drivers/gpu/drm/amd/amdgpu/amdgpu_umc.h | 3 +
3 files changed, 120 insertions(+), 31
MCA supports recording umc address information.
V2:
Move err_addr variable from struct ras_err_node to
struct ras_err_info.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_mca.c | 13 +++--
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 22
Add umc page retirement for umc v12_0.
V2:
1. Changed umc page retirement check condition
to call umc_v12_0_is_uncorrectable_error.
2. Use memset to clear the contents of the umc
error address structure.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/umc_v12_0.c | 56
smu v13_0_6 supports ecc info by default.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c | 8
1 file changed, 8 insertions(+)
diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c
b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c
Add poison mode check error condition for umc v12_0.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/umc_v12_0.c| 20 ++-
drivers/gpu/drm/amd/amdgpu/umc_v12_0.h| 4 ++--
.../drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c | 4 ++--
3 files changed, 19
Support saving bad pages after gpu ras reset for umc_v12_0.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 40 +
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 1 +
drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c | 35 ++
drivers/gpu/drm
Enable ras for mp0 v13_0_6 sriov.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index 7689395e44fd..378478cf9c21 100644
--- a/drivers
Mode1 reset needs to recover mp1 in fatal error case
for mp0 v13_0_10.
v2:
Define a macro to wrap psp function calls.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_psp.h | 5 +
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 2 ++
drivers/gpu/drm/amd/amdgpu/psp_v13_0.c
Mode1 reset needs to recover mp1 in fatal error case
for mp0 v13_0_10.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_psp.h | 1 +
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 3 +++
drivers/gpu/drm/amd/amdgpu/psp_v13_0.c | 24 +++-
3 files changed, 27
Fix incorrect vmhub index.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c | 5 -
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
index d04fc0f19a29..c0b588e5d6aa 100644
--- a
Fix printing empty string array.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
index c571f0d95994..d04fc0f19a29
not update the same version ras ta.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_psp_ta.c | 20 +++-
1 file changed, 19 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp_ta.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp_ta.c
index
Add ta initialization failure check condition.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_psp_ta.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp_ta.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp_ta.c
index
When a fatal error occurs in ras poison mode, mode1
reset is used to recover the gpu.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 11 +++
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 1 +
2 files changed, 12 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu
The link object of mgr->reserved_pages is the blocks
variable in struct amdgpu_vram_reservation, not the
link variable in struct drm_buddy_block.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 7 ---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --
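The bug class described in the entry above is a list walk that recovers the containing structure through the wrong member. A self-contained userspace sketch follows; the list type, the container_of-style macro, and struct sketch_reservation are simplified stand-ins written for this example, not the kernel definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal doubly linked list node, modelled on the kernel idiom. */
struct sketch_list_head {
    struct sketch_list_head *next, *prev;
};

/* Recover the enclosing structure from a pointer to one of its members. */
#define sketch_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-in for a reservation record. */
struct sketch_reservation {
    unsigned long start;
    unsigned long size;
    struct sketch_list_head blocks;  /* the member linked into the list */
};

/*
 * Read the start of the first reservation on a non-empty list. The
 * member named here must be the one that was linked into the list;
 * naming a member of a different structure yields a bogus pointer.
 */
static unsigned long sketch_first_start(struct sketch_list_head *head)
{
    struct sketch_reservation *rsv =
        sketch_container_of(head->next, struct sketch_reservation, blocks);

    return rsv->start;
}
```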
Perform mode2 reset for sdma fed error on gfx v11_0_3.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 8 +++-
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 5 +
drivers/gpu/drm/amd/amdgpu/gfx_v11_0_3.c | 14 +-
3 files changed, 25 insertions(+), 2
When the sdma ib ring test fails to detect an sdma
hang caused by an sdma fed error, force a soft
reset.
V2:
Add poison mode support check for special code
path.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c | 19 +++
1 file changed, 19 insertions
When the sdma ib ring test fails to detect an sdma
hang caused by an sdma fed error, force a soft
reset.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c | 16
1 file changed, 16 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c
b/drivers/gpu
When gfx ras poison consumption causes gpu reset on gfx v11_0_3,
the sequence of gpu reset is "soft reset -> mode2 reset -> mode1 reset".
If the previous reset fails, fall back to the next reset.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdg
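The escalation order quoted in the entry above ("soft reset -> mode2 reset -> mode1 reset") is a fallback chain: each method is tried only if the previous one failed. A generic sketch, assuming each reset method is a callback returning 0 on success; the callback type and the helper functions are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reset callback: returns 0 on success, non-zero on failure. */
typedef int (*sketch_reset_fn)(void *ctx);

/*
 * Try each reset method in order and stop at the first success.
 * Returns the index of the method that succeeded, or -1 if all failed.
 */
static int sketch_reset_with_fallback(const sketch_reset_fn *methods,
                                      size_t n, void *ctx)
{
    for (size_t i = 0; i < n; i++)
        if (methods[i](ctx) == 0)
            return (int)i;
    return -1;
}

/* Example callbacks: one that always fails, one that counts successes. */
static int sketch_fail(void *ctx)
{
    (void)ctx;
    return -1;
}

static int sketch_ok(void *ctx)
{
    (*(int *)ctx)++;
    return 0;
}
```

With the array ordered as soft, mode2, mode1, the returned index tells the caller which level finally recovered the gpu.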
Add variable to record gpu reset reason.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 3 +++
drivers/gpu/drm/amd/amdgpu/gfx_v11_0_3.c | 6 +-
2 files changed, 8 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h
b/drivers/gpu/drm
: recover vram bo from shadow done
[ 390.931067] amdgpu :63:00.0: amdgpu: GPU reset(1) succeeded!
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu.h | 2 ++
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 10 ++
drivers/gpu/drm/amd/amdgpu
Add gfx v11_0_3 fed irq handling for sriov.
Signed-off-by: YiPeng Chai
Reviewed-by: Hawking Zhang
---
drivers/gpu/drm/amd/amdgpu/gfx_v11_0_3.c | 14 +++---
1 file changed, 11 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0_3.c
b/drivers/gpu/drm/amd
Optimize redundant code in umc_v6_7.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/umc_v6_7.c | 162 +++---
1 file changed, 71 insertions(+), 91 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/umc_v6_7.c
b/drivers/gpu/drm/amd/amdgpu/umc_v6_7.c
index
Optimize redundant code in umc_v8_10.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c | 31
drivers/gpu/drm/amd/amdgpu/amdgpu_umc.h | 7 +
drivers/gpu/drm/amd/amdgpu/umc_v8_10.c | 197 +---
3 files changed, 115 insertions(+), 120 deletions
Reinit mes ip block during reset on SRIOV.
Signed-off-by: YiPeng Chai
Reviewed-by: Hawking Zhang
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index
Gfx v11_0_3 supports ras on SRIOV, so ras needs to be
resumed during reset.
Signed-off-by: YiPeng Chai
Reviewed-by: Hawking Zhang
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
b
Enable ras for mp0 v13_0_10 on SRIOV.
Signed-off-by: YiPeng Chai
Reviewed-by: Hawking Zhang
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index 63dfcc98152d
Optimize sdma ras block initialization code for sdma v4_0.
Signed-off-by: YiPeng Chai
Reviewed-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 21 +
1 file changed, 5 insertions(+), 16 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
b/drivers
Add sdma ras function on sdma v6_0_3.
Signed-off-by: YiPeng Chai
Reviewed-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c | 35
drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.h | 1 +
drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c | 24
3 files changed
ras block supports ras function.
Signed-off-by: YiPeng Chai
Reviewed-by: Tao Zhou
Reviewed-by: Hawking Zhang
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 17 -
1 file changed, 16 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
b/drivers/gpu