[PATCH 2/2] drm/amdgpu: refine bad page loading when in the same nps mode

2025-07-11 Thread ganglxie
when loading bad page in the same nps mode, need to set the other fields in eeprom records manually besides retired_page Signed-off-by: ganglxie --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 15 +++ 1 file changed, 15 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu

[PATCH 1/2] drm/amdgpu: refine eeprom data check

2025-07-11 Thread ganglxie
add eeprom data checksum check before driver unload. reset eeprom and save correct data to eeprom when check failed Signed-off-by: ganglxie --- drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 1 + .../gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c| 28 +++ .../gpu/drm/amd/amdgpu
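The snippet does not say how the checksum is computed. As a rough illustration only, the sketch below assumes a simple 8-bit byte-sum scheme in which the data bytes plus the stored checksum add up to zero modulo 256. The names byte_sum, make_checksum and checksum_ok are hypothetical and are not taken from the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Sum all bytes of a buffer, wrapping modulo 256. */
uint8_t byte_sum(const uint8_t *buf, size_t len)
{
	uint8_t sum = 0;
	size_t i;

	for (i = 0; i < len; i++)
		sum = (uint8_t)(sum + buf[i]);
	return sum;
}

/* Assumed scheme: checksum is the value that makes the total sum zero. */
uint8_t make_checksum(const uint8_t *buf, size_t len)
{
	return (uint8_t)(0u - byte_sum(buf, len));
}

/* Returns 1 when data plus stored checksum sum to zero modulo 256. */
int checksum_ok(const uint8_t *buf, size_t len, uint8_t checksum)
{
	return (uint8_t)(byte_sum(buf, len) + checksum) == 0;
}
```

Under this assumption, a failed checksum_ok() is the point at which the patch description says the eeprom is reset and correct data is saved again.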

[PATCH 2/2] drm/amdgpu: refine bad page loading when in the same nps mode

2025-07-11 Thread ganglxie
when loading bad page in the same nps mode, need to set the other fields in eeprom records manually besides retired_page Signed-off-by: ganglxie --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 15 +++ 1 file changed, 15 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu

[PATCH 1/2] drm/amdgpu: refine eeprom data check

2025-07-11 Thread ganglxie
add eeprom data checksum check before driver unload. reset eeprom and save correct data to eeprom when check failed Signed-off-by: ganglxie --- drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 1 + .../gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c| 28 +++ .../gpu/drm/amd/amdgpu

[PATCH 2/2] drm/amdgpu: refine bad page loading when in the same nps mode

2025-07-10 Thread ganglxie
when loading bad page in the same nps mode, need to set the other fields in eeprom records manually besides retired_page Signed-off-by: ganglxie --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 15 +++ 1 file changed, 15 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu

[PATCH 1/2] drm/amdgpu: refine eeprom data check

2025-07-10 Thread ganglxie
add eeprom data checksum check before driver unload. reset eeprom and save correct data to eeprom when check failed Signed-off-by: ganglxie --- drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 1 + .../gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c| 25 +++ .../gpu/drm/amd/amdgpu

[PATCH 1/2] drm/amdgpu: refine eeprom data check

2025-07-09 Thread ganglxie
add eeprom data checksum check after data writing, after gpu reset, and before driver unload. reset eeprom and save correct data to eeprom when check failed Signed-off-by: ganglxie --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c| 13 ++ drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c

[PATCH 2/2] drm/amdgpu: refine bad page loading when in the same nps mode

2025-07-09 Thread ganglxie
when loading bad page in the same nps mode, need to set the other fields in eeprom records manually besides retired_page Signed-off-by: ganglxie --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 15 +++ 1 file changed, 15 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu

[PATCH 2/2] drm/amdgpu: refine bad page loading when in the same nps mode

2025-07-07 Thread ganglxie
when loading bad page in the same nps mode, need to set the other fields in eeprom records manually besides retired_page Signed-off-by: ganglxie --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 15 +++ 1 file changed, 15 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu

[PATCH 1/2] drm/amdgpu: refine eeprom data check

2025-07-07 Thread ganglxie
add eeprom data checksum check after data writing, before gpu reset, and before driver unload. reset eeprom and save correct data to eeprom when check failed Signed-off-by: ganglxie --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c| 1 + drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 1

[PATCH] drm/amdgpu: refine ras error injection when eeprom initialization failed

2025-06-27 Thread ganglxie
when eeprom initialization failed, we still support ras error injection, and reserve bad pages, but do not save bad pages to eeprom Signed-off-by: ganglxie --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 22 ++- .../gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.h| 2 ++ 2 files

[PATCH] drm/amdgpu: refine usage of amdgpu_bad_page_threshold

2025-06-12 Thread ganglxie
when amdgpu_bad_page_threshold == -1 or -2, driver will issue a warning message when threshold is reached and continue runtime services. Signed-off-by: ganglxie --- .../gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c| 19 +-- 1 file changed, 9 insertions(+), 10 deletions(-) diff

[PATCH] drm/amdgpu: refine usage of amdgpu_bad_page_threshold

2025-06-12 Thread ganglxie
when amdgpu_bad_page_threshold == -1 or -2, driver will issue a warning message when threshold is reached and continue runtime services. Signed-off-by: ganglxie --- .../gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c| 21 +-- 1 file changed, 10 insertions(+), 11 deletions(-) diff

[PATCH] drm/amdgpu: change usage definition of amdgpu_bad_page_threshold

2025-06-12 Thread ganglxie
when amdgpu_bad_page_threshold == -1, driver won't write BADG and RMA when amdgpu_bad_page_threshold == -2, driver will write BADG and RMA Signed-off-by: ganglxie --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 2 +- .../gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c| 26 -

[PATCH] drm/amdgpu: clear pa and mca record counter when resetting eeprom

2025-06-04 Thread ganglxie
clear pa and mca record counter when resetting eeprom, so that ras_num_bad_pages can be calculated correctly Signed-off-by: ganglxie --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c b/drivers

[PATCH 2/2] drm/amdgpu: Get mca for old eeprom records

2025-05-22 Thread ganglxie
after getting mca for old eeprom records with 'address==0', it can be correctly parsed under non-nps1, or it will be dropped. Signed-off-by: ganglxie --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgp

[PATCH 1/2] drm/amdgpu: handle old RAS eeprom data in non-nps1 mode

2025-05-22 Thread ganglxie
Get MCA address from PA in nps1, then convert MCA address to PA in specific nps mode. Signed-off-by: Tao Zhou --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 16 ++-- drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c | 23 +++ drivers/gpu/drm/amd/amdgpu/amdgpu_umc.h | 2 ++

[PATCH] Refine RAS bad page records counting and parsing in eeprom V3

2025-04-29 Thread ganglxie
there are only MCA records in V3, no need to care about PA records. recalculate the value of ras_num_bad_pages when parsing failed and continue with the remaining records instead of quitting. Signed-off-by: ganglxie --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 61 +++-- 1 file changed

[PATCH] Add support for legacy records in eeprom format V3

2025-04-29 Thread ganglxie
After eeprom records format upgrades to V3, records that have 'address == 0' should be supported in NPS1 Signed-off-by: ganglxie --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 73 - 1 file changed, 48 insertions(+), 25 deletions(-) diff --git a/drivers/g

[PATCH] drm/amdgpu: Save PA of bad pages for old asics

2025-03-11 Thread ganglxie
for old asics that do not support mca translating, we just save PA for them Signed-off-by: ganglxie --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 24 --- .../gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c| 9 +-- 2 files changed, 28 insertions(+), 5 deletions(-) diff

[PATCH 3/3] drm/amdgpu: Change page/record number calculation based on nps

2025-02-20 Thread ganglxie
save only one record to save eeprom space, and bad_page_num = pa_rec_num + mca_rec_num*16 Signed-off-by: ganglxie --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 49 +-- .../gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c| 17 +++ .../gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.h
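The formula in the description above can be written out as a small helper. This is a minimal sketch, assuming each MCA record stands for 16 bad pages while each PA record stands for one, as the formula states. The function name bad_page_count and the macro PAGES_PER_MCA_RECORD are hypothetical and do not come from the patch.

```c
#include <stdint.h>

/* Assumption taken from the formula: one MCA record covers 16 bad pages. */
#define PAGES_PER_MCA_RECORD 16u

/* bad_page_num = pa_rec_num + mca_rec_num * 16 */
uint32_t bad_page_count(uint32_t pa_rec_num, uint32_t mca_rec_num)
{
	return pa_rec_num + mca_rec_num * PAGES_PER_MCA_RECORD;
}
```

For example, 3 PA records and 2 MCA records would be counted as 3 + 2*16 = 35 bad pages.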

[PATCH 2/3] drm/amdgpu: Refine bad page adding

2025-02-20 Thread ganglxie
bad page adding can be simpler with nps info Signed-off-by: ganglxie --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 196 +--- 1 file changed, 105 insertions(+), 91 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c

[PATCH 1/3] drm/amdgpu: Save nps to eeprom

2025-02-20 Thread ganglxie
nps info saved together with bad page makes bad page parsing more efficient Signed-off-by: ganglxie --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c | 8 ++-- drivers/gpu/drm/amd/amdgpu/amdgpu_umc.h| 7 +++ 2 files changed, 13 insertions(+), 2 deletions(-) diff --git a

[PATCH] drm/amdgpu: Save nps to eeprom and refine code

2025-02-19 Thread ganglxie
add nps info into eeprom records, and refine adding bad page logic based on nps info. Signed-off-by: ganglxie --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 244 +- .../gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c| 25 +- .../gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.h| 20

[PATCH] drm/amdgpu: Save nps to eeprom and refine code

2025-02-19 Thread ganglxie
add nps info into eeprom records, and refine bad page adding logic based on nps info. Signed-off-by: ganglxie --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 239 +- .../gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c| 25 +- .../gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.h| 20 +- drivers/gpu/drm/amd/amdgpu/amdgpu_umc.h | 7 + 4 files changed, 148