[AMD Official Use Only - AMD Internal Distribution Only]

> -----Original Message-----
> From: Xie, Patrick <gangliang....@amd.com>
> Sent: Friday, June 13, 2025 11:07 AM
> To: amd-gfx@lists.freedesktop.org
> Cc: Zhang, Hawking <hawking.zh...@amd.com>; Zhou1, Tao
> <tao.zh...@amd.com>; Xie, Patrick <gangliang....@amd.com>
> Subject: [PATCH] drm/amdgpu: refine usage of amdgpu_bad_page_threshold
>
> when amdgpu_bad_page_threshold == -1 or -2, driver will issue a warning
> message when threshold is reached and continue runtime services.
>
> Signed-off-by: ganglxie <gangl...@amd.com>
> ---
>  .../gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c    | 21 +++++++++----------
>  1 file changed, 10 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
> index 2ddedf476542..a9246c53bde9 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
> @@ -763,18 +763,17 @@ amdgpu_ras_eeprom_update_header(struct amdgpu_ras_eeprom_control *control)
>  		dev_warn(adev->dev,
>  			"Saved bad pages %d reaches threshold value %d\n",
>  			control->ras_num_bad_pages, ras->bad_page_cnt_threshold);
> -		control->tbl_hdr.header = RAS_TABLE_HDR_BAD;
> -		if (control->tbl_hdr.version >= RAS_TABLE_VER_V2_1) {
> -			control->tbl_rai.rma_status = GPU_RETIRED__ECC_REACH_THRESHOLD;
> -			control->tbl_rai.health_percent = 0;
> -		}
> -
>  		if ((amdgpu_bad_page_threshold != -1) &&
> -		    (amdgpu_bad_page_threshold != -2))
> +		    (amdgpu_bad_page_threshold != -2)) {
> +			control->tbl_hdr.header = RAS_TABLE_HDR_BAD;
> +			if (control->tbl_hdr.version >= RAS_TABLE_VER_V2_1) {
> +				control->tbl_rai.rma_status = GPU_RETIRED__ECC_REACH_THRESHOLD;
> +				control->tbl_rai.health_percent = 0;
> +			}
>  			ras->is_rma = true;
> -
> -		/* ignore the -ENOTSUPP return value */
> -		amdgpu_dpm_send_rma_reason(adev);
> +			/* ignore the -ENOTSUPP return value */
> +			amdgpu_dpm_send_rma_reason(adev);
> +		}
>  	}
>
>  	if (control->tbl_hdr.version >= RAS_TABLE_VER_V2_1)
> @@ -1509,7 +1508,7 @@ int amdgpu_ras_eeprom_check(struct amdgpu_ras_eeprom_control *control)
>  				"RAS records:%d exceed threshold:%d\n",
>  				control->ras_num_bad_pages, ras->bad_page_cnt_threshold);
>  			if ((amdgpu_bad_page_threshold == -1) ||
> -			    (amdgpu_bad_page_threshold == -2)) {
> +			    (amdgpu_bad_page_threshold == -2)) {
[Tao] The replacement is unnecessary. With this fixed, the patch is:

Reviewed-by: Tao Zhou <tao.zh...@amd.com>

>  				res = 0;
>  				dev_warn(adev->dev,
>  					"Please consult AMD Service Action Guide (SAG) for appropriate service procedures\n");
> --
> 2.34.1
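
For readers following the first hunk, the behavior after the patch can be sketched as a small standalone model. This is only an illustration under stated assumptions: struct ras_model, enum hdr_state and on_threshold_reached() are hypothetical stand-ins, not the real amdgpu structures or functions, and the RAS_TABLE_VER_V2_1 check around tbl_rai is left out for brevity.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the driver state touched by the hunk;
 * these are not the real amdgpu structures. */
enum hdr_state { HDR_OK, HDR_BAD };

struct ras_model {
	enum hdr_state header;	/* models control->tbl_hdr.header */
	bool is_rma;		/* models ras->is_rma */
	bool rma_reason_sent;	/* models the amdgpu_dpm_send_rma_reason() call */
	bool warned;		/* models the dev_warn() */
};

/* Models the post-patch branch taken once the saved bad page count
 * has reached the threshold. */
static void on_threshold_reached(struct ras_model *m, int bad_page_threshold)
{
	/* The warning is printed for every threshold setting. */
	m->warned = true;

	/* Only values other than -1 and -2 mark the table bad and flag RMA. */
	if (bad_page_threshold != -1 && bad_page_threshold != -2) {
		m->header = HDR_BAD;
		m->is_rma = true;
		m->rma_reason_sent = true;
	}
}
```

In this model, a threshold of -1 or -2 leaves the header and RMA state untouched after the warning, which matches the "continue runtime services" wording in the commit message. Any other value takes the full retirement path.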