[AMD Official Use Only]


> -----Original Message-----
> From: Tuikov, Luben <luben.tui...@amd.com>
> Sent: Wednesday, October 20, 2021 6:01 PM
> To: Kuehling, Felix <felix.kuehl...@amd.com>; Russell, Kent <kent.russ...@amd.com>; amd-gfx@lists.freedesktop.org
> Cc: Joshi, Mukul <mukul.jo...@amd.com>
> Subject: Re: [PATCH 3/3] drm/amdgpu: Implement bad_page_threshold = -2 case
> 
> On 2021-10-20 17:54, Felix Kuehling wrote:
> > On 2021-10-20 12:35 p.m., Kent Russell wrote:
> >> If the bad_page_threshold kernel parameter is set to -2,
> >> continue to post the GPU. Print a warning to dmesg that this action has
> >> been done, and that page retirement will obviously not work for said GPU.
> > I'd squash patch 2 and 3. The squashed patch is
> >
> > Acked-by: Felix Kuehling <felix.kuehl...@amd.com>
> 
> I was just thinking the same thing. Keep the title and text of patch 2 and
> add the description of 3 to 2. With that done:
> 
> Reviewed-by: Luben Tuikov <luben.tui...@amd.com>

Sounds good, thanks. I was on the fence about combining them from when I had 
the separate kernel param, and it was easier to squash it at review time than 
to separate it. I'll still need to work on patch #1 but thanks for the reviews 
here!

 Kent

> 
> Regards,
> Luben
> 
> >
> >
> >> Cc: Luben Tuikov <luben.tui...@amd.com>
> >> Cc: Mukul Joshi <mukul.jo...@amd.com>
> >> Signed-off-by: Kent Russell <kent.russ...@amd.com>
> >> ---
> >>   drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c | 13 +++++++++----
> >>   1 file changed, 9 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
> >> index 1ede0f0d6f55..31852330c1db 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
> >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
> >> @@ -1115,11 +1115,16 @@ int amdgpu_ras_eeprom_init(struct amdgpu_ras_eeprom_control *control,
> >>                    res = amdgpu_ras_eeprom_correct_header_tag(control,
> >>                                                               RAS_TABLE_HDR_VAL);
> >>            } else {
> >> -                  *exceed_err_limit = true;
> >> -                  dev_err(adev->dev,
> >> -                          "RAS records:%d exceed threshold:%d, "
> >> -                          "GPU will not be initialized. Replace this GPU or increase the threshold",
> >> +                  dev_err(adev->dev, "RAS records:%d exceed threshold:%d",
> >>                            control->ras_num_recs, ras->bad_page_cnt_threshold);
> >> +                  if (amdgpu_bad_page_threshold == -2) {
> >> +                          dev_warn(adev->dev, "GPU will be initialized due to bad_page_threshold = -2.");
> >> +                          dev_warn(adev->dev, "Page retirement will not work for this GPU in this state.");
> >> +                          res = 0;
> >> +                  } else {
> >> +                          *exceed_err_limit = true;
> >> +                          dev_err(adev->dev, "GPU will not be initialized. Replace this GPU or increase the threshold.");
> >> +                  }
> >>            }
> >>    } else {
> >>            DRM_INFO("Creating a new EEPROM table");
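
For anyone reading the hunk above out of context, here is a rough userspace sketch of the decision the patched else-branch appears to make. It is not the driver code: the function name ras_should_init_gpu, the BAD_PAGE_THRESHOLD_IGNORE macro, and the boolean return value are all made up for illustration, and the "records >= threshold" comparison is an assumption, since the condition guarding this branch is not visible in the quoted diff.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical name for the "-2" parameter value discussed in the patch. */
#define BAD_PAGE_THRESHOLD_IGNORE (-2)

/*
 * Simplified model of the patched branch (not the real driver function).
 * Assumption: the "exceed" path is taken when num_recs >= threshold.
 * Returns true if GPU initialization should continue, and sets
 * *exceed_err_limit only when initialization is refused.
 */
static bool ras_should_init_gpu(int threshold_param,
                                unsigned int num_recs,
                                unsigned int threshold,
                                bool *exceed_err_limit)
{
    *exceed_err_limit = false;

    if (num_recs < threshold)
        return true;

    fprintf(stderr, "RAS records:%u exceed threshold:%u\n",
            num_recs, threshold);

    if (threshold_param == BAD_PAGE_THRESHOLD_IGNORE) {
        /* Mirrors the two dev_warn() messages added by the patch. */
        fprintf(stderr, "GPU will be initialized due to bad_page_threshold = -2.\n");
        fprintf(stderr, "Page retirement will not work for this GPU in this state.\n");
        return true;
    }

    /* Any other parameter value keeps the pre-patch behaviour. */
    *exceed_err_limit = true;
    fprintf(stderr, "GPU will not be initialized. Replace this GPU or increase the threshold.\n");
    return false;
}
```

With these assumptions, calling ras_should_init_gpu(-2, 10, 5, &flag) lets initialization continue and leaves flag false, while any other parameter value with the same counts refuses initialization and sets flag to true.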
