[AMD Official Use Only]

Reviewed-by: Hawking Zhang <hawking.zh...@amd.com>

Regards,
Hawking

-----Original Message-----
From: Stanley.Yang <stanley.y...@amd.com>
Sent: Tuesday, December 7, 2021 14:40
To: amd-gfx@lists.freedesktop.org; Zhang, Hawking <hawking.zh...@amd.com>; Clements, John <john.cleme...@amd.com>; Zhou1, Tao <tao.zh...@amd.com>; Li, Candice <candice...@amd.com>; Chai, Thomas <yipeng.c...@amd.com>
Cc: Yang, Stanley <stanley.y...@amd.com>
Subject: [PATCH Review 1/1] drm/amdgpu: skip umc ras error count harvest

remove in recovery stat check, skip umc ras err cnt harvest in amdgpu_ras_log_on_err_counter

Signed-off-by: Stanley.Yang <stanley.y...@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index 1043d41b6807..a95d200adff9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -897,11 +897,6 @@ static void amdgpu_ras_get_ecc_info(struct amdgpu_device *adev, struct ras_err_d
 	struct amdgpu_ras *ras = amdgpu_ras_get_context(adev);
 	int ret = 0;
 
-	/* skip get ecc info during gpu recovery */
-	if (atomic_read(&ras->in_recovery) == 1 &&
-		adev->ip_versions[MP1_HWIP][0] == IP_VERSION(13, 0, 2))
-		return;
-
 	/*
 	 * choosing right query method according to
 	 * whether smu support query error information
@@ -1752,6 +1747,16 @@ static void amdgpu_ras_log_on_err_counter(struct amdgpu_device *adev)
 		if (info.head.block == AMDGPU_RAS_BLOCK__PCIE_BIF)
 			continue;
 
+		/*
+		 * this is a workaround for aldebaran, skip send msg to
+		 * smu to get ecc_info table due to smu handle get ecc
+		 * info table failed temporarily.
+		 * should be removed until smu fix handle ecc_info table.
+		 */
+		if ((info.head.block == AMDGPU_RAS_BLOCK__UMC) &&
+			(adev->ip_versions[MP1_HWIP][0] == IP_VERSION(13, 0, 2)))
+			continue;
+
 		amdgpu_ras_query_error_status(adev, &info);
 	}
 }
-- 
2.17.1
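
For readers following the thread, the filtering that the second hunk of the patch above adds to amdgpu_ras_log_on_err_counter can be modeled as a small standalone sketch in user-space C. Everything named sketch_* or SKETCH_* below is hypothetical and exists only for illustration: the reduced block enum, the version-packing macro, and the helper functions are simplified stand-ins rather than the driver's own definitions, and the bit layout of the packed version is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the driver's IP_VERSION packing (layout assumed). */
#define SKETCH_IP_VERSION(mj, mn, rv) (((mj) << 16) | ((mn) << 8) | (rv))

/* Hypothetical, reduced set of RAS blocks; not the driver's enum. */
enum sketch_ras_block {
	SKETCH_BLOCK_UMC,
	SKETCH_BLOCK_GFX,
	SKETCH_BLOCK_PCIE_BIF,
	SKETCH_BLOCK_COUNT,
};

/*
 * Returns true when the logging loop should not query this block.
 * Mirrors the two "continue" conditions visible in the hunk: PCIE_BIF
 * is always skipped, and UMC is skipped only when the MP1 IP version
 * is 13.0.2.
 */
static bool sketch_should_skip_query(enum sketch_ras_block block,
				     uint32_t mp1_version)
{
	assert(block < SKETCH_BLOCK_COUNT);

	if (block == SKETCH_BLOCK_PCIE_BIF)
		return true;

	if (block == SKETCH_BLOCK_UMC &&
	    mp1_version == SKETCH_IP_VERSION(13, 0, 2))
		return true;

	return false;
}

/* Walks all sketch blocks and counts how many would still be queried. */
static int sketch_count_queried(uint32_t mp1_version)
{
	int queried = 0;

	for (int b = 0; b < SKETCH_BLOCK_COUNT; b++) {
		if (sketch_should_skip_query((enum sketch_ras_block)b,
					     mp1_version))
			continue;
		queried++;
	}

	return queried;
}
```

With these three sketch blocks, a device reporting MP1 13.0.2 would have only the GFX block queried, while any other version would have both UMC and GFX queried. Note that the patch places the check in the logging loop rather than in amdgpu_ras_get_ecc_info, so this model says nothing about other callers of the query path.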