[AMD Official Use Only]

Injecting one uncorrectable error through the ras_ctrl sys node on Sienna 
Cichlid triggers two interrupts.
I was informed that the two interrupts are expected: when the error address is 
not 64-byte aligned, one 64-byte SDP request is split into two 32-byte requests 
in UMC and sent to DRAM.

The handling of the second interrupt then reads the garbage data left in 
err_data. As a consequence, the UE counter increases by 2 and the page at 
address 0x0 is saved unexpectedly.
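
To illustrate the effect, here is a minimal standalone sketch, not the driver 
code. The struct err_record, handle_entry() and count_errors() names are 
hypothetical, and it assumes the callback leaves the record untouched when an 
entry carries no new error. It only shows how a record reused across loop 
iterations can carry a stale count into the next iteration unless it is 
cleared first.

```c
#include <string.h>

/* Hypothetical stand-in for the error record; not the real struct ras_err_data. */
struct err_record {
	unsigned long ue_count;
	unsigned long err_addr;
};

/* Hypothetical callback: fills the record only when the entry reports an error. */
static void handle_entry(int has_error, unsigned long addr,
			 struct err_record *rec)
{
	if (has_error) {
		rec->ue_count = 1;
		rec->err_addr = addr;
	}
	/* Assumption: otherwise the record is left as the caller passed it in. */
}

/* Process two entries, only the first of which carries an error. */
static unsigned long count_errors(int clear_each_time)
{
	const int has_error[2] = { 1, 0 };
	struct err_record rec = { 0 };
	unsigned long total = 0;
	int i;

	for (i = 0; i < 2; i++) {
		if (clear_each_time)
			memset(&rec, 0, sizeof(rec)); /* same idea as the patch */
		handle_entry(has_error[i], 0x1000, &rec);
		total += rec.ue_count; /* a stale count is added again if not cleared */
	}
	return total;
}
```

In this sketch count_errors(0) counts the single error twice, while 
count_errors(1) counts it once.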

Best regards,
Jiawei  

-----Original Message-----
From: Zhou1, Tao <tao.zh...@amd.com> 
Sent: Thursday, January 6, 2022 6:05 PM
To: Gu, JiaWei (Will) <jiawei...@amd.com>; amd-gfx@lists.freedesktop.org; 
Clements, John <john.cleme...@amd.com>; Yang, Stanley <stanley.y...@amd.com>; 
Deng, Emily <emily.d...@amd.com>
Cc: Gu, JiaWei (Will) <jiawei...@amd.com>
Subject: RE: [PATCH] drm/amdgpu: Clear garbage data in err_data before usage

[AMD Official Use Only]

Reviewed-by: Tao Zhou <tao.zh...@amd.com>

May I know how you reproduce the issue?

> -----Original Message-----
> From: amd-gfx <amd-gfx-boun...@lists.freedesktop.org> On Behalf Of 
> Jiawei Gu
> Sent: Thursday, January 6, 2022 5:17 PM
> To: amd-gfx@lists.freedesktop.org; Clements, John 
> <john.cleme...@amd.com>; Yang, Stanley <stanley.y...@amd.com>; Deng, 
> Emily <emily.d...@amd.com>
> Cc: Gu, JiaWei (Will) <jiawei...@amd.com>
> Subject: [PATCH] drm/amdgpu: Clear garbage data in err_data before 
> usage
> 
> Memory of err_data should be cleared before usage when there are 
> multiple entries in the RAS IH.
> Otherwise garbage data from the last loop iteration will be used.
> 
> Signed-off-by: Jiawei Gu <jiawei...@amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> index 31bad1a20ed0..3f5bf5780ebf 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> @@ -1592,6 +1592,7 @@ static void amdgpu_ras_interrupt_handler(struct
> ras_manager *obj)
>                               /* Let IP handle its data, maybe we need get 
> the output
>                                * from the callback to udpate the error 
> type/count, etc
>                                */
> +                             memset(&err_data, 0, sizeof(err_data));
>                               ret = data->cb(obj->adev, &err_data, &entry);
>                               /* ue will trigger an interrupt, and in that 
> case
>                                * we need do a reset to recovery the whole 
> system.
> --
> 2.17.1
