[AMD Official Use Only]

Dear Paul,

Comment inline.

Regards,
Zafar

>-----Original Message-----
>From: Paul Menzel <pmen...@molgen.mpg.de>
>Sent: Monday, March 28, 2022 3:08 PM
>To: Ziya, Mohammad zafar <mohammadzafar.z...@amd.com>; Zhou1, Tao
><tao.zh...@amd.com>
>Cc: Lazar, Lijo <lijo.la...@amd.com>; amd-gfx@lists.freedesktop.org; Zhang,
>Hawking <hawking.zh...@amd.com>
>Subject: Re: [PATCH v4 5/6] drm/amdgpu/vcn: VCN ras error query support
>
>
>Dear Mohammad,
>
>
>Am 28.03.22 um 10:47 schrieb Ziya, Mohammad zafar:
>
>[…]
>
>>> -----Original Message-----
>>> From: Paul Menzel <pmen...@molgen.mpg.de>
>>> Sent: Monday, March 28, 2022 1:39 PM
>
>>> Am 28.03.22 um 10:00 schrieb Ziya, Mohammad zafar:
>>>
>>> […]
>>>
>>>>> From: Paul Menzel <pmen...@molgen.mpg.de>
>>>>> Sent: Monday, March 28, 2022 1:22 PM
>
>>>>> Am 28.03.22 um 09:43 schrieb Zhou1, Tao:
>>>>>> -----Original Message-----
>>>>>> From: Ziya, Mohammad zafar <mohammadzafar.z...@amd.com>
>>>>>> Sent: Monday, March 28, 2022 2:25 PM
>>>
>>> […]
>>>
>>>>>> +static uint32_t vcn_v2_6_query_poison_by_instance(struct amdgpu_device *adev,
>>>>>> +                        uint32_t instance, uint32_t sub_block) {
>>>>>> +        uint32_t poison_stat = 0, reg_value = 0;
>>>>>> +
>>>>>> +        switch (sub_block) {
>>>>>> +        case AMDGPU_VCN_V2_6_VCPU_VCODEC:
>>>>>> +                reg_value = RREG32_SOC15(VCN, instance, mmUVD_RAS_VCPU_VCODEC_STATUS);
>>>>>> +                poison_stat = REG_GET_FIELD(reg_value, UVD_RAS_VCPU_VCODEC_STATUS, POISONED_PF);
>>>>>> +                break;
>>>>>> +        default:
>>>>>> +                break;
>>>>>> +        };
>>>>>> +
>>>>>> +        if (poison_stat)
>>>>>> +                dev_info(adev->dev, "Poison detected in VCN%d, sub_block%d\n",
>>>>>> +                        instance, sub_block);
>>>>>
>>>>> What should a user do with that information? Faulty hardware, …?
>>>>
>>>> [Mohammad]: This message helps to identify the faulty hardware.
>>>> The hardware ID is logged along with the poison message, which
>>>> helps to identify the affected device among multiple devices
>>>> installed in the system.
>>>
>>> Thank you for clarifying. If it’s indeed faulty hardware, should the
>>> log level be increased to be an error? Keep in mind that normal
>>> ignorant users (like me) are reading the message, and it’d be great
>>> to guide them a little. They do not know what “Poison” means, I guess.
>>> Maybe:
>>>
>>> A hardware corruption was found indicating the device might be faulty.
>>> (Poison detected in VCN%d, sub_block%d)\n
>>>
>>> (Keep in mind, I do not know anything about RAS.)
>>
>> [Mohammad]: It is an error condition, but this is just an informational
>> message, which could be ignored as well, because VCN only consumed the
>> poison; it did not create it.
>
>Sorry, I have never seen these messages in `dmesg`, so could you please
>give an example log showing what the user would see?
>

[Mohammad]: [  231.181316] amdgpu 0000:8a:00.0: amdgpu: Poison detected in VCN0, sub_block0

Sample message from amdgpu:
"[  237.013029] amdgpu 0000:8a:00.0: amdgpu: HDCP: optional hdcp ta ucode is not available"
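
To illustrate how such a line comes about, here is a minimal userspace sketch in C of the two steps in the quoted patch: extracting the poison bit from a register value, then formatting the message text. The mask and shift values, and the helper names `get_poisoned_pf` and `format_poison_msg`, are assumptions made for illustration only; the real POISONED_PF field layout comes from the VCN register headers, which are not shown in this thread, and the kernel uses `REG_GET_FIELD` and `dev_info` rather than these helpers.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical field layout, chosen only for this sketch. The actual
 * POISONED_PF mask and shift are defined in the VCN register headers. */
#define POISONED_PF_SHIFT 0u
#define POISONED_PF_MASK  (0x1u << POISONED_PF_SHIFT)

/* Same idea as REG_GET_FIELD: mask out the field, then shift it down. */
static uint32_t get_poisoned_pf(uint32_t reg_value)
{
    return (reg_value & POISONED_PF_MASK) >> POISONED_PF_SHIFT;
}

/* Builds the message text used in the patch (without the trailing newline
 * and without the timestamp and device prefix that the kernel adds).
 * Returns the value reported by snprintf. */
static int format_poison_msg(char *buf, size_t len,
                             uint32_t instance, uint32_t sub_block)
{
    return snprintf(buf, len, "Poison detected in VCN%u, sub_block%u",
                    (unsigned int)instance, (unsigned int)sub_block);
}
```

Calling `get_poisoned_pf()` on a register value with the assumed bit set, followed by `format_poison_msg(buf, sizeof(buf), 0, 0)`, yields the text part of the first log line above; the `[  231.181316] amdgpu 0000:8a:00.0: amdgpu:` prefix is added by the kernel logging infrastructure, not by the driver function.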
>
>Kind regards,
>
>Paul
