On 11/16/2021 2:17 PM, Zhou1, Tao wrote:
[AMD Official Use Only]

Hi Lijo,

Your concern is reasonable, but in fact smu_v13_0_mode1_reset is currently
used only by ALDEBARAN. I assume the PMFW of future SMU v13 ASICs will follow
this design; otherwise we could move the implementation into xxx_ppt.c.


Actually, this is meant to be common logic for SMU13-based ASICs. A version check in a common file is not maintainable. I see there was already a version check before this patch; even that is not proper :)

It is better to do it properly when support is added than to plan on refactoring once future ASICs arrive.

Thanks,
Lijo

Regards,
Tao
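
For readers following the argument, here is a minimal standalone sketch of the alternative being discussed: common code dispatches through a per-ASIC callback, and only the ASIC-specific callback knows about firmware versions. Every name below (struct asic_funcs, struct asic_ctx, aldebaran_like_mode1_reset_param and so on) is hypothetical and chosen for illustration; this is not the driver's actual function table, and the value 1 for the mode-1 reset parameter is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

struct asic_ctx;

/* Hypothetical per-ASIC callback table (illustrative, not the driver's). */
struct asic_funcs {
	uint32_t (*mode1_reset_param)(const struct asic_ctx *ctx);
};

/* Hypothetical device context carrying only what the sketch needs. */
struct asic_ctx {
	const struct asic_funcs *funcs;
	uint32_t fw_version;      /* PMFW version as one packed word */
	bool ras_in_recovery;     /* reset was triggered by a RAS fatal error */
};

/* Common path: no firmware version knowledge, plain mode-1 parameter.
 * The value 1 is an assumed stand-in for SMU_RESET_MODE_1. */
static uint32_t common_mode1_reset_param(const struct asic_ctx *ctx)
{
	(void)ctx;
	return 1u;
}

/* ASIC-specific path: the version check lives with the ASIC it applies to.
 * 0x00442c00 and the shift of 16 are taken from the patch under review. */
static uint32_t aldebaran_like_mode1_reset_param(const struct asic_ctx *ctx)
{
	uint32_t param = 1u;

	if (ctx->fw_version >= 0x00442c00u && ctx->ras_in_recovery)
		param |= 1u << 16;
	return param;
}

static const struct asic_funcs common_funcs = {
	.mode1_reset_param = common_mode1_reset_param,
};

static const struct asic_funcs aldebaran_like_funcs = {
	.mode1_reset_param = aldebaran_like_mode1_reset_param,
};

/* Common entry point: dispatches without inspecting the firmware version. */
static uint32_t mode1_reset_param(const struct asic_ctx *ctx)
{
	return ctx->funcs->mode1_reset_param(ctx);
}
```

With this split, a new ASIC either reuses common_funcs or supplies its own callback, and the common entry point never needs another version comparison.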

-----Original Message-----
From: Lazar, Lijo <lijo.la...@amd.com>
Sent: Tuesday, November 16, 2021 3:44 PM
To: Zhou1, Tao <tao.zh...@amd.com>; amd-gfx@lists.freedesktop.org; Zhang,
Hawking <hawking.zh...@amd.com>; Clements, John
<john.cleme...@amd.com>; Yang, Stanley <stanley.y...@amd.com>; Quan,
Evan <evan.q...@amd.com>
Subject: Re: [PATCH] drm/amdgpu: support new mode-1 reset interface



On 11/16/2021 12:53 PM, Tao Zhou wrote:
If a GPU reset is triggered by a RAS fatal error, tell the SMU so in the
mode-1 reset message.

Signed-off-by: Tao Zhou <tao.zh...@amd.com>
---
   .../gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c    | 21 ++++++++++++++++---
   1 file changed, 18 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c
index 35145db6eedf..6f3d064a8232 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c
@@ -1426,16 +1426,31 @@ int smu_v13_0_set_azalia_d3_pme(struct smu_context *smu)

   int smu_v13_0_mode1_reset(struct smu_context *smu)
   {
-   u32 smu_version;
+   u32 smu_version, fatal_err, param;
     int ret = 0;
+   struct amdgpu_device *adev = smu->adev;
+   struct amdgpu_ras *ras = amdgpu_ras_get_context(adev);
+
+   fatal_err = 0;
+   param = SMU_RESET_MODE_1;
+
     /*
     * PM FW support SMU_MSG_GfxDeviceDriverReset from 68.07
     */
     smu_cmn_get_smc_version(smu, NULL, &smu_version);
     if (smu_version < 0x00440700)
             ret = smu_cmn_send_smc_msg(smu, SMU_MSG_Mode1Reset, NULL);
-   else
-           ret = smu_cmn_send_smc_msg_with_param(smu, SMU_MSG_GfxDeviceDriverReset, SMU_RESET_MODE_1, NULL);
+   else {
+           /* fatal error triggered by ras, PMFW supports the flag
+              from 68.44.0 */
+           if ((smu_version >= 0x00442c00) && ras &&
+               atomic_read(&ras->in_recovery))
+                   fatal_err = 1;
+

Judging by the PMFW version, this looks specific to Aldebaran. Since there is a
version check as well, the implementation needs to be moved to aldebaran_ppt.c

Thanks,
Lijo

+           param |= (fatal_err << 16);
+           ret = smu_cmn_send_smc_msg_with_param(smu,
+                                   SMU_MSG_GfxDeviceDriverReset, param, NULL);
+   }

     if (!ret)
             msleep(SMU13_MODE1_RESET_WAIT_TIME_IN_MS);
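
The parameter packing done by the patch can be tried outside the kernel with the small sketch below. It is illustrative only: RESET_MODE_1 is an assumed stand-in for SMU_RESET_MODE_1 (the value 1 is an assumption), the FW_VER packing is inferred from the patch comments (68.07 corresponds to 0x00440700 and 68.44.0 to 0x00442c00), and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed stand-in for SMU_RESET_MODE_1; the value is an assumption. */
#define RESET_MODE_1 1u

/* Version packing inferred from the patch comments: major.minor.patch. */
#define FW_VER(maj, min, patch) \
	(((uint32_t)(maj) << 16) | ((uint32_t)(min) << 8) | (uint32_t)(patch))

#define FW_HAS_DRIVER_RESET   FW_VER(68, 7, 0)   /* 0x00440700 */
#define FW_HAS_FATAL_ERR_FLAG FW_VER(68, 44, 0)  /* 0x00442c00 */
#define FATAL_ERR_SHIFT       16

/* True when the firmware predates GfxDeviceDriverReset, in which case the
 * legacy Mode1Reset message (which carries no parameter) would be used. */
static bool use_legacy_mode1_msg(uint32_t fw_version)
{
	return fw_version < FW_HAS_DRIVER_RESET;
}

/* Build the GfxDeviceDriverReset parameter: reset mode in the low bits,
 * fatal-error flag at bit 16 when the firmware understands it. */
static uint32_t mode1_reset_param(uint32_t fw_version, bool ras_in_recovery)
{
	uint32_t param = RESET_MODE_1;

	/* Only meaningful for firmware that accepts a reset parameter. */
	assert(!use_legacy_mode1_msg(fw_version));

	if (fw_version >= FW_HAS_FATAL_ERR_FLAG && ras_in_recovery)
		param |= 1u << FATAL_ERR_SHIFT;
	return param;
}
```

Firmware older than 68.44.0 receives the same parameter as before the patch, so the flag is only ever sent to firmware that is expected to decode it.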
