[AMD Official Use Only - General]

Let's handle umc and gfx ras cases in separated patch.

gmc.ecc_irq maps to df mca interrupt, which is enabled by firmware per IFWI 
setting. Check GFX RAS mask is not correct in such case. host driver is not 
privileged to enable/disable the interrupt. As you noticed already the set 
function is dummy one. then just drop amdgpu_irq_put call for gmc.ecc_irq 
should be fine.

gfx.cp_ecc_error_irq is not used in gfx11. Feel free to retire the interrupt 
for gfx11 in another patch (i.e., gfx_v11_0_cp_ecc_error_irq_funcs is not 
needed any more)

Regards,
Hawking

-----Original Message-----
From: Horatio Zhang <hongkun.zh...@amd.com>
Sent: Monday, April 24, 2023 18:37
To: amd-gfx@lists.freedesktop.org
Cc: Chen, Guchun <guchun.c...@amd.com>; Xu, Feifei <feifei...@amd.com>; Zhang, 
Hawking <hawking.zh...@amd.com>; Yao, Longlong <longlong....@amd.com>; Zhang, 
Horatio <hongkun.zh...@amd.com>
Subject: [PATCH] drm/amdgpu: fix amdgpu_irq_enabled warning in gfx and gmc 
hw_fini

The call trace occurred when the amdgpu is suspended before the mode1 reset. 
For the IP block that do not support ras features, the relevant interrupt is 
not opened during initialization, but hw_fini forced the close of this 
interrupt, which resulted in amdgpu_irq_enabled check warning.

[  102.873958] Call Trace:
[  102.873959]  <TASK>
[  102.873961]  gfx_v11_0_hw_fini+0x23/0x1e0 [amdgpu] [  102.874019]  
gfx_v11_0_suspend+0xe/0x20 [amdgpu] [  102.874072]  
amdgpu_device_ip_suspend_phase2+0x240/0x460 [amdgpu] [  102.874122]  
amdgpu_device_ip_suspend+0x3d/0x80 [amdgpu] [  102.874172]  
amdgpu_device_pre_asic_reset+0xd9/0x490 [amdgpu] [  102.874223]  
amdgpu_device_gpu_recover.cold+0x548/0xce6 [amdgpu] [  102.874321]  
amdgpu_debugfs_reset_work+0x4c/0x70 [amdgpu] [  102.874375]  
process_one_work+0x21f/0x3f0 [  102.874377]  worker_thread+0x200/0x3e0 [  
102.874378]  ? process_one_work+0x3f0/0x3f0 [  102.874379]  kthread+0xfd/0x130 
[  102.874380]  ? kthread_complete_and_exit+0x20/0x20
[  102.874381]  ret_from_fork+0x22/0x30

[  102.980303] Call Trace:
[  102.980303]  <TASK>
[  102.980304]  gmc_v11_0_hw_fini+0x54/0x90 [amdgpu] [  102.980357]  
gmc_v11_0_suspend+0xe/0x20 [amdgpu] [  102.980409]  
amdgpu_device_ip_suspend_phase2+0x240/0x460 [amdgpu] [  102.980459]  
amdgpu_device_ip_suspend+0x3d/0x80 [amdgpu] [  102.980520]  
amdgpu_device_pre_asic_reset+0xd9/0x490 [amdgpu] [  102.980573]  
amdgpu_device_gpu_recover.cold+0x548/0xce6 [amdgpu] [  102.980687]  
amdgpu_debugfs_reset_work+0x4c/0x70 [amdgpu] [  102.980740]  
process_one_work+0x21f/0x3f0 [  102.980741]  worker_thread+0x200/0x3e0 [  
102.980742]  ? process_one_work+0x3f0/0x3f0 [  102.980743]  kthread+0xfd/0x130 
[  102.980743]  ? kthread_complete_and_exit+0x20/0x20
[  102.980744]  ret_from_fork+0x22/0x30

Signed-off-by: Horatio Zhang <hongkun.zh...@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 3 ++-  
drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c | 3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
index 543af07ff102..0f6b037558bc 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
@@ -4483,7 +4483,8 @@ static int gfx_v11_0_hw_fini(void *handle)
        struct amdgpu_device *adev = (struct amdgpu_device *)handle;
        int r;

-       amdgpu_irq_put(adev, &adev->gfx.cp_ecc_error_irq, 0);
+       if (amdgpu_ras_is_supported(adev, AMDGPU_RAS_BLOCK__GFX))
+               amdgpu_irq_put(adev, &adev->gfx.cp_ecc_error_irq, 0);
        amdgpu_irq_put(adev, &adev->gfx.priv_reg_irq, 0);
        amdgpu_irq_put(adev, &adev->gfx.priv_inst_irq, 0);

diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
index 3828ca95899f..25f466c26d18 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
@@ -951,7 +951,8 @@ static int gmc_v11_0_hw_fini(void *handle)
                return 0;
        }

-       amdgpu_irq_put(adev, &adev->gmc.ecc_irq, 0);
+       if (amdgpu_ras_is_supported(adev, AMDGPU_RAS_BLOCK__GFX))
+               amdgpu_irq_put(adev, &adev->gmc.ecc_irq, 0);
        amdgpu_irq_put(adev, &adev->gmc.vm_fault, 0);
        gmc_v11_0_gart_disable(adev);

--
2.34.1

Reply via email to