> On Jan 8, 2025, at 17:05, Christian König <christian.koe...@amd.com> wrote:
>
> On 08.01.25 at 09:56, Jiang Liu wrote:
>> If an error happens before amdgpu_fence_driver_hw_init() gets called during
>> device probe, it will trigger a false warning in amdgpu_irq_put() as
>> below:
>> [ 1209.300996] ------------[ cut here ]------------
>> [ 1209.301061] WARNING: CPU: 48 PID: 293 at /tmp/amd.Rc9jFrl7/amd/amdgpu/amdgpu_irq.c:633 amdgpu_irq_put+0x45/0x70 [amdgpu]
>> [ 1209.301062] Modules linked in: ...
>> [ 1209.301093] CPU: 48 PID: 293 Comm: kworker/48:1 Kdump: loaded Tainted: G W OE 5.10.134-17.2.al8.x86_64 #1
>> [ 1209.301094] Hardware name: Alibaba Alibaba Cloud ECS/Alibaba Cloud ECS, BIOS 3.0.ES.AL.P.087.05 04/07/2024
>> [ 1209.301095] Workqueue: events work_for_cpu_fn
>> [ 1209.301159] RIP: 0010:amdgpu_irq_put+0x45/0x70 [amdgpu]
>> [ 1209.301160] Code: 48 8b 4e 10 48 83 39 00 74 2c 89 d1 48 8d 04 88 8b 08 85 c9 74 14 f0 ff 08 b8 00 00 00 00 74 05 c3 cc cc cc cc e9 8b fd ff ff <0f> 0b b8 ea ff ff ff c3 cc cc cc cc b8 ea ff ff ff c3 cc cc cc cc
>> [ 1209.301162] RSP: 0018:ffffb08a99c8fd88 EFLAGS: 00010246
>> [ 1209.301162] RAX: ffff9efe1bcbf500 RBX: ffff9efe1cc3e400 RCX: 0000000000000000
>> [ 1209.301163] RDX: 0000000000000000 RSI: ffff9efe1cc3b108 RDI: ffff9efe1cc00000
>> [ 1209.301163] RBP: ffff9efe1cc10818 R08: 0000000000000001 R09: 000000000000000d
>> [ 1209.301164] R10: ffffb08a99c8fb48 R11: ffffffffa2068018 R12: ffff9efe1cc109d0
>> [ 1209.301164] R13: ffff9efe1cc00010 R14: ffff9efe1cc00000 R15: ffff9efe1cc3b108
>> [ 1209.301165] FS: 0000000000000000(0000) GS:ffff9ff9fce00000(0000) knlGS:0000000000000000
>> [ 1209.301165] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [ 1209.301165] CR2: 00007fd0f6e860d0 CR3: 0000010092baa003 CR4: 0000000002770ee0
>> [ 1209.301166] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> [ 1209.301166] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
>> [ 1209.301167] PKRU: 55555554
>> [ 1209.301167] Call Trace:
>> [ 1209.301225] amdgpu_fence_driver_hw_fini+0xda/0x110 [amdgpu]
>> [ 1209.301284] amdgpu_device_fini_hw+0xaf/0x200 [amdgpu]
>> [ 1209.301342] amdgpu_driver_load_kms+0x7f/0xc0 [amdgpu]
>> [ 1209.301400] amdgpu_pci_probe+0x1cd/0x4a0 [amdgpu]
>> [ 1209.301401] local_pci_probe+0x40/0xa0
>> [ 1209.301402] work_for_cpu_fn+0x13/0x20
>> [ 1209.301403] process_one_work+0x1ad/0x380
>> [ 1209.301404] worker_thread+0x1c8/0x310
>> [ 1209.301405] ? process_one_work+0x380/0x380
>> [ 1209.301406] kthread+0x118/0x140
>> [ 1209.301407] ? __kthread_bind_mask+0x60/0x60
>> [ 1209.301408] ret_from_fork+0x1f/0x30
>> [ 1209.301410] ---[ end trace 733f120fe2ab13e5 ]---
>> [ 1209.301418] ------------[ cut here ]------------
>>
>> Signed-off-by: Jiang Liu <ge...@linux.alibaba.com>
>> ---
>> drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 9 +++++++--
>> drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h  | 1 +
>> 2 files changed, 8 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>> index b5e87b515139..0e41a535e05f 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>> @@ -614,9 +614,11 @@ void amdgpu_fence_driver_hw_fini(struct amdgpu_device *adev)
>>  		if (!drm_dev_is_unplugged(adev_to_drm(adev)) &&
>>  		    ring->fence_drv.irq_src &&
>> -		    amdgpu_fence_need_ring_interrupt_restore(ring))
>> +		    ring->fence_drv.irq_enabled) {
>>  			amdgpu_irq_put(adev, ring->fence_drv.irq_src,
>>  				       ring->fence_drv.irq_type);
>> +			ring->fence_drv.irq_enabled = false;
>> +		}
>
> Clearly a NAK, that is exactly what the warning is supposed to warn about.

Hi Christian,

This is part of a more generic issue related to IP block state transitions. I will move this patch into the next patch set, which tries to enhance the IP block state machine to avoid false warnings.

Thanks,
Gerry
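The warning comes from an unbalanced reference count. The call trace shows that amdgpu_fence_driver_hw_fini() runs from amdgpu_driver_load_kms() during a failed probe. In that case it drops an interrupt reference that amdgpu_fence_driver_hw_init() never took.

The sketch below is a minimal userspace model of that imbalance. It assumes, as the trace and Christian's reply suggest, that amdgpu_irq_put() warns when the enable count for an IRQ type is already zero. All names in it (struct model_irq, model_irq_get(), model_irq_put(), model_hw_fini_unpatched()) are hypothetical; this is not the driver's code.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model of one IRQ source type's enable count.
 * The driver keeps this in an atomic_t; a plain int is enough for this
 * single-threaded illustration. */
struct model_irq {
	int enable_count;
	int warnings;	/* times the WARN_ON()-style check fired */
};

static void model_irq_get(struct model_irq *irq)
{
	irq->enable_count++;
}

/* Returns 0 on success, -1 on an unbalanced put (no matching get). */
static int model_irq_put(struct model_irq *irq)
{
	if (irq->enable_count == 0) {
		/* Assumed analogue of the check behind the WARNING above */
		irq->warnings++;
		fprintf(stderr, "WARNING: irq put without matching get\n");
		return -1;
	}
	irq->enable_count--;
	return 0;
}

/* Unpatched teardown (hypothetical): the put is gated only on whether the
 * ring needs its interrupt, not on whether hw_init ever took a reference. */
static void model_hw_fini_unpatched(struct model_irq *irq, bool need_restore)
{
	if (need_restore)
		model_irq_put(irq);
}
```

Calling model_hw_fini_unpatched() on a fresh struct model_irq models the failed-probe path and bumps warnings. A model_irq_get() followed by the same call stays balanced and silent. That count check is what the NAK defends: it exists to flag exactly this kind of imbalance.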
>
> Regards,
> Christian.
>
>>  		del_timer_sync(&ring->fence_drv.fallback_timer);
>>  	}
>> @@ -693,9 +695,12 @@ void amdgpu_fence_driver_hw_init(struct amdgpu_device *adev)
>>  		/* enable the interrupt */
>>  		if (ring->fence_drv.irq_src &&
>> -		    amdgpu_fence_need_ring_interrupt_restore(ring))
>> +		    !ring->fence_drv.irq_enabled &&
>> +		    amdgpu_fence_need_ring_interrupt_restore(ring)) {
>>  			amdgpu_irq_get(adev, ring->fence_drv.irq_src,
>>  				       ring->fence_drv.irq_type);
>> +			ring->fence_drv.irq_enabled = true;
>> +		}
>>  	}
>>  }
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
>> index dee5a1b4e572..959d474a0516 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
>> @@ -118,6 +118,7 @@ struct amdgpu_fence_driver {
>>  	uint32_t			sync_seq;
>>  	atomic_t			last_seq;
>>  	bool				initialized;
>> +	bool				irq_enabled;
>>  	struct amdgpu_irq_src		*irq_src;
>>  	unsigned			irq_type;
>>  	struct timer_list		fallback_timer;
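The patch's approach can be modeled in a few lines of userspace C. A per-ring irq_enabled flag makes hw_init take the interrupt reference at most once, and hw_fini drops it only if it was taken.

This is a hypothetical sketch: struct model_irq, struct model_fence_drv, model_hw_init() and model_hw_fini() are invented names. It also leaves out the drm_dev_is_unplugged() and amdgpu_fence_need_ring_interrupt_restore() conditions that appear in the real hunks.

```c
#include <stdbool.h>

/* Hypothetical model of one IRQ type's enable count (atomic_t in the driver). */
struct model_irq {
	int enable_count;
	int warnings;	/* times an unbalanced put would have warned */
};

/* Hypothetical model of the fence driver state with the patch's new flag. */
struct model_fence_drv {
	struct model_irq *irq_src;
	bool irq_enabled;	/* true only while this ring holds a reference */
};

static void model_irq_get(struct model_irq *irq)
{
	irq->enable_count++;
}

static void model_irq_put(struct model_irq *irq)
{
	if (irq->enable_count == 0) {
		irq->warnings++;	/* the check the NAK wants to keep meaningful */
		return;
	}
	irq->enable_count--;
}

/* Mirrors the patched hw_init: take the reference at most once. */
static void model_hw_init(struct model_fence_drv *drv)
{
	if (drv->irq_src && !drv->irq_enabled) {
		model_irq_get(drv->irq_src);
		drv->irq_enabled = true;
	}
}

/* Mirrors the patched hw_fini: drop the reference only if one was taken. */
static void model_hw_fini(struct model_fence_drv *drv)
{
	if (drv->irq_src && drv->irq_enabled) {
		model_irq_put(drv->irq_src);
		drv->irq_enabled = false;
	}
}
```

With the flag, calling model_hw_fini() before any model_hw_init() does nothing, and a second model_hw_init() does not take a second reference. The trade-off is the one raised in the NAK. The flag suppresses the put instead of exposing why teardown ran without a matching init, and that mismatch is the condition the warning is meant to surface.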