On 3/13/26 08:21, Sunil Khatri wrote:
> An extra dma_fence_put() can drop the last reference to a fence while it is
> still attached to a dma_resv object. This frees the fence prematurely via
> dma_fence_release() while other users still hold the pointer.
> 
> Later accesses through dma_resv iteration may then operate on the freed
> fence object, leading to refcount underflow warnings and potential hangs
> when walking reservation fences.
> 
> Fix this by removing the extra dma_fence_put() calls in
> amdgpu_userq_wait_ioctl() so the dma_resv object retains a valid reference
> until it is done with the fence.
> 
> [   31.133803] refcount_t: underflow; use-after-free.
> [   31.133805] WARNING: lib/refcount.c:28 at refcount_warn_saturate+0x58/0x90, CPU#18: kworker/u96:1/188
> [   31.133815] Modules linked in: snd_seq_dummy snd_hrtimer qrtr binfmt_misc nls_iso8859_1 snd_hda_codec_alc882 snd_hda_codec_realtek_lib snd_hda_codec_generic snd_hda_codec_atihdmi snd_hda_codec_hdmi snd_hda_intel amd_atl snd_hda_codec intel_rapl_msr intel_rapl_common amdgpu snd_hda_core snd_intel_dspcfg amdxcp snd_intel_sdw_acpi drm_panel_backlight_quirks snd_hwdep gpu_sched drm_buddy snd_pcm drm_ttm_helper ttm drm_exec drm_suballoc_helper snd_seq_midi drm_client_lib snd_seq_midi_event drm_display_helper snd_rawmidi cec snd_seq edac_mce_amd ghash_clmulni_intel snd_seq_device aesni_intel rc_core drm_kms_helper gigabyte_wmi snd_timer wmi_bmof rapl k10temp video i2c_piix4 snd i2c_smbus input_leds soundcore joydev ccp mac_hid sch_fq_codel msr parport_pc ppdev lp parport drm efi_pstore nfnetlink dmi_sysfs autofs4 hid_generic usbhid hid nvme igb ahci i2c_algo_bit dca libahci nvme_core wmi
> [   31.133932] CPU: 18 UID: 0 PID: 188 Comm: kworker/u96:1 Not tainted 6.19.0-amd-staging-drm-next #28 PREEMPT(voluntary)
> [   31.133937] Hardware name: Gigabyte Technology Co., Ltd. X570 AORUS ELITE/X570 AORUS ELITE, BIOS F37c 05/12/2022
> [   31.133940] Workqueue: sdma1 drm_sched_run_job_work [gpu_sched]
> [   31.133951] RIP: 0010:refcount_warn_saturate+0x58/0x90
> [   31.133955] Code: 74 2f 83 fe 01 75 38 48 8d 3d a4 2c 91 01 67 48 0f b9 3a eb 36 48 8d 3d a6 2c 91 01 67 48 0f b9 3a eb 28 48 8d 3d a8 2c 91 01 <67> 48 0f b9 3a eb 1a 48 8d 3d aa 2c 91 01 67 48 0f b9 3a eb 0c 48
> [   31.133959] RSP: 0018:ffffca16807dfd68 EFLAGS: 00010246
> [   31.133962] RAX: ffff89e988f05600 RBX: 0000000000000000 RCX: 0000000000000000
> [   31.133965] RDX: 0000000000000000 RSI: 0000000000000003 RDI: ffffffffa1fd2f30
> [   31.133967] RBP: ffffca16807dfd68 R08: 0000000000000000 R09: 0000000000000000
> [   31.133969] R10: 0000000000000000 R11: 0000000000000000 R12: ffff89e98edf1308
> [   31.133971] R13: ffff89e9d3001380 R14: ffff89e9dab5f800 R15: ffff89e9dab5f880
> [   31.133974] FS:  0000000000000000(0000) GS:ffff89ed0cc3e000(0000) knlGS:0000000000000000
> [   31.133976] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   31.133979] CR2: 00007f3050081c28 CR3: 0000000117f06000 CR4: 0000000000350ef0
> [   31.133982] Call Trace:
> [   31.133985]  <TASK>
> [   31.133989]  drm_sched_entity_pop_job+0x414/0x420 [gpu_sched]
> [   31.133997]  drm_sched_run_job_work+0x15f/0x3c0 [gpu_sched]
> [   31.134003]  process_scheduled_works+0x1f0/0x450
> [   31.134011]  worker_thread+0x27f/0x370
> [   31.134016]  kthread+0x1ed/0x210
> [   31.134020]  ? __pfx_worker_thread+0x10/0x10
> [   31.134023]  ? srso_return_thunk+0x5/0x5f
> [   31.134027]  ? __pfx_kthread+0x10/0x10
> [   31.134031]  ret_from_fork+0x10f/0x1b0
> [   31.134035]  ? __pfx_kthread+0x10/0x10
> [   31.134039]  ret_from_fork_asm+0x1a/0x30
> [   31.134047]  </TASK>
> [   31.134049] ---[ end trace 0000000000000000 ]---
> ...
> [   56.544104] watchdog: BUG: soft lockup - CPU#9 stuck for 26s! [glxgears:cs0:3483]
> [   56.544108] Modules linked in: snd_seq_dummy snd_hrtimer qrtr binfmt_misc nls_iso8859_1 snd_hda_codec_alc882 snd_hda_codec_realtek_lib snd_hda_codec_generic snd_hda_codec_atihdmi snd_hda_codec_hdmi snd_hda_intel amd_atl snd_hda_codec intel_rapl_msr intel_rapl_common amdgpu snd_hda_core snd_intel_dspcfg amdxcp snd_intel_sdw_acpi drm_panel_backlight_quirks snd_hwdep gpu_sched drm_buddy snd_pcm drm_ttm_helper ttm drm_exec drm_suballoc_helper snd_seq_midi drm_client_lib snd_seq_midi_event drm_display_helper snd_rawmidi cec snd_seq edac_mce_amd ghash_clmulni_intel snd_seq_device aesni_intel rc_core drm_kms_helper gigabyte_wmi snd_timer wmi_bmof rapl k10temp video i2c_piix4 snd i2c_smbus input_leds soundcore joydev ccp mac_hid sch_fq_codel msr parport_pc ppdev lp parport drm efi_pstore nfnetlink dmi_sysfs autofs4 hid_generic usbhid hid nvme igb ahci i2c_algo_bit dca libahci nvme_core wmi
> [   56.544166] CPU: 9 UID: 0 PID: 3483 Comm: glxgears:cs0 Tainted: G        W           6.19.0-amd-staging-drm-next #28 PREEMPT(voluntary)
> [   56.544170] Tainted: [W]=WARN
> [   56.544171] Hardware name: Gigabyte Technology Co., Ltd. X570 AORUS ELITE/X570 AORUS ELITE, BIOS F37c 05/12/2022
> [   56.544172] RIP: 0010:dma_resv_iter_walk_unlocked+0x4e/0x180
> [   56.544179] Code: 45 31 ed eb 0e 41 8b 46 08 41 3b 46 18 0f 83 23 01 00 00 49 8b 46 10 48 85 c0 74 20 48 8d 78 38 b9 ff ff ff ff f0 0f c1 48 38 <83> f9 01 75 07 e8 78 ce ff ff eb 06 0f 8c e3 00 00 00 41 8b 46 1c
> [   56.544180] RSP: 0018:ffffca16865bb870 EFLAGS: 00000217
> [   56.544182] RAX: ffff89e997f38d80 RBX: 0000000000000005 RCX: 0000000000000006
> [   56.544183] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff89e997f38db8
> [   56.544184] RBP: ffffca16865bb898 R08: 0000000000000000 R09: 0000000000000000
> [   56.544185] R10: 0000000000000000 R11: 0000000000000000 R12: ffffca16865bb8c0
> [   56.544186] R13: 0000000000000000 R14: ffffca16865bb8a8 R15: ffff89e997f38d80
> [   56.544187] FS:  00007f8f8d3ff6c0(0000) GS:ffff89ed0c9fe000(0000) knlGS:0000000000000000
> [   56.544189] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   56.544190] CR2: 00007f8f9b735020 CR3: 0000000117f06000 CR4: 0000000000350ef0
> [   56.544191] Call Trace:
> [   56.544193]  <TASK>
> [   56.544197]  dma_resv_wait_timeout+0x55/0x190
> [   56.544202]  amdgpu_bo_kmap+0x3a/0xa0 [amdgpu]
> [   56.544502]  amdgpu_userq_fence_read_wptr+0x130/0x2e0 [amdgpu]
> [   56.544670]  amdgpu_userq_signal_ioctl+0x1f6/0x5e0 [amdgpu]
> [   56.544847]  ? srso_return_thunk+0x5/0x5f
> [   56.544851]  ? amdgpu_userq_wait_ioctl+0xab7/0xb80 [amdgpu]
> [   56.545021]  ? __pfx_amdgpu_userq_signal_ioctl+0x10/0x10 [amdgpu]
> [   56.545190]  drm_ioctl_kernel+0xd9/0x150 [drm]
> [   56.545222]  drm_ioctl+0x29a/0x4a0 [drm]
> [   56.545245]  ? __pfx_amdgpu_userq_signal_ioctl+0x10/0x10 [amdgpu]
> [   56.545422]  ? srso_return_thunk+0x5/0x5f
> [   56.545426]  amdgpu_drm_ioctl+0x46/0x90 [amdgpu]
> [   56.545595]  __se_sys_ioctl+0x73/0xd0
> [   56.545600]  __x64_sys_ioctl+0x1d/0x30
> [   56.545602]  x64_sys_call+0x1715/0x2d00
> [   56.545604]  do_syscall_64+0x7c/0x6a0
> [   56.545608]  ? __pfx_amdgpu_userq_wait_ioctl+0x10/0x10 [amdgpu]
> [   56.545778]  ? srso_return_thunk+0x5/0x5f
> [   56.545781]  ? amdgpu_drm_ioctl+0x6c/0x90 [amdgpu]
> [   56.545950]  ? srso_return_thunk+0x5/0x5f
> 
> Signed-off-by: Sunil Khatri <[email protected]>

Reviewed-by: Christian König <[email protected]> for the entire series.

> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c | 6 +-----
>  1 file changed, 1 insertion(+), 5 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
> index 146ca6d7f4f5..442c08b69f7c 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
> @@ -882,12 +882,9 @@ int amdgpu_userq_wait_ioctl(struct drm_device *dev, void *data,
>                                * be good for now
>                                */
>                               r = dma_fence_wait(fences[i], true);
> -                             if (r) {
> -                                     dma_fence_put(fences[i]);
> +                             if (r)
>                                       goto free_fences;
> -                             }
>  
> -                             dma_fence_put(fences[i]);
>                               continue;
>                       }
>  
> @@ -909,7 +906,6 @@ int amdgpu_userq_wait_ioctl(struct drm_device *dev, void *data,
>                       fence_info[cnt].va = fence_drv->va;
>                       fence_info[cnt].value = fences[i]->seqno;
>  
> -                     dma_fence_put(fences[i]);
>                       /* Increment the actual userq fence count */
>                       cnt++;
>               }
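
For readers following the refcount argument, here is a small userspace sketch of the failure mode the commit message describes. It is not kernel code: mock_fence, mock_fence_get(), mock_fence_put() and mock_wait_all() are hypothetical stand-ins invented for illustration. It also assumes, as the hunks suggest but do not show, that a shared cleanup path such as free_fences already drops each reference once.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a refcounted fence.
 * This is not the kernel's struct dma_fence. */
struct mock_fence {
	int refcount;    /* references currently held */
	bool released;   /* set when the count first reaches zero */
	int underflows;  /* puts attempted after release */
};

static void mock_fence_init(struct mock_fence *f)
{
	f->refcount = 1; /* creator's reference, owned by the container here */
	f->released = false;
	f->underflows = 0;
}

static struct mock_fence *mock_fence_get(struct mock_fence *f)
{
	assert(!f->released); /* taking a reference on a dead object is a bug */
	f->refcount++;
	return f;
}

static void mock_fence_put(struct mock_fence *f)
{
	if (f->released) {
		/* A real refcount_t would warn about underflow at this point. */
		f->underflows++;
		return;
	}
	if (--f->refcount == 0)
		f->released = true; /* a real implementation would free here */
}

/* Models a wait loop followed by a shared cleanup path (assumption).
 * With put_in_loop set, each reference is dropped twice: once in the
 * loop and once more in cleanup. */
static void mock_wait_all(struct mock_fence **fences, size_t n,
			  bool put_in_loop)
{
	size_t i;

	for (i = 0; i < n; i++)
		mock_fence_get(fences[i]); /* one reference per fence */

	for (i = 0; i < n; i++) {
		/* ... wait on fences[i] ... */
		if (put_in_loop)
			mock_fence_put(fences[i]); /* the extra put */
	}

	for (i = 0; i < n; i++)
		mock_fence_put(fences[i]); /* cleanup owns the put */
}
```

With put_in_loop set to false, each fence ends the call with only the container's reference left, and the container's final put releases it cleanly. With put_in_loop set to true, the fence is already marked released when the call returns although the container still holds a pointer to it, and the container's later put is counted as an underflow.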
