Hey,
On 2026-03-17 at 16:43, Hellstrom, Thomas wrote:
> On Tue, 2026-03-17 at 16:39 +0100, Maarten Lankhorst wrote:
>>
>>
>> On 2026-03-17 at 16:26, Thomas Hellström wrote:
>>> On Fri, 2026-03-13 at 16:17 +0100, Maarten Lankhorst wrote:
>>>> When trying to do a rather aggressive test of igt's "xe_module_load
>>>> --r reload" with a full desktop environment and game running I noticed
>>>> a few OOPSes when dereferencing freed pointers, related to
>>>> framebuffers and property blobs after the compositor exits.
>>>>
>>>> Solve this by guarding the freeing in drm_file with drm_dev_enter/exit,
>>>> and immediately put the references from struct drm_file objects during
>>>> drm_dev_unplug().
>>>>
>>>> Related warnings for framebuffers on the subtest:
>>>> [ 739.713076] ------------[ cut here ]------------
>>>> WARN_ON(!list_empty(&dev->mode_config.fb_list))
>>>> [ 739.713079] WARNING: drivers/gpu/drm/drm_mode_config.c:584 at
>>>> drm_mode_config_cleanup+0x30b/0x320 [drm], CPU#12:
>>>> xe_module_load/13145
>>>> ....
>>>> [ 739.713328] Call Trace:
>>>> [ 739.713330] <TASK>
>>>> [ 739.713335] ? intel_pmdemand_destroy_state+0x11/0x20 [xe]
>>>> [ 739.713574] ? intel_atomic_global_obj_cleanup+0xe4/0x1a0 [xe]
>>>> [ 739.713794] intel_display_driver_remove_noirq+0x51/0xb0 [xe]
>>>> [ 739.714041] xe_display_fini_early+0x33/0x50 [xe]
>>>> [ 739.714284] devm_action_release+0xf/0x20
>>>> [ 739.714294] devres_release_all+0xad/0xf0
>>>> [ 739.714301] device_unbind_cleanup+0x12/0xa0
>>>> [ 739.714305] device_release_driver_internal+0x1b7/0x210
>>>> [ 739.714311] device_driver_detach+0x14/0x20
>>>> [ 739.714315] unbind_store+0xa6/0xb0
>>>> [ 739.714319] drv_attr_store+0x21/0x30
>>>> [ 739.714322] sysfs_kf_write+0x48/0x60
>>>> [ 739.714328] kernfs_fop_write_iter+0x16b/0x240
>>>> [ 739.714333] vfs_write+0x266/0x520
>>>> [ 739.714341] ksys_write+0x72/0xe0
>>>> [ 739.714345] __x64_sys_write+0x19/0x20
>>>> [ 739.714347] x64_sys_call+0xa15/0xa30
>>>> [ 739.714355] do_syscall_64+0xd8/0xab0
>>>> [ 739.714361] entry_SYSCALL_64_after_hwframe+0x4b/0x53
>>>>
>>>> and
>>>>
>>>> [ 739.714459] ------------[ cut here ]------------
>>>> [ 739.714461] xe 0000:67:00.0: [drm] drm_WARN_ON(!list_empty(&fb->filp_head))
>>>> [ 739.714464] WARNING: drivers/gpu/drm/drm_framebuffer.c:833 at
>>>> drm_framebuffer_free+0x6c/0x90 [drm], CPU#12:
>>>> xe_module_load/13145
>>>> [ 739.714715] RIP: 0010:drm_framebuffer_free+0x7a/0x90 [drm]
>>>> ...
>>>> [ 739.714869] Call Trace:
>>>> [ 739.714871] <TASK>
>>>> [ 739.714876] drm_mode_config_cleanup+0x26a/0x320 [drm]
>>>> [ 739.714998] ? __drm_printfn_seq_file+0x20/0x20 [drm]
>>>> [ 739.715115] ? drm_mode_config_cleanup+0x207/0x320 [drm]
>>>> [ 739.715235] intel_display_driver_remove_noirq+0x51/0xb0 [xe]
>>>> [ 739.715576] xe_display_fini_early+0x33/0x50 [xe]
>>>> [ 739.715821] devm_action_release+0xf/0x20
>>>> [ 739.715828] devres_release_all+0xad/0xf0
>>>> [ 739.715843] device_unbind_cleanup+0x12/0xa0
>>>> [ 739.715850] device_release_driver_internal+0x1b7/0x210
>>>> [ 739.715856] device_driver_detach+0x14/0x20
>>>> [ 739.715860] unbind_store+0xa6/0xb0
>>>> [ 739.715865] drv_attr_store+0x21/0x30
>>>> [ 739.715868] sysfs_kf_write+0x48/0x60
>>>> [ 739.715873] kernfs_fop_write_iter+0x16b/0x240
>>>> [ 739.715878] vfs_write+0x266/0x520
>>>> [ 739.715886] ksys_write+0x72/0xe0
>>>> [ 739.715890] __x64_sys_write+0x19/0x20
>>>> [ 739.715893] x64_sys_call+0xa15/0xa30
>>>> [ 739.715900] do_syscall_64+0xd8/0xab0
>>>> [ 739.715905] entry_SYSCALL_64_after_hwframe+0x4b/0x53
>>>>
>>>> and then finally file close blows up:
>>>>
>>>> [ 743.186530] Oops: general protection fault, probably for non-
>>>> canonical address 0xdead000000000122: 0000 [#1] SMP
>>>> [ 743.186535] CPU: 3 UID: 1000 PID: 3453 Comm: kwin_wayland
>>>> Tainted:
>>>> G W 7.0.0-rc1-valkyria+ #110 PREEMPT_{RT,(lazy)}
>>>> [ 743.186537] Tainted: [W]=WARN
>>>> [ 743.186538] Hardware name: Gigabyte Technology Co., Ltd. X299
>>>> AORUS Gaming 3/X299 AORUS Gaming 3-CF, BIOS F8n 12/06/2021
>>>> [ 743.186539] RIP: 0010:drm_framebuffer_cleanup+0x55/0xc0 [drm]
>>>> [ 743.186588] Code: d8 72 73 0f b6 42 05 ff c3 39 c3 72 e8 49 8d
>>>> bd
>>>> 50 07 00 00 31 f6 e8 3a 80 d3 e1 49 8b 44 24 10 49 8d 7c 24 08 49
>>>> 8b
>>>> 54 24 08 <48> 3b 38 0f 85 95 7f 02 00 48 3b 7a 08 0f 85 8b 7f 02
>>>> 00
>>>> 48 89 42
>>>> [ 743.186589] RSP: 0018:ffffc900085e3cf8 EFLAGS: 00010202
>>>> [ 743.186591] RAX: dead000000000122 RBX: 0000000000000001 RCX:
>>>> ffffffff8217ed03
>>>> [ 743.186592] RDX: dead000000000100 RSI: 0000000000000000 RDI:
>>>> ffff88814675ba08
>>>> [ 743.186593] RBP: ffffc900085e3d10 R08: 0000000000000000 R09:
>>>> 0000000000000000
>>>> [ 743.186593] R10: 0000000000000000 R11: 0000000000000000 R12:
>>>> ffff88814675ba00
>>>> [ 743.186594] R13: ffff88810d778000 R14: ffff888119f6dca0 R15:
>>>> ffff88810c660bb0
>>>> [ 743.186595] FS: 00007ff377d21280(0000)
>>>> GS:ffff888cec3f8000(0000)
>>>> knlGS:0000000000000000
>>>> [ 743.186596] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>> [ 743.186596] CR2: 000055690b55e000 CR3: 0000000113586003 CR4:
>>>> 00000000003706f0
>>>> [ 743.186597] Call Trace:
>>>> [ 743.186598] <TASK>
>>>> [ 743.186603] intel_user_framebuffer_destroy+0x12/0x90 [xe]
>>>> [ 743.186722] drm_framebuffer_free+0x3a/0x90 [drm]
>>>> [ 743.186750] ? trace_hardirqs_on+0x5f/0x120
>>>> [ 743.186754] drm_mode_object_put+0x51/0x70 [drm]
>>>> [ 743.186786] drm_fb_release+0x105/0x190 [drm]
>>>> [ 743.186812] ? rt_mutex_slowunlock+0x3aa/0x410
>>>> [ 743.186817] ? rt_spin_lock+0xea/0x1b0
>>>> [ 743.186819] drm_file_free+0x1e0/0x2c0 [drm]
>>>> [ 743.186843] drm_release_noglobal+0x91/0xf0 [drm]
>>>> [ 743.186865] __fput+0x100/0x2e0
>>>> [ 743.186869] fput_close_sync+0x40/0xa0
>>>> [ 743.186870] __x64_sys_close+0x3e/0x80
>>>> [ 743.186873] x64_sys_call+0xa07/0xa30
>>>> [ 743.186879] do_syscall_64+0xd8/0xab0
>>>> [ 743.186881] entry_SYSCALL_64_after_hwframe+0x4b/0x53
>>>> [ 743.186882] RIP: 0033:0x7ff37e567732
>>>> [ 743.186884] Code: 08 0f 85 a1 38 ff ff 49 89 fb 48 89 f0 48 89
>>>> d7
>>>> 48 89 ce 4c 89 c2 4d 89 ca 4c 8b 44 24 08 4c 8b 4c 24 10 4c 89 5c
>>>> 24
>>>> 08 0f 05 <c3> 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 f3 0f 1e fa
>>>> 55
>>>> bf 01 00
>>>> [ 743.186885] RSP: 002b:00007ffc818169a8 EFLAGS: 00000246
>>>> ORIG_RAX:
>>>> 0000000000000003
>>>> [ 743.186886] RAX: ffffffffffffffda RBX: 00007ffc81816a30 RCX:
>>>> 00007ff37e567732
>>>> [ 743.186887] RDX: 0000000000000000 RSI: 0000000000000000 RDI:
>>>> 0000000000000012
>>>> [ 743.186888] RBP: 00007ffc818169d0 R08: 0000000000000000 R09:
>>>> 0000000000000000
>>>> [ 743.186889] R10: 0000000000000000 R11: 0000000000000246 R12:
>>>> 000055d60a7996e0
>>>> [ 743.186889] R13: 00007ffc81816a90 R14: 00007ffc81816a90 R15:
>>>> 000055d60a782a30
>>>> [ 743.186892] </TASK>
>>>> [ 743.186893] Modules linked in: rfcomm snd_hrtimer xt_CHECKSUM
>>>> xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp
>>>> xt_addrtype nft_compat x_tables nft_chain_nat nf_nat nf_conntrack
>>>> nf_defrag_ipv6 nf_defrag_ipv4 nf_tables overlay cfg80211 bnep
>>>> mtd_intel_dg snd_hda_codec_intelhdmi mtd snd_hda_codec_hdmi
>>>> nls_utf8
>>>> mxm_wmi intel_wmi_thunderbolt gigabyte_wmi wmi_bmof xe drm_gpuvm
>>>> drm_gpusvm_helper i2c_algo_bit drm_buddy drm_ttm_helper ttm video
>>>> drm_suballoc_helper gpu_sched drm_client_lib drm_exec
>>>> drm_display_helper cec drm_kunit_helpers drm_kms_helper kunit
>>>> x86_pkg_temp_thermal intel_powerclamp coretemp
>>>> snd_hda_codec_alc882
>>>> snd_hda_codec_realtek_lib snd_hda_codec_generic snd_hda_intel
>>>> snd_soc_avs snd_soc_hda_codec snd_hda_ext_core snd_hda_codec
>>>> snd_hwdep snd_hda_core snd_intel_dspcfg snd_soc_core snd_compress
>>>> ac97_bus snd_pcm snd_seq snd_seq_device snd_timer i2c_i801
>>>> i2c_mux
>>>> snd i2c_smbus btusb btrtl btbcm btmtk btintel bluetooth
>>>> ecdh_generic
>>>> rfkill ecc mei_me mei ioatdma dca wmi nfsd drm i2c_dev fuse
>>>> nfnetlink
>>>> [ 743.186938] ---[ end trace 0000000000000000 ]---
>>>>
>>>> And for property blobs:
>>>>
>>>> void drm_mode_config_cleanup(struct drm_device *dev)
>>>> {
>>>> ...
>>>> list_for_each_entry_safe(blob, bt, &dev->mode_config.property_blob_list,
>>>> head_global) {
>>>> drm_property_blob_put(blob);
>>>> }
>>>>
>>>> Resulting in:
>>>>
>>>> [ 371.072940] BUG: unable to handle page fault for address:
>>>> 000001ffffffffff
>>>> [ 371.072944] #PF: supervisor read access in kernel mode
>>>> [ 371.072945] #PF: error_code(0x0000) - not-present page
>>>> [ 371.072947] PGD 0 P4D 0
>>>> [ 371.072950] Oops: Oops: 0000 [#1] SMP
>>>> [ 371.072953] CPU: 0 UID: 1000 PID: 3693 Comm: kwin_wayland Not
>>>> tainted 7.0.0-rc1-valkyria+ #111 PREEMPT_{RT,(lazy)}
>>>> [ 371.072956] Hardware name: Gigabyte Technology Co., Ltd. X299
>>>> AORUS Gaming 3/X299 AORUS Gaming 3-CF, BIOS F8n 12/06/2021
>>>> [ 371.072957] RIP:
>>>> 0010:drm_property_destroy_user_blobs+0x3b/0x90
>>>> [drm]
>>>> [ 371.073019] Code: 00 00 48 83 ec 10 48 8b 86 30 01 00 00 48 39
>>>> c3
>>>> 74 59 48 89 c2 48 8d 48 c8 48 8b 00 4c 8d 60 c8 eb 04 4c 8d 60 c8
>>>> 48
>>>> 8b 71 40 <48> 39 16 0f 85 39 32 01 00 48 3b 50 08 0f 85 2f 32 01
>>>> 00
>>>> 48 89 70
>>>> [ 371.073021] RSP: 0018:ffffc90006a73de8 EFLAGS: 00010293
>>>> [ 371.073022] RAX: 000001ffffffffff RBX: ffff888118a1a930 RCX:
>>>> ffff8881b92355c0
>>>> [ 371.073024] RDX: ffff8881b92355f8 RSI: 000001ffffffffff RDI:
>>>> ffff888118be4000
>>>> [ 371.073025] RBP: ffffc90006a73e08 R08: ffff8881009b7300 R09:
>>>> ffff888cecc5b000
>>>> [ 371.073026] R10: ffffc90006a73e90 R11: 0000000000000002 R12:
>>>> 000001ffffffffc7
>>>> [ 371.073027] R13: ffff888118a1a980 R14: ffff88810b366d20 R15:
>>>> ffff888118a1a970
>>>> [ 371.073028] FS: 00007f1faccbb280(0000)
>>>> GS:ffff888cec2db000(0000)
>>>> knlGS:0000000000000000
>>>> [ 371.073029] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>> [ 371.073030] CR2: 000001ffffffffff CR3: 000000010655c001 CR4:
>>>> 00000000003706f0
>>>> [ 371.073031] Call Trace:
>>>> [ 371.073033] <TASK>
>>>> [ 371.073036] drm_file_free+0x1df/0x2a0 [drm]
>>>> [ 371.073077] drm_release_noglobal+0x7a/0xe0 [drm]
>>>> [ 371.073113] __fput+0xe2/0x2b0
>>>> [ 371.073118] fput_close_sync+0x40/0xa0
>>>> [ 371.073119] __x64_sys_close+0x3e/0x80
>>>> [ 371.073122] x64_sys_call+0xa07/0xa30
>>>> [ 371.073126] do_syscall_64+0xc0/0x840
>>>> [ 371.073130] entry_SYSCALL_64_after_hwframe+0x4b/0x53
>>>> [ 371.073132] RIP: 0033:0x7f1fb3501732
>>>> [ 371.073133] Code: 08 0f 85 a1 38 ff ff 49 89 fb 48 89 f0 48 89
>>>> d7
>>>> 48 89 ce 4c 89 c2 4d 89 ca 4c 8b 44 24 08 4c 8b 4c 24 10 4c 89 5c
>>>> 24
>>>> 08 0f 05 <c3> 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 f3 0f 1e fa
>>>> 55
>>>> bf 01 00
>>>> [ 371.073135] RSP: 002b:00007ffe8e6f0278 EFLAGS: 00000246
>>>> ORIG_RAX:
>>>> 0000000000000003
>>>> [ 371.073136] RAX: ffffffffffffffda RBX: 00007ffe8e6f0300 RCX:
>>>> 00007f1fb3501732
>>>> [ 371.073137] RDX: 0000000000000000 RSI: 0000000000000000 RDI:
>>>> 0000000000000012
>>>> [ 371.073138] RBP: 00007ffe8e6f02a0 R08: 0000000000000000 R09:
>>>> 0000000000000000
>>>> [ 371.073139] R10: 0000000000000000 R11: 0000000000000246 R12:
>>>> 00005585ba46eea0
>>>> [ 371.073140] R13: 00007ffe8e6f0360 R14: 00007ffe8e6f0360 R15:
>>>> 00005585ba458a30
>>>> [ 371.073143] </TASK>
>>>> [ 371.073144] Modules linked in: rfcomm snd_hrtimer xt_addrtype
>>>> xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4
>>>> xt_tcpudp nft_compat x_tables nft_chain_nat nf_nat nf_conntrack
>>>> nf_defrag_ipv6 nf_defrag_ipv4 nf_tables overlay cfg80211 bnep
>>>> snd_hda_codec_intelhdmi snd_hda_codec_hdmi mtd_intel_dg mtd
>>>> nls_utf8
>>>> wmi_bmof mxm_wmi gigabyte_wmi intel_wmi_thunderbolt xe drm_gpuvm
>>>> drm_gpusvm_helper i2c_algo_bit drm_buddy drm_ttm_helper ttm video
>>>> drm_suballoc_helper gpu_sched drm_client_lib drm_exec
>>>> drm_display_helper cec drm_kunit_helpers drm_kms_helper kunit
>>>> x86_pkg_temp_thermal intel_powerclamp coretemp
>>>> snd_hda_codec_alc882
>>>> snd_hda_codec_realtek_lib snd_hda_codec_generic snd_hda_intel
>>>> snd_soc_avs snd_soc_hda_codec snd_hda_ext_core snd_hda_codec
>>>> snd_hwdep snd_hda_core snd_intel_dspcfg snd_soc_core snd_compress
>>>> ac97_bus snd_pcm snd_seq snd_seq_device snd_timer i2c_i801 btusb
>>>> i2c_mux i2c_smbus btrtl snd btbcm btmtk btintel bluetooth
>>>> ecdh_generic rfkill ecc mei_me mei ioatdma dca wmi nfsd drm
>>>> i2c_dev
>>>> fuse nfnetlink
>>>> [ 371.073198] CR2: 000001ffffffffff
>>>> [ 371.073199] ---[ end trace 0000000000000000 ]---
>>>>
>>>> Add a guard around file close, and ensure the warnings from
>>>> drm_mode_config do not trigger. Fix those by allowing an open
>>>> reference to the file descriptor and cleaning up the file linked
>>>> list entry in drm_mode_config_cleanup().
>>>>
>>>> Cc: Thomas Hellström <[email protected]>
>>>> Signed-off-by: Maarten Lankhorst <[email protected]>
>>>> ---
>>>> drivers/gpu/drm/drm_file.c | 5 ++++-
>>>> drivers/gpu/drm/drm_mode_config.c | 9 ++++++---
>>>> 2 files changed, 10 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
>>>> index ec820686b3021..f52141f842a1f 100644
>>>> --- a/drivers/gpu/drm/drm_file.c
>>>> +++ b/drivers/gpu/drm/drm_file.c
>>>> @@ -233,6 +233,7 @@ static void drm_events_release(struct drm_file *file_priv)
>>>> void drm_file_free(struct drm_file *file)
>>>> {
>>>> struct drm_device *dev;
>>>> + int idx;
>>>>
>>>> if (!file)
>>>> return;
>>>> @@ -249,9 +250,11 @@ void drm_file_free(struct drm_file *file)
>>>>
>>>> drm_events_release(file);
>>>>
>>>> - if (drm_core_check_feature(dev, DRIVER_MODESET)) {
>>>> + if (drm_core_check_feature(dev, DRIVER_MODESET) &&
>>>> + drm_dev_enter(dev, &idx)) {
>>>> drm_fb_release(file);
>>>> drm_property_destroy_user_blobs(dev, file);
>>>> + drm_dev_exit(idx);
>>>> }
>>>>
>>>> if (drm_core_check_feature(dev, DRIVER_SYNCOBJ))
>>>> diff --git a/drivers/gpu/drm/drm_mode_config.c b/drivers/gpu/drm/drm_mode_config.c
>>>> index 84ae8a23a3678..e349418978f79 100644
>>>> --- a/drivers/gpu/drm/drm_mode_config.c
>>>> +++ b/drivers/gpu/drm/drm_mode_config.c
>>>> @@ -583,10 +583,13 @@ void drm_mode_config_cleanup(struct drm_device *dev)
>>>> */
>>>> WARN_ON(!list_empty(&dev->mode_config.fb_list));
>>>> list_for_each_entry_safe(fb, fbt, &dev->mode_config.fb_list, head) {
>>>> - struct drm_printer p = drm_dbg_printer(dev, DRM_UT_KMS, "[leaked fb]");
>>>> + if (list_empty(&fb->filp_head) || drm_framebuffer_read_refcount(fb) > 1) {
>>>
>>> This looks a bit scary. Can someone manipulate the fb_list and even
>>> free fbs while we are iterating? Or is all other manipulation blocked
>>> by the device being unplugged?
>> The code already frees the framebuffer here as there is nothing running
>> that can still reference it.
>>
>> The framebuffers are no longer used as everything display-related is
>> already torn down, and the device is unplugged. That's what the
>> drm_dev_enter/exit in drm_file.c are there to protect.
>
> OK, great.
>
> Reviewed-by: Thomas Hellström <[email protected]>
Since it addresses pre-existing problems, I pushed it to fixes with some
extra tags, added below:
Cc: <[email protected]> # v4.18+
Fixes: bee330f3d672 ("drm: Use srcu to protect drm_device.unplugged")
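
For anyone reading along, here is a rough userspace sketch of the guard
pattern discussed above. It is a toy model only, not the kernel code: the
real drm_dev_enter()/drm_dev_exit() are SRCU-based, and every name below
(toy_dev, toy_dev_enter(), toy_dev_exit(), toy_dev_unplug(),
toy_file_free()) is hypothetical and made up for illustration. The single
fb_refs counter is likewise an assumption standing in for the per-file
framebuffer and blob lists.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct drm_device; not a kernel type. */
struct toy_dev {
	bool unplugged;	/* set once the device has been unplugged */
	int fb_refs;	/* references held on behalf of open files */
};

/* Models drm_dev_enter(): succeeds only while the device is plugged. */
bool toy_dev_enter(const struct toy_dev *dev, int *idx)
{
	if (dev->unplugged)
		return false;
	*idx = 0;	/* the real call returns an SRCU index here */
	return true;
}

/* Models drm_dev_exit(): nothing to release in this toy version. */
void toy_dev_exit(int idx)
{
	(void)idx;
}

/* Teardown path: mark the device gone and drop what open files still hold. */
void toy_dev_unplug(struct toy_dev *dev)
{
	dev->unplugged = true;
	dev->fb_refs = 0;
}

/*
 * File-close path: only touch device state inside the guard.
 * Returns how many references this call released.
 */
int toy_file_free(struct toy_dev *dev, int file_refs)
{
	int idx;

	if (!toy_dev_enter(dev, &idx))
		return 0;	/* teardown already dropped the references */

	assert(dev->fb_refs >= file_refs);
	dev->fb_refs -= file_refs;
	toy_dev_exit(idx);
	return file_refs;
}
```

Calling toy_file_free() before toy_dev_unplug() releases the file's
references as usual; calling it afterwards is a no-op, which is the
property the guard in drm_file_free() is meant to provide, so a late close
from the compositor no longer touches state that teardown already freed.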