On Tue, 2026-03-17 at 16:39 +0100, Maarten Lankhorst wrote:
>
>
> On 2026-03-17 at 16:26, Thomas Hellström wrote:
> > On Fri, 2026-03-13 at 16:17 +0100, Maarten Lankhorst wrote:
> > > When trying to do a rather aggressive test of igt's
> > > "xe_module_load --r reload" with a full desktop environment and game
> > > running I noticed a few OOPSes when dereferencing freed pointers,
> > > related to framebuffers and property blobs after the compositor exits.
> > >
> > > Solve this by guarding the freeing in drm_file with drm_dev_enter/exit,
> > > and immediately put the references from struct drm_file objects during
> > > drm_dev_unplug().
> > >
> > > Related warnings for framebuffers on the subtest:
> > > [ 739.713076] ------------[ cut here ]------------
> > > WARN_ON(!list_empty(&dev->mode_config.fb_list))
> > > [ 739.713079] WARNING: drivers/gpu/drm/drm_mode_config.c:584 at
> > > drm_mode_config_cleanup+0x30b/0x320 [drm], CPU#12:
> > > xe_module_load/13145
> > > ....
> > > [ 739.713328] Call Trace:
> > > [ 739.713330] <TASK>
> > > [ 739.713335] ? intel_pmdemand_destroy_state+0x11/0x20 [xe]
> > > [ 739.713574] ? intel_atomic_global_obj_cleanup+0xe4/0x1a0 [xe]
> > > [ 739.713794] intel_display_driver_remove_noirq+0x51/0xb0 [xe]
> > > [ 739.714041] xe_display_fini_early+0x33/0x50 [xe]
> > > [ 739.714284] devm_action_release+0xf/0x20
> > > [ 739.714294] devres_release_all+0xad/0xf0
> > > [ 739.714301] device_unbind_cleanup+0x12/0xa0
> > > [ 739.714305] device_release_driver_internal+0x1b7/0x210
> > > [ 739.714311] device_driver_detach+0x14/0x20
> > > [ 739.714315] unbind_store+0xa6/0xb0
> > > [ 739.714319] drv_attr_store+0x21/0x30
> > > [ 739.714322] sysfs_kf_write+0x48/0x60
> > > [ 739.714328] kernfs_fop_write_iter+0x16b/0x240
> > > [ 739.714333] vfs_write+0x266/0x520
> > > [ 739.714341] ksys_write+0x72/0xe0
> > > [ 739.714345] __x64_sys_write+0x19/0x20
> > > [ 739.714347] x64_sys_call+0xa15/0xa30
> > > [ 739.714355] do_syscall_64+0xd8/0xab0
> > > [ 739.714361] entry_SYSCALL_64_after_hwframe+0x4b/0x53
> > >
> > > and
> > >
> > > [ 739.714459] ------------[ cut here ]------------
> > > [ 739.714461] xe 0000:67:00.0: [drm]
> > > drm_WARN_ON(!list_empty(&fb-
> > > > filp_head))
> > > [ 739.714464] WARNING: drivers/gpu/drm/drm_framebuffer.c:833 at
> > > drm_framebuffer_free+0x6c/0x90 [drm], CPU#12:
> > > xe_module_load/13145
> > > [ 739.714715] RIP: 0010:drm_framebuffer_free+0x7a/0x90 [drm]
> > > ...
> > > [ 739.714869] Call Trace:
> > > [ 739.714871] <TASK>
> > > [ 739.714876] drm_mode_config_cleanup+0x26a/0x320 [drm]
> > > [ 739.714998] ? __drm_printfn_seq_file+0x20/0x20 [drm]
> > > [ 739.715115] ? drm_mode_config_cleanup+0x207/0x320 [drm]
> > > [ 739.715235] intel_display_driver_remove_noirq+0x51/0xb0 [xe]
> > > [ 739.715576] xe_display_fini_early+0x33/0x50 [xe]
> > > [ 739.715821] devm_action_release+0xf/0x20
> > > [ 739.715828] devres_release_all+0xad/0xf0
> > > [ 739.715843] device_unbind_cleanup+0x12/0xa0
> > > [ 739.715850] device_release_driver_internal+0x1b7/0x210
> > > [ 739.715856] device_driver_detach+0x14/0x20
> > > [ 739.715860] unbind_store+0xa6/0xb0
> > > [ 739.715865] drv_attr_store+0x21/0x30
> > > [ 739.715868] sysfs_kf_write+0x48/0x60
> > > [ 739.715873] kernfs_fop_write_iter+0x16b/0x240
> > > [ 739.715878] vfs_write+0x266/0x520
> > > [ 739.715886] ksys_write+0x72/0xe0
> > > [ 739.715890] __x64_sys_write+0x19/0x20
> > > [ 739.715893] x64_sys_call+0xa15/0xa30
> > > [ 739.715900] do_syscall_64+0xd8/0xab0
> > > [ 739.715905] entry_SYSCALL_64_after_hwframe+0x4b/0x53
> > >
> > > and then finally file close blows up:
> > >
> > > [ 743.186530] Oops: general protection fault, probably for non-
> > > canonical address 0xdead000000000122: 0000 [#1] SMP
> > > [ 743.186535] CPU: 3 UID: 1000 PID: 3453 Comm: kwin_wayland
> > > Tainted:
> > > G W 7.0.0-rc1-valkyria+ #110 PREEMPT_{RT,(lazy)}
> > > [ 743.186537] Tainted: [W]=WARN
> > > [ 743.186538] Hardware name: Gigabyte Technology Co., Ltd. X299
> > > AORUS Gaming 3/X299 AORUS Gaming 3-CF, BIOS F8n 12/06/2021
> > > [ 743.186539] RIP: 0010:drm_framebuffer_cleanup+0x55/0xc0 [drm]
> > > [ 743.186588] Code: d8 72 73 0f b6 42 05 ff c3 39 c3 72 e8 49 8d
> > > bd
> > > 50 07 00 00 31 f6 e8 3a 80 d3 e1 49 8b 44 24 10 49 8d 7c 24 08 49
> > > 8b
> > > 54 24 08 <48> 3b 38 0f 85 95 7f 02 00 48 3b 7a 08 0f 85 8b 7f 02
> > > 00
> > > 48 89 42
> > > [ 743.186589] RSP: 0018:ffffc900085e3cf8 EFLAGS: 00010202
> > > [ 743.186591] RAX: dead000000000122 RBX: 0000000000000001 RCX:
> > > ffffffff8217ed03
> > > [ 743.186592] RDX: dead000000000100 RSI: 0000000000000000 RDI:
> > > ffff88814675ba08
> > > [ 743.186593] RBP: ffffc900085e3d10 R08: 0000000000000000 R09:
> > > 0000000000000000
> > > [ 743.186593] R10: 0000000000000000 R11: 0000000000000000 R12:
> > > ffff88814675ba00
> > > [ 743.186594] R13: ffff88810d778000 R14: ffff888119f6dca0 R15:
> > > ffff88810c660bb0
> > > [ 743.186595] FS: 00007ff377d21280(0000)
> > > GS:ffff888cec3f8000(0000)
> > > knlGS:0000000000000000
> > > [ 743.186596] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [ 743.186596] CR2: 000055690b55e000 CR3: 0000000113586003 CR4:
> > > 00000000003706f0
> > > [ 743.186597] Call Trace:
> > > [ 743.186598] <TASK>
> > > [ 743.186603] intel_user_framebuffer_destroy+0x12/0x90 [xe]
> > > [ 743.186722] drm_framebuffer_free+0x3a/0x90 [drm]
> > > [ 743.186750] ? trace_hardirqs_on+0x5f/0x120
> > > [ 743.186754] drm_mode_object_put+0x51/0x70 [drm]
> > > [ 743.186786] drm_fb_release+0x105/0x190 [drm]
> > > [ 743.186812] ? rt_mutex_slowunlock+0x3aa/0x410
> > > [ 743.186817] ? rt_spin_lock+0xea/0x1b0
> > > [ 743.186819] drm_file_free+0x1e0/0x2c0 [drm]
> > > [ 743.186843] drm_release_noglobal+0x91/0xf0 [drm]
> > > [ 743.186865] __fput+0x100/0x2e0
> > > [ 743.186869] fput_close_sync+0x40/0xa0
> > > [ 743.186870] __x64_sys_close+0x3e/0x80
> > > [ 743.186873] x64_sys_call+0xa07/0xa30
> > > [ 743.186879] do_syscall_64+0xd8/0xab0
> > > [ 743.186881] entry_SYSCALL_64_after_hwframe+0x4b/0x53
> > > [ 743.186882] RIP: 0033:0x7ff37e567732
> > > [ 743.186884] Code: 08 0f 85 a1 38 ff ff 49 89 fb 48 89 f0 48 89
> > > d7
> > > 48 89 ce 4c 89 c2 4d 89 ca 4c 8b 44 24 08 4c 8b 4c 24 10 4c 89 5c
> > > 24
> > > 08 0f 05 <c3> 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 f3 0f 1e fa
> > > 55
> > > bf 01 00
> > > [ 743.186885] RSP: 002b:00007ffc818169a8 EFLAGS: 00000246
> > > ORIG_RAX:
> > > 0000000000000003
> > > [ 743.186886] RAX: ffffffffffffffda RBX: 00007ffc81816a30 RCX:
> > > 00007ff37e567732
> > > [ 743.186887] RDX: 0000000000000000 RSI: 0000000000000000 RDI:
> > > 0000000000000012
> > > [ 743.186888] RBP: 00007ffc818169d0 R08: 0000000000000000 R09:
> > > 0000000000000000
> > > [ 743.186889] R10: 0000000000000000 R11: 0000000000000246 R12:
> > > 000055d60a7996e0
> > > [ 743.186889] R13: 00007ffc81816a90 R14: 00007ffc81816a90 R15:
> > > 000055d60a782a30
> > > [ 743.186892] </TASK>
> > > [ 743.186893] Modules linked in: rfcomm snd_hrtimer xt_CHECKSUM
> > > xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp
> > > xt_addrtype nft_compat x_tables nft_chain_nat nf_nat nf_conntrack
> > > nf_defrag_ipv6 nf_defrag_ipv4 nf_tables overlay cfg80211 bnep
> > > mtd_intel_dg snd_hda_codec_intelhdmi mtd snd_hda_codec_hdmi
> > > nls_utf8
> > > mxm_wmi intel_wmi_thunderbolt gigabyte_wmi wmi_bmof xe drm_gpuvm
> > > drm_gpusvm_helper i2c_algo_bit drm_buddy drm_ttm_helper ttm video
> > > drm_suballoc_helper gpu_sched drm_client_lib drm_exec
> > > drm_display_helper cec drm_kunit_helpers drm_kms_helper kunit
> > > x86_pkg_temp_thermal intel_powerclamp coretemp
> > > snd_hda_codec_alc882
> > > snd_hda_codec_realtek_lib snd_hda_codec_generic snd_hda_intel
> > > snd_soc_avs snd_soc_hda_codec snd_hda_ext_core snd_hda_codec
> > > snd_hwdep snd_hda_core snd_intel_dspcfg snd_soc_core snd_compress
> > > ac97_bus snd_pcm snd_seq snd_seq_device snd_timer i2c_i801
> > > i2c_mux
> > > snd i2c_smbus btusb btrtl btbcm btmtk btintel bluetooth
> > > ecdh_generic
> > > rfkill ecc mei_me mei ioatdma dca wmi nfsd drm i2c_dev fuse
> > > nfnetlink
> > > [ 743.186938] ---[ end trace 0000000000000000 ]---
> > >
> > > And for property blobs:
> > >
> > > void drm_mode_config_cleanup(struct drm_device *dev)
> > > {
> > > ...
> > > list_for_each_entry_safe(blob, bt, &dev-
> > > > mode_config.property_blob_list,
> > > head_global) {
> > > drm_property_blob_put(blob);
> > > }
> > >
> > > Resulting in:
> > >
> > > [ 371.072940] BUG: unable to handle page fault for address:
> > > 000001ffffffffff
> > > [ 371.072944] #PF: supervisor read access in kernel mode
> > > [ 371.072945] #PF: error_code(0x0000) - not-present page
> > > [ 371.072947] PGD 0 P4D 0
> > > [ 371.072950] Oops: Oops: 0000 [#1] SMP
> > > [ 371.072953] CPU: 0 UID: 1000 PID: 3693 Comm: kwin_wayland Not
> > > tainted 7.0.0-rc1-valkyria+ #111 PREEMPT_{RT,(lazy)}
> > > [ 371.072956] Hardware name: Gigabyte Technology Co., Ltd. X299
> > > AORUS Gaming 3/X299 AORUS Gaming 3-CF, BIOS F8n 12/06/2021
> > > [ 371.072957] RIP:
> > > 0010:drm_property_destroy_user_blobs+0x3b/0x90
> > > [drm]
> > > [ 371.073019] Code: 00 00 48 83 ec 10 48 8b 86 30 01 00 00 48 39
> > > c3
> > > 74 59 48 89 c2 48 8d 48 c8 48 8b 00 4c 8d 60 c8 eb 04 4c 8d 60 c8
> > > 48
> > > 8b 71 40 <48> 39 16 0f 85 39 32 01 00 48 3b 50 08 0f 85 2f 32 01
> > > 00
> > > 48 89 70
> > > [ 371.073021] RSP: 0018:ffffc90006a73de8 EFLAGS: 00010293
> > > [ 371.073022] RAX: 000001ffffffffff RBX: ffff888118a1a930 RCX:
> > > ffff8881b92355c0
> > > [ 371.073024] RDX: ffff8881b92355f8 RSI: 000001ffffffffff RDI:
> > > ffff888118be4000
> > > [ 371.073025] RBP: ffffc90006a73e08 R08: ffff8881009b7300 R09:
> > > ffff888cecc5b000
> > > [ 371.073026] R10: ffffc90006a73e90 R11: 0000000000000002 R12:
> > > 000001ffffffffc7
> > > [ 371.073027] R13: ffff888118a1a980 R14: ffff88810b366d20 R15:
> > > ffff888118a1a970
> > > [ 371.073028] FS: 00007f1faccbb280(0000)
> > > GS:ffff888cec2db000(0000)
> > > knlGS:0000000000000000
> > > [ 371.073029] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [ 371.073030] CR2: 000001ffffffffff CR3: 000000010655c001 CR4:
> > > 00000000003706f0
> > > [ 371.073031] Call Trace:
> > > [ 371.073033] <TASK>
> > > [ 371.073036] drm_file_free+0x1df/0x2a0 [drm]
> > > [ 371.073077] drm_release_noglobal+0x7a/0xe0 [drm]
> > > [ 371.073113] __fput+0xe2/0x2b0
> > > [ 371.073118] fput_close_sync+0x40/0xa0
> > > [ 371.073119] __x64_sys_close+0x3e/0x80
> > > [ 371.073122] x64_sys_call+0xa07/0xa30
> > > [ 371.073126] do_syscall_64+0xc0/0x840
> > > [ 371.073130] entry_SYSCALL_64_after_hwframe+0x4b/0x53
> > > [ 371.073132] RIP: 0033:0x7f1fb3501732
> > > [ 371.073133] Code: 08 0f 85 a1 38 ff ff 49 89 fb 48 89 f0 48 89
> > > d7
> > > 48 89 ce 4c 89 c2 4d 89 ca 4c 8b 44 24 08 4c 8b 4c 24 10 4c 89 5c
> > > 24
> > > 08 0f 05 <c3> 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 f3 0f 1e fa
> > > 55
> > > bf 01 00
> > > [ 371.073135] RSP: 002b:00007ffe8e6f0278 EFLAGS: 00000246
> > > ORIG_RAX:
> > > 0000000000000003
> > > [ 371.073136] RAX: ffffffffffffffda RBX: 00007ffe8e6f0300 RCX:
> > > 00007f1fb3501732
> > > [ 371.073137] RDX: 0000000000000000 RSI: 0000000000000000 RDI:
> > > 0000000000000012
> > > [ 371.073138] RBP: 00007ffe8e6f02a0 R08: 0000000000000000 R09:
> > > 0000000000000000
> > > [ 371.073139] R10: 0000000000000000 R11: 0000000000000246 R12:
> > > 00005585ba46eea0
> > > [ 371.073140] R13: 00007ffe8e6f0360 R14: 00007ffe8e6f0360 R15:
> > > 00005585ba458a30
> > > [ 371.073143] </TASK>
> > > [ 371.073144] Modules linked in: rfcomm snd_hrtimer xt_addrtype
> > > xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4
> > > xt_tcpudp nft_compat x_tables nft_chain_nat nf_nat nf_conntrack
> > > nf_defrag_ipv6 nf_defrag_ipv4 nf_tables overlay cfg80211 bnep
> > > snd_hda_codec_intelhdmi snd_hda_codec_hdmi mtd_intel_dg mtd
> > > nls_utf8
> > > wmi_bmof mxm_wmi gigabyte_wmi intel_wmi_thunderbolt xe drm_gpuvm
> > > drm_gpusvm_helper i2c_algo_bit drm_buddy drm_ttm_helper ttm video
> > > drm_suballoc_helper gpu_sched drm_client_lib drm_exec
> > > drm_display_helper cec drm_kunit_helpers drm_kms_helper kunit
> > > x86_pkg_temp_thermal intel_powerclamp coretemp
> > > snd_hda_codec_alc882
> > > snd_hda_codec_realtek_lib snd_hda_codec_generic snd_hda_intel
> > > snd_soc_avs snd_soc_hda_codec snd_hda_ext_core snd_hda_codec
> > > snd_hwdep snd_hda_core snd_intel_dspcfg snd_soc_core snd_compress
> > > ac97_bus snd_pcm snd_seq snd_seq_device snd_timer i2c_i801 btusb
> > > i2c_mux i2c_smbus btrtl snd btbcm btmtk btintel bluetooth
> > > ecdh_generic rfkill ecc mei_me mei ioatdma dca wmi nfsd drm
> > > i2c_dev
> > > fuse nfnetlink
> > > [ 371.073198] CR2: 000001ffffffffff
> > > [ 371.073199] ---[ end trace 0000000000000000 ]---
> > >
> > > Add a guard around file close, and ensure the warnings from
> > > drm_mode_config do not trigger. Fix those by allowing an open reference
> > > to the file descriptor and cleaning up the file linked list entry in
> > > drm_mode_config_cleanup().
> > >
> > > Cc: Thomas Hellström <[email protected]>
> > > Signed-off-by: Maarten Lankhorst <[email protected]>
> > > ---
> > > drivers/gpu/drm/drm_file.c | 5 ++++-
> > > drivers/gpu/drm/drm_mode_config.c | 9 ++++++---
> > > 2 files changed, 10 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/drm_file.c
> > > b/drivers/gpu/drm/drm_file.c
> > > index ec820686b3021..f52141f842a1f 100644
> > > --- a/drivers/gpu/drm/drm_file.c
> > > +++ b/drivers/gpu/drm/drm_file.c
> > > @@ -233,6 +233,7 @@ static void drm_events_release(struct
> > > drm_file
> > > *file_priv)
> > > void drm_file_free(struct drm_file *file)
> > > {
> > > struct drm_device *dev;
> > > + int idx;
> > >
> > > if (!file)
> > > return;
> > > @@ -249,9 +250,11 @@ void drm_file_free(struct drm_file *file)
> > >
> > > drm_events_release(file);
> > >
> > > - if (drm_core_check_feature(dev, DRIVER_MODESET)) {
> > > + if (drm_core_check_feature(dev, DRIVER_MODESET) &&
> > > + drm_dev_enter(dev, &idx)) {
> > > drm_fb_release(file);
> > > drm_property_destroy_user_blobs(dev, file);
> > > + drm_dev_exit(idx);
> > > }
> > >
> > > if (drm_core_check_feature(dev, DRIVER_SYNCOBJ))
> > > diff --git a/drivers/gpu/drm/drm_mode_config.c
> > > b/drivers/gpu/drm/drm_mode_config.c
> > > index 84ae8a23a3678..e349418978f79 100644
> > > --- a/drivers/gpu/drm/drm_mode_config.c
> > > +++ b/drivers/gpu/drm/drm_mode_config.c
> > > @@ -583,10 +583,13 @@ void drm_mode_config_cleanup(struct
> > > drm_device
> > > *dev)
> > > */
> > > WARN_ON(!list_empty(&dev->mode_config.fb_list));
> > > list_for_each_entry_safe(fb, fbt, &dev-
> > > >mode_config.fb_list,
> > > head) {
> > > - struct drm_printer p = drm_dbg_printer(dev,
> > > DRM_UT_KMS, "[leaked fb]");
> > > + if (list_empty(&fb->filp_head) ||
> > > drm_framebuffer_read_refcount(fb) > 1) {
> >
> > This looks a bit scary. Can someone manipulate the fb_list and even
> > free fbs while we are iterating? Or is all other manipulation blocked
> > by the device being unplugged?
> The code already frees the framebuffer here as there is nothing running
> that can still reference it.
>
> The framebuffers are no longer used as everything display is already
> torn down, and the device unplugged. That's what the drm_dev_enter/exit
> in drm_file.c are there to protect.
OK, great.
Reviewed-by: Thomas Hellström <[email protected]>
>
> Kind regards,
> ~Maarten Lankhorst
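
For readers following the thread, the guard discussed above can be pictured
with a small user-space model. This is only a sketch, not the kernel code:
every model_* name and the fb_refs counter are hypothetical stand-ins, and
the model ignores concurrency entirely (as far as I know, the real
drm_dev_enter()/drm_dev_exit() pair is SRCU-based). It assumes, as the
commit message describes, that unplug drops the file-held references right
away, so a later file close must skip the framebuffer/blob release.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct drm_device; only what the model needs. */
struct model_dev {
	bool unplugged;	/* set once the device has been unplugged */
	int fb_refs;	/* references held on behalf of open files */
};

/* Models drm_dev_enter(): succeeds only while the device is still plugged. */
static bool model_dev_enter(const struct model_dev *dev, int *idx)
{
	if (dev->unplugged)
		return false;
	*idx = 0;	/* the model hands out a single dummy index */
	return true;
}

/* Models drm_dev_exit(): must be paired with a successful enter. */
static void model_dev_exit(int idx)
{
	assert(idx == 0);
}

/* Models unplug as described in the patch: file-held references are put now. */
static void model_dev_unplug(struct model_dev *dev)
{
	dev->unplugged = true;
	dev->fb_refs = 0;
}

/*
 * Models the guarded part of file close. Returns how many references this
 * close released; after unplug the guard fails and nothing is touched.
 */
static int model_file_free(struct model_dev *dev, int file_refs)
{
	int idx, dropped = 0;

	if (model_dev_enter(dev, &idx)) {
		dev->fb_refs -= file_refs;
		dropped = file_refs;
		model_dev_exit(idx);
	}
	return dropped;
}
```

In this model a close before unplug releases its own references, while a
close after unplug releases nothing, which is the ordering the patch relies
on to avoid touching already-freed objects.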