On Mon, Jul 22, 2024 at 5:38 PM Mathias Krause <mini...@grsecurity.net> wrote: > > On 22.07.24 13:13, Ajay Kaher wrote: > > On Sat, Jul 20, 2024 at 2:17 AM Mathias Krause <mini...@grsecurity.net> > > wrote: > >> > >> I noticed, the user events ftrace selftest is crashing every now and > >> then in our automated tests. Digging into, I found that the following > >> is triggering the issue very reliable: > >> > >> - in one shell, as root: > >> # while true; do ./kselftest/user_events/ftrace_test; done > >> > >> - in a second shell, again as root: > >> # cd /sys/kernel/tracing > >> # while true; do cat events/user_events/__test_event/format; done > >> 2>/dev/null > >> > > > > Tried to reproduced on 6.10.0-rc7-100.ph5+, only getting repeated output as: >
Mathias, thanks for reporting. I am able to reproduce the 'KASAN: slab-use-after-free'. Steve, let me know if anything wrong in my investigation: [ 6264.339882] ================================================================== [ 6264.339970] BUG: KASAN: slab-use-after-free in f_start+0x2b5/0x370 This belongs to f_start() -> f_next() -> trace_get_fields(): trace_get_fields(struct trace_event_call *event_call) { if (!event_call->class->get_fields) return &event_call->class->fields; return event_call->class->get_fields(event_call); } This happens while reading 'events/user_events/__test_event/format'. Allocation: [ 6264.347212] Allocated by task 3287: [ 6264.348247] kasan_save_stack+0x26/0x50 [ 6264.348256] kasan_save_track+0x14/0x40 [ 6264.348260] kasan_save_alloc_info+0x37/0x50 [ 6264.348265] __kasan_kmalloc+0xb3/0xc0 [ 6264.348268] kmalloc_trace_noprof+0x168/0x330 [ 6264.348280] user_event_parse_cmd+0x57b/0x26c0 [ 6264.348286] user_events_ioctl+0xa92/0x1850 [ 6264.348290] __x64_sys_ioctl+0x138/0x1b0 [ 6264.348295] x64_sys_call+0x9a4/0x1f20 [ 6264.348299] do_syscall_64+0x4b/0x110 user_event_parse_cmd() -> user_event_parse() { . user = kzalloc(sizeof(*user), GFP_KERNEL_ACCOUNT); Link: https://elixir.bootlin.com/linux/v6.10/source/kernel/trace/trace_events_user.c#L2118 Freed: [ 6264.350333] kfree+0xd1/0x2b0 [ 6264.350337] destroy_user_event.part.0+0x313/0x450 [ 6264.350341] destroy_user_event+0x129/0x1a0 [ 6264.350344] delayed_destroy_user_event+0x62/0xd0 [ 6264.350347] process_one_work+0x621/0xf60 [ 6264.350359] worker_thread+0x760/0x14f0 static int destroy_user_event(struct user_event *user) { . kfree(user->call.print_fmt); kfree(EVENT_NAME(user)); kfree(user); <-- Link: https://elixir.bootlin.com/linux/v6.10/source/kernel/trace/trace_events_user.c#L1510 Race condition: Thread A i.e. event reader able to reach the f_start() as the path is valid. Thread A waiting for lock. At the sametime, Thread B has acquired lock and removing events entry followed by free the user_event object. Later once Thread A got the lock it tried to read address which belongs to struct trace_event_call (struct trace_event_call is member of struct user_event) Thread A (read event) Thread B (remove event) . worker_thread() . delayed_destroy_user_event() -> acquire event_mutex . destroy_user_event() vfs_read() . seq_read() . f_start() -> acquire event_mutex eventfs_remove_dir() . (waiting) kfree(user) . (waiting) released event_mutex acquired event_mutex f_next() trace_get_fields(): I think you have added the following check in f_start() to prevent this race condition, but somehow with eventfs still some gap to race condition. static void *f_start(struct seq_file *m, loff_t *pos) { mutex_lock(&event_mutex); if (!event_file_data(m->private)) <-- return ERR_PTR(-ENODEV); -Ajay