On Thu, 22 Aug 2024 10:32:02 -0400
Steven Rostedt <rost...@goodmis.org> wrote:

> > Yeah, it seems there might be multiple bugs in the user workload
> > handling, the other NULL pointer dereference and refcount warning
> > above might be related (but I have yet to reproduce it on an upstream
> > kernel). I'm also going to look at the code and will post any findings
> > here.  
> 
> Yes, that is the second bug, and it is related to the one that this addresses.

There's nothing protecting the clearing of the kthreads and calling
put_task_struct(). Here's the fix to the second bug:

diff --git a/kernel/trace/trace_osnoise.c b/kernel/trace/trace_osnoise.c
index 66a871553d4a..53de719f35cb 100644
--- a/kernel/trace/trace_osnoise.c
+++ b/kernel/trace/trace_osnoise.c
@@ -2106,7 +2106,9 @@ static int osnoise_cpu_init(unsigned int cpu)
  */
 static int osnoise_cpu_die(unsigned int cpu)
 {
+       mutex_lock(&interface_lock);
        stop_kthread(cpu);
+       mutex_unlock(&interface_lock);
        return 0;
 }
 
@@ -2239,8 +2241,11 @@ static ssize_t osnoise_options_write(struct file *filp, const char __user *ubuf,
         */
        mutex_lock(&trace_types_lock);
        running = osnoise_has_registered_instances();
-       if (running)
+       if (running) {
+               mutex_lock(&interface_lock);
                stop_per_cpu_kthreads();
+               mutex_unlock(&interface_lock);
+       }
 
        mutex_lock(&interface_lock);
        /*
@@ -2355,8 +2360,11 @@ osnoise_cpus_write(struct file *filp, const char __user *ubuf, size_t count,
         */
        mutex_lock(&trace_types_lock);
        running = osnoise_has_registered_instances();
-       if (running)
+       if (running) {
+               mutex_lock(&interface_lock);
                stop_per_cpu_kthreads();
+               mutex_unlock(&interface_lock);
+       }
 
        mutex_lock(&interface_lock);
        /*
@@ -2951,7 +2960,9 @@ static void osnoise_workload_stop(void)
         */
        barrier();
 
+       mutex_lock(&interface_lock);
        stop_per_cpu_kthreads();
+       mutex_unlock(&interface_lock);
 
        osnoise_unhook_events();
 }


With both of these fixes, the bug goes away.

I'll add this fix (after enabling lockdep and making sure I didn't screw up
the locking). Can you resend this patch so that it just skips calling cancel
if kthread is NULL? No need to exit out early. I'd still like to make sure the
cleanup happens, and not assume it has already been done.

-- Steve
