On Mon, 31 Oct 2016, Oleg Nesterov wrote: > I think we need to unexport kthread_park/unpark, and either make it return > "void" or actually fix the race with kthred_stop/exit. This patch just adds > WARN_ON(PF_EXITING) for now.
I'll have a look. > The usage of kthread_park() in cpuhp code (cpu.c, smpboot.c, stop_machine.c) > is fine. It can never see an exiting/exited kthread, smpboot_destroy_threads() > clears *ht->store, smpboot_park_thread() checks it is not NULL under the same > smpboot_threads_lock. cpuhp_threads and cpu_stop_threads never exit, so other > callers are fine too. > > But it has two more users: > > - watchdog_park_threads() and it does not look nice. The code is actually > correct, get_online_cpus() ensures that kthread_park() can't race with > itself (note that kthread_park() can't handle this race correctly), but > imo it should not use kthread_park() directly. Should we provide an interface through the smpboot thread infrastructure for this? > - drivers/gpu/drm/amd/scheduler/gpu_scheduler.c and I think it should not > use kthread_park() too. > > But this patch should not break this code. kthread_park() must not be > called after amd_sched_fini() which does kthread_stop(), otherwise even > to_live_kthread() is not safe because task_struct can be already freed > and sched->thread can point to nowhere. Right. That's why the smpboot thread code holds a task ref which is only released after the thread has been stopped. I can see why that gpu driver wants to use the park mechanism and I guess there are other legitimate use cases as well. I prefer to implement a park/unpark variant which is safe to use on arbitrary kthreads over forcing driver writers to come up with even more broken open coded implementations of that. > Signed-off-by: Oleg Nesterov <o...@redhat.com> Reviewed-by: Thomas Gleixner <t...@linutronix.de>