On Wed, 17 Jan 2024 21:26:46 -0500
Steven Rostedt <rost...@goodmis.org> wrote:

> On Thu, 18 Jan 2024 02:18:42 +0000
> Chen Zhongjin <chenzhong...@huawei.com> wrote:
> 
> > There is a deadlock scenario in kprobe_optimizer():
> > 
> > pid A                           pid B                    pid C
> > kprobe_optimizer()              do_exit()                perf_kprobe_init()
> > mutex_lock(&kprobe_mutex)       exit_tasks_rcu_start()   mutex_lock(&kprobe_mutex)
> > synchronize_rcu_tasks()         zap_pid_ns_processes()   // waiting kprobe_mutex
> > // waiting tasks_rcu_exit_srcu  kernel_wait4()
> >                                 // waiting pid C exit
> > 
> > To avoid this deadlock loop, use synchronize_rcu_tasks_rude() in
> > kprobe_optimizer() rather than synchronize_rcu_tasks().
> > synchronize_rcu_tasks_rude() can also guarantee that all preempted tasks
> > have been scheduled, but it does not wait for tasks_rcu_exit_srcu.
> > 

First, thanks for finding this scenario!

> 
> Did lockdep detect this? If not, we should fix that.

Can lockdep detect a dependency like this one, which goes through RCU and wait4()?

> 
> I'm also thinking if we should find another solution, as this seems more of
> a work around than a fix.

Hmm, IIUC, we may need a synchronizer that returns -EBUSY if
someone starts waiting in exit_tasks_rcu_start(). Then the optimizer
can unlock the mutex and retry.

Thank you,

> 
> > Fixes: a30b85df7d59 ("kprobes: Use synchronize_rcu_tasks() for optprobe with CONFIG_PREEMPT=y")
> > Signed-off-by: Chen Zhongjin <chenzhong...@huawei.com>
> > ---
> > v1 -> v2: Add Fixes tag
> > ---
> >  arch/Kconfig     | 2 +-
> >  kernel/kprobes.c | 2 +-
> >  2 files changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/Kconfig b/arch/Kconfig
> > index f4b210ab0612..dc6a18854017 100644
> > --- a/arch/Kconfig
> > +++ b/arch/Kconfig
> > @@ -104,7 +104,7 @@ config STATIC_CALL_SELFTEST
> >  config OPTPROBES
> >     def_bool y
> >     depends on KPROBES && HAVE_OPTPROBES
> > -   select TASKS_RCU if PREEMPTION
> > +   select TASKS_RUDE_RCU
> 
> Is this still a bug if PREEMPTION is not enabled?
> 
> -- Steve
> 
> >  
> >  config KPROBES_ON_FTRACE
> >     def_bool y
> > diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> > index d5a0ee40bf66..09056ae50c58 100644
> > --- a/kernel/kprobes.c
> > +++ b/kernel/kprobes.c
> > @@ -623,7 +623,7 @@ static void kprobe_optimizer(struct work_struct *work)
> >      * Note that on non-preemptive kernel, this is transparently converted
> >      * to synchronoze_sched() to wait for all interrupts to have completed.
> >      */
> > -   synchronize_rcu_tasks();
> > +   synchronize_rcu_tasks_rude();
> >  
> >     /* Step 3: Optimize kprobes after quiesence period */
> >     do_optimize_kprobes();
> 


-- 
Masami Hiramatsu (Google) <mhira...@kernel.org>
