On Tue, May 28, 2019 at 01:07:29PM -0700, Thomas Gleixner wrote:
> On Mon, 27 May 2019, Paul E. McKenney wrote:
> 
> > The TASKS03 and TREE04 rcutorture scenarios produce the following
> > lockdep complaint:
> > 
> > ================================
> > WARNING: inconsistent lock state
> > 5.2.0-rc1+ #513 Not tainted
> > --------------------------------
> > inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
> > migration/1/14 [HC0[0]:SC0[0]:HE1:SE1] takes:
> > (____ptrval____) (tick_broadcast_lock){?...}, at: tick_broadcast_offline+0xf/0x70
> > {IN-HARDIRQ-W} state was registered at:
> >   lock_acquire+0xb0/0x1c0
> >   _raw_spin_lock_irqsave+0x3c/0x50
> >   tick_broadcast_switch_to_oneshot+0xd/0x40
> >   tick_switch_to_oneshot+0x4f/0xd0
> >   hrtimer_run_queues+0xf3/0x130
> >   run_local_timers+0x1c/0x50
> >   update_process_times+0x1c/0x50
> >   tick_periodic+0x26/0xc0
> >   tick_handle_periodic+0x1a/0x60
> >   smp_apic_timer_interrupt+0x80/0x2a0
> >   apic_timer_interrupt+0xf/0x20
> >   _raw_spin_unlock_irqrestore+0x4e/0x60
> >   rcu_nocb_gp_kthread+0x15d/0x590
> >   kthread+0xf3/0x130
> >   ret_from_fork+0x3a/0x50
> > irq event stamp: 171
> > hardirqs last  enabled at (171): [<ffffffff8a201a37>] trace_hardirqs_on_thunk+0x1a/0x1c
> > hardirqs last disabled at (170): [<ffffffff8a201a53>] trace_hardirqs_off_thunk+0x1a/0x1c
> > softirqs last  enabled at (0): [<ffffffff8a264ee0>] copy_process.part.56+0x650/0x1cb0
> > softirqs last disabled at (0): [<0000000000000000>] 0x0
> > 
> > other info that might help us debug this:
> >  Possible unsafe locking scenario:
> > 
> >        CPU0
> >        ----
> >   lock(tick_broadcast_lock);
> >   <Interrupt>
> >     lock(tick_broadcast_lock);
> > 
> >  *** DEADLOCK ***
> > 
> > 1 lock held by migration/1/14:
> >  #0: (____ptrval____) (clockevents_lock){+.+.}, at: tick_offline_cpu+0xf/0x30
> > 
> > stack backtrace:
> > CPU: 1 PID: 14 Comm: migration/1 Not tainted 5.2.0-rc1+ #513
> > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Bochs 01/01/2011
> > Call Trace:
> >  dump_stack+0x5e/0x8b
> >  print_usage_bug+0x1fc/0x216
> >  ? print_shortest_lock_dependencies+0x1b0/0x1b0
> >  mark_lock+0x1f2/0x280
> >  __lock_acquire+0x1e0/0x18f0
> >  ? __lock_acquire+0x21b/0x18f0
> >  ? _raw_spin_unlock_irqrestore+0x4e/0x60
> >  lock_acquire+0xb0/0x1c0
> >  ? tick_broadcast_offline+0xf/0x70
> >  _raw_spin_lock+0x33/0x40
> >  ? tick_broadcast_offline+0xf/0x70
> >  tick_broadcast_offline+0xf/0x70
> >  tick_offline_cpu+0x16/0x30
> >  take_cpu_down+0x7d/0xa0
> >  multi_cpu_stop+0xa2/0xe0
> >  ? cpu_stop_queue_work+0xc0/0xc0
> >  cpu_stopper_thread+0x6d/0x100
> >  smpboot_thread_fn+0x169/0x240
> >  kthread+0xf3/0x130
> >  ? sort_range+0x20/0x20
> >  ? kthread_cancel_delayed_work_sync+0x10/0x10
> >  ret_from_fork+0x3a/0x50
> > 
> > It turns out that tick_broadcast_offline() can be invoked with interrupts
> > enabled, so this commit fixes this issue by replacing the raw_spin_lock()
> > with raw_spin_lock_irqsave().
> 
> What?
> 
> take_cpu_down() is called from multi_cpu_stop() with interrupts disabled.
> 
> So this is just papering over the fact that something called from
> take_cpu_down() enabled interrupts. That needs to be found and fixed.

Just posting the information covered via IRC for posterity.

A bisection located commit a0e928ed7c60 ("Merge branch
'timers-core-for-linus' of
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip").  Yes, this
is a merge commit, but both parents feeding into it test fine; it is
the merged result that fails, despite there being no merge conflicts.

It turns out that tick_broadcast_offline() was an innocent bystander.
After all, interrupts are supposed to be disabled throughout
take_cpu_down(), and therefore should have been disabled upon entry to
tick_offline_cpu() and thus to tick_broadcast_offline().
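
One way to confirm that invariant is to assert the irq state around
each teardown step.  A minimal debugging sketch (the
invoke_teardown_step() wrapper is hypothetical, purely for
illustration; lockdep_assert_irqs_disabled() is the real lockdep
annotation):

#include <linux/irqflags.h>

/*
 * Hypothetical wrapper around each CPU-hotplug teardown callback run
 * in the take_cpu_down() path.  The second assertion fires right
 * after the first callback that returns with interrupts enabled.
 */
static int invoke_teardown_step(int (*cb)(unsigned int), unsigned int cpu)
{
        int ret;

        lockdep_assert_irqs_disabled(); /* stop_machine() disabled irqs */
        ret = cb(cpu);
        lockdep_assert_irqs_disabled(); /* fires if cb() re-enabled irqs */

        return ret;
}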

The function returning with irqs enabled was sched_cpu_dying().  It
had irqs enabled after the return from sched_tick_stop(), which in
turn had irqs enabled after the return from
cancel_delayed_work_sync().  That function is a wrapper around
__cancel_work_timer(), which can sleep when something else is
concurrently trying to cancel the same delayed work, and sleeping is
a decidedly bad idea when you are invoked from take_cpu_down().
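
In outline, the path looks like this (a much-simplified sketch of the
5.2-era code, declarations elided; the real __cancel_work_timer()
sleeps on a waitqueue rather than via a bare schedule(), but the
effect is the same):

int sched_cpu_dying(unsigned int cpu)
{
        /* ... */
        sched_tick_stop(cpu);   /* irqs must stay disabled across this */
        /* ... */
        return 0;
}

static void sched_tick_stop(int cpu)
{
        struct tick_work *twork = per_cpu_ptr(tick_work_cpu, cpu);

        /* Synchronously cancel this CPU's remote-tick worker. */
        cancel_delayed_work_sync(&twork->work);
}

static bool __cancel_work_timer(struct work_struct *work, bool is_dwork)
{
        unsigned long flags;
        int ret;

        do {
                ret = try_to_grab_pending(work, is_dwork, &flags);
                if (unlikely(ret == -ENOENT)) {
                        /*
                         * Someone else is concurrently canceling the
                         * same work item, so sleep until they finish.
                         * Sleeping here while the caller is inside
                         * take_cpu_down() is the bug.
                         */
                        schedule();
                }
        } while (unlikely(ret < 0));

        /* ... flush the work item, clear CANCELING ... */
        return ret;
}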

None of these functions have been changed (at all!) in the past year,
so my guess is that some other code was introduced that can race with
__cancel_work_timer().  Except that I am not seeing any other call
to cancel_delayed_work_sync().
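
For illustration, a hypothetical test-module sketch (entirely my
construction, not code from any tree) of the sort of concurrent
cancellation that would hit that sleeping path in
__cancel_work_timer():

#include <linux/module.h>
#include <linux/workqueue.h>
#include <linux/kthread.h>
#include <linux/delay.h>

static struct delayed_work test_dwork;

static void test_dwork_fn(struct work_struct *work)
{
        msleep(10);     /* keep the work busy to widen the race window */
}

static int test_cancel_thread(void *unused)
{
        /*
         * Two of these threads race to cancel the same delayed work.
         * The loser sees -ENOENT from try_to_grab_pending() and sleeps
         * in __cancel_work_timer() until the winner finishes.
         */
        cancel_delayed_work_sync(&test_dwork);
        return 0;
}

static int __init cancel_race_init(void)
{
        INIT_DELAYED_WORK(&test_dwork, test_dwork_fn);
        schedule_delayed_work(&test_dwork, 1);
        kthread_run(test_cancel_thread, NULL, "cancel-race-a");
        kthread_run(test_cancel_thread, NULL, "cancel-race-b");
        return 0;
}
module_init(cancel_race_init);
MODULE_LICENSE("GPL");

Nothing like this is a problem in itself (both cancels run in
sleepable context here); the trouble comes when one side of the race
is take_cpu_down(), which cannot sleep.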

Thoughts?

                                                        Thanx, Paul
