On Thu, Mar 12, 2015 at 08:32:05AM -0400, Sasha Levin wrote: > On 03/12/2015 08:28 AM, Sasha Levin wrote: > > On 03/11/2015 07:16 PM, Paul E. McKenney wrote: > >> > On Wed, Mar 11, 2015 at 07:06:40PM -0400, Sasha Levin wrote: > >>>> >> > On 03/11/2015 07:01 PM, Paul E. McKenney wrote: > >>>>>>>> >>>> > >> With the commit I didn't hit it yet, but I do see 4 > >>>>>>>> >>>> > >> different WARNings: > >>>>>> >>> > > I wish that I could say that I am surprised, but the sad fact > >>>>>> >>> > > is that > >>>>>> >>> > > I am still shaking the bugs out. > >>>> >> > > >>>> >> > I have one more to add: > >>>> >> > > >>>> >> > [ 93.330539] WARNING: CPU: 1 PID: 8 at > >>>> >> > kernel/rcu/tree_plugin.h:476 rcu_gp_kthread+0x1eaa/0x4dd0() > >> > A bit different, but still in the class of a combining-tree bitmask > >> > handling bug. > > I left it overnight, and am still seeing hangs. Although (and don't catch me > > by that) it seems to be significantly less of them. > > In one of the cases, KASan ended up complaining about something odd going on > in > rcu_check_callbacks():
Hmmm... Looks like I have a synchronization blow between RCU CPU stall warnings and task exit or some such. Promising clue, thank you! Thanx, Paul > [ 7170.779042] BUG: KASan: out of bounds on stack in > sched_show_task+0x324/0x3d0 at addr ffff88075315fdd8 > [ 7170.779042] Read of size 8 by task trinity-c0/15862 > [ 7170.779042] page:ffffea001d4c57c0 count:0 mapcount:0 mapping: > (null) index:0x0 > [ 7170.779042] flags: 0x5afffff80000000() > [ 7170.779042] page dumped because: kasan: bad access detected > [ 7170.779042] CPU: 0 PID: 15862 Comm: trinity-c0 Tainted: G W > 4.0.0-rc3-next-20150311-sasha-00035-g0a45b48-dirty #2052 > [ 7170.779042] ffff88075315fdd8 00000000ac401e1d ffff880050607cd8 > ffffffff9da97e89 > [ 7170.779042] 1ffffd4003a98aff ffff880050607d68 ffff880050607d58 > ffffffff9360eda4 > [ 7170.779042] ffff880050607d98 ffffffff9da8c9df 0000000000000082 > ffffffffa0260429 > [ 7170.779042] Call Trace: > [ 7170.779042] <IRQ> dump_stack (lib/dump_stack.c:52) > [ 7170.779042] kasan_report_error (mm/kasan/report.c:132 > mm/kasan/report.c:193) > [ 7170.779042] ? printk (kernel/printk/printk.c:1854) > [ 7170.779042] ? bitmap_weight (include/linux/bitmap.h:303) > [ 7170.779042] __asan_report_load8_noabort (mm/kasan/report.c:230 > mm/kasan/report.c:251) > [ 7170.779042] ? sched_show_task (kernel/sched/core.c:4526) > [ 7170.779042] sched_show_task (kernel/sched/core.c:4526) > [ 7170.779042] rcu_check_callbacks (kernel/rcu/tree.c:1225 > kernel/rcu/tree.c:1331 kernel/rcu/tree.c:3400 kernel/rcu/tree.c:3464 > kernel/rcu/tree.c:2682) > [ 7170.779042] update_process_times (./arch/x86/include/asm/preempt.h:22 > kernel/time/timer.c:1386) > [ 7170.779042] tick_periodic (kernel/time/tick-common.c:92) > [ 7170.779042] ? tick_handle_periodic (kernel/time/tick-common.c:105) > [ 7170.779042] tick_handle_periodic (kernel/time/tick-common.c:105) > [ 7170.779042] local_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:891) > [ 7170.779042] ? irq_enter (kernel/softirq.c:338) > [ 7170.779042] smp_apic_timer_interrupt (./arch/x86/include/asm/apic.h:650 > arch/x86/kernel/apic/apic.c:915) > [ 7170.779042] apic_timer_interrupt (arch/x86/kernel/entry_64.S:920) > [ 7170.779042] <EOI> ? do_raw_read_lock (include/asm-generic/qrwlock.h:104 > kernel/locking/spinlock_debug.c:204) > [ 7170.779042] _raw_read_lock (kernel/locking/spinlock.c:224) > [ 7170.779042] ? do_wait (kernel/exit.c:1498 (discriminator 1)) > [ 7170.779042] do_wait (kernel/exit.c:1498 (discriminator 1)) > [ 7170.779042] ? wait_consider_task (kernel/exit.c:1465) > [ 7170.779042] ? find_get_pid (kernel/pid.c:490) > [ 7170.779042] SyS_wait4 (kernel/exit.c:1618 kernel/exit.c:1586) > [ 7170.779042] ? SyS_waitid (kernel/exit.c:1586) > [ 7170.779042] ? kill_orphaned_pgrp (kernel/exit.c:1444) > [ 7170.779042] ? syscall_trace_enter_phase2 (arch/x86/kernel/ptrace.c:1592) > [ 7170.779042] ? trace_hardirqs_on_thunk (arch/x86/lib/thunk_64.S:42) > [ 7170.779042] tracesys_phase2 (arch/x86/kernel/entry_64.S:347) > [ 7170.779042] Memory state around the buggy address: > [ 7170.779042] ffff88075315fc80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 00 00 > [ 7170.779042] ffff88075315fd00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 00 00 > [ 7170.779042] >ffff88075315fd80: 00 f1 f1 f1 f1 00 f4 f4 f4 f2 f2 f2 f2 00 > f4 f4 > [ 7170.779042] ^ > [ 7170.779042] ffff88075315fe00: f4 00 00 00 00 00 00 00 00 00 00 00 00 00 > 00 f1 > [ 7170.779042] ffff88075315fe80: f1 f1 f1 00 00 00 00 00 00 00 00 00 00 00 > f4 00 > > > Thanks, > Sasha > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/