16.02.2015, 14:52, "Kirill Tkhai" <tk...@yandex.ru>: > 16.02.2015, 10:21, "Fengguang Wu" <fengguang...@intel.com>: >> Hi Peter, >> >> We got the below dmesg and the first bad commit is >> >> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git >> revert-d347efeb16d3d5150cb7f8d50b05f388b572840e-1019a359d3dc4b64d0e1e5a5efcb725d5e83994d >> >> commit 1019a359d3dc4b64d0e1e5a5efcb725d5e83994d >> Author: Peter Zijlstra <pet...@infradead.org> >> AuthorDate: Wed Nov 26 08:44:03 2014 +0800 >> Commit: Ingo Molnar <mi...@kernel.org> >> CommitDate: Wed Feb 4 07:52:26 2015 +0100 >> >> sched/deadline: Fix stale yield state >> >> When we fail to start the deadline timer in update_curr_dl(), we >> forget to clear ->dl_yielded, resulting in wrecked time keeping. >> >> Since the natural place to clear both ->dl_yielded and ->dl_throttled >> is in replenish_dl_entity(); both are after all waiting for that event; >> make it so. >> >> Luckily since 67dfa1b756f2 ("sched/deadline: Implement >> cancel_dl_timer() to use in switched_from_dl()") the >> task_on_rq_queued() condition in dl_task_timer() must be true, and can >> therefore call enqueue_task_dl() unconditionally. >> >> Reported-by: Wanpeng Li <wanpeng...@linux.intel.com> >> Signed-off-by: Peter Zijlstra (Intel) <pet...@infradead.org> >> Cc: Kirill Tkhai <ktk...@parallels.com> >> Cc: Juri Lelli <juri.le...@arm.com> >> Cc: Linus Torvalds <torva...@linux-foundation.org> >> Link: >> http://lkml.kernel.org/r/1416962647-76792-4-git-send-email-wanpeng...@linux.intel.com >> Signed-off-by: Ingo Molnar <mi...@kernel.org> >> >> =================================================== >> PARENT COMMIT NOT CLEAN. LOOK OUT FOR WRONG BISECT! >> =================================================== >> >> >> +------------------------------------------------+------------+------------+------------+ >> | | a7bebf4887 | 1019a359d3 >> | 58e0e18cb8 | >> >> +------------------------------------------------+------------+------------+------------+ >> | boot_successes | 886 | 54 >> | 10 | >> | boot_failures | 314 | 246 >> | 4 | >> | BUG:kernel_boot_crashed | 48 | 20 >> | 1 | >> | BUG:kernel_test_crashed | 16 | 2 >> | | >> | BUG:kernel_boot_hang | 213 | 43 >> | | >> | Kernel_panic-not_syncing:softlockup:hung_tasks | 37 | 3 >> | | >> | backtrace:ring_buffer_consumer_thread | 17 | 3 >> | | >> | EIP_is_at_get_node_by_name | 4 | >> | | >> | backtrace:gcov_enable_events | 17 | >> | | >> | backtrace:gcov_fs_init | 17 | >> | | >> | backtrace:kernel_init_freeable | 20 | 178 >> | 3 | >> | EIP_is_at_strcmp | 7 | >> | | >> | EIP_is_at__slab_alloc | 1 | >> | | >> | backtrace:__ring_buffer_alloc | 1 | >> | | >> | backtrace:ring_buffer_benchmark_init | 1 | >> | | >> | EIP_is_at_lock_release | 1 | >> | | >> | backtrace:debug_mutex_init | 1 | >> | | >> | EIP_is_at_strncmp | 1 | >> | | >> | EIP_is_at_do_raw_spin_unlock | 1 | >> | | >> | EIP_is_at__do_softirq | 6 | 2 >> | | >> | EIP_is_at_gcov_info_filename | 1 | >> | | >> | EIP_is_at_ring_buffer_consume | 8 | 1 >> | | >> | EIP_is_at_raw_spin_unlock_irqrestore | 1 | >> | | >> | EIP_is_at_ftrace_ops_list_func | 1 | >> | | >> | backtrace:event_trace_self_tests | 2 | >> | | >> | backtrace:event_trace_self_tests_init | 2 | >> | | >> | EIP_is_at_ftrace_likely_update | 1 | >> | | >> | EIP_is_at_rb_reserve_next_event | 1 | >> | | >> | EIP_is_at_raw_spin_unlock_irq | 1 | >> | | >> | kernel_BUG_at_kernel/sched/deadline.c | 0 | 178 >> | 3 | >> | invalid_opcode | 0 | 178 >> | 3 | >> | EIP_is_at_enqueue_task_dl | 0 | 178 >> | 3 | >> | Kernel_panic-not_syncing:Fatal_exception | 0 | 178 >> | 3 | >> | backtrace:register_tracer | 0 | 178 >> | 3 | >> | backtrace:init_wakeup_tracer | 0 | 178 >> | 3 | >> >> +------------------------------------------------+------------+------------+------------+ >> >> [ 6.432010] Testing ftrace regs: PASSED >> [ 6.530393] Testing tracer wakeup: >> [ 7.648615] ------------[ cut here ]------------ >> [ 7.649000] kernel BUG at kernel/sched/deadline.c:805! >> [ 7.649000] invalid opcode: 0000 [#1] SMP >> [ 7.649000] CPU: 0 PID: 1 Comm: swapper/0 Not tainted >> 3.19.0-rc6-00041-g1019a35 #17 >> [ 7.649000] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS >> 1.7.5-20140531_083030-gandalf 04/01/2014 >> [ 7.649000] task: d3348000 ti: d3344000 task.ti: d3344000 >> [ 7.649000] EIP: 0060:[<c11969a1>] EFLAGS: 00010046 CPU: 0 >> [ 7.649000] EIP is at enqueue_task_dl+0x341/0x1860 >> [ 7.649000] EAX: d2940101 EBX: 00000001 ECX: 00000000 EDX: 0085a491 >> [ 7.649000] ESI: d2940000 EDI: d2940101 EBP: d3345e08 ESP: d3345dc0 >> [ 7.649000] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 >> [ 7.649000] CR0: 8005003b CR2: ffffffff CR3: 04e8b000 CR4: 000006d0 >> [ 7.649000] Stack: >> [ 7.649000] c1170353 00000002 d3345de0 c11ab53c d391ec10 00000046 >> d33485c0 d3348000 >> [ 7.649000] 00000002 d29401b8 d391ec00 d29401b8 00000046 00000005 >> 00000000 d2940000 >> [ 7.649000] d391ec00 c36efc80 d3345e20 c115991a 00000005 00000000 >> d391ec00 00000000 >> [ 7.649000] Call Trace: >> [ 7.649000] [<c1170353>] ? sched_clock_cpu+0x293/0x2e0 >> [ 7.649000] [<c11ab53c>] ? __lock_is_held+0x6c/0xa0 >> [ 7.649000] [<c115991a>] enqueue_task+0xba/0x220 >> [ 7.649000] [<c115aa4c>] activate_task+0x4c/0x80 >> [ 7.649000] [<c115aaa1>] ttwu_activate+0x21/0x90 >> [ 7.649000] [<c115bd84>] ttwu_do_activate+0x64/0x90 >> [ 7.649000] [<c11689b6>] try_to_wake_up+0x366/0x8b0 >> [ 7.649000] [<c1169000>] wake_up_process+0xa0/0xe0 >> [ 7.649000] [<c114095b>] kthread_stop+0x21b/0xd90 >> [ 7.649000] [<c12c0b27>] trace_selftest_startup_wakeup+0x3b7/0x4b0 >> [ 7.649000] [<c36b1690>] ? wait_for_completion+0x1f0/0x290 >> [ 7.649000] [<c12c1d04>] register_tracer+0x2d4/0x700 >> [ 7.649000] [<c4bd11a4>] init_wakeup_tracer+0x1d/0xea >> [ 7.649000] [<c4b8f12e>] do_one_initcall+0x278/0x44c >> [ 7.649000] [<c4bd1187>] ? init_function_trace+0x1f8/0x1f8 >> [ 7.649000] [<c4b8f565>] kernel_init_freeable+0x263/0x424 >> [ 7.649000] [<c3663b4c>] kernel_init+0x1c/0x2a0 >> [ 7.649000] [<c36c2181>] ret_from_kernel_thread+0x21/0x30 >> [ 7.649000] [<c3663b30>] ? rest_init+0x330/0x330 >> [ 7.649000] Code: da e8 94 96 13 00 8b 04 9d 84 f6 77 c4 83 c0 01 89 04 >> 9d 84 f6 77 c4 89 f8 84 c0 74 4d 83 05 c8 14 f2 c4 01 83 15 cc 14 f2 c4 00 >> <0f> 0b 83 05 d0 14 f2 c4 01 83 15 d4 14 f2 c4 00 8d b4 26 00 00 >> [ 7.649000] EIP: [<c11969a1>] enqueue_task_dl+0x341/0x1860 SS:ESP >> 0068:d3345dc0 >> [ 7.649000] ---[ end trace cb4bdebe33135894 ]--- >> [ 7.649000] Kernel panic - not syncing: Fatal exception >> >> git bisect start 58e0e18cb8b1c3351cb70b3cf78d4e3785a181fe >> bfa76d49576599a4b9f9b7a71f23d73d6dcff735 -- >> git bisect bad 30496a920e97b97e16d078c520eef323344ef997 # 23:32 0- >> 67 Merge 'amirv/for-upstream' into devel-xian-smoke-201502152116 >> git bisect good d170f4905ee6f93eb12f87d018dd4b9acea18f2c # 00:27 300+ >> 120 0day base guard for 'devel-xian-smoke-201502152116' >> git bisect bad c5ce28df0e7c01a1de23c36ebdefcd803f2b6cbb # 01:11 0- >> 72 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next >> git bisect good 57ee062e4baa9532e5a3835a2d185590a3fc8bd3 # 01:51 300+ >> 94 Merge branch 'r8152' >> git bisect bad c08f8467939e7d2eebcba7cf2330242c4f53f2f7 # 02:10 0- >> 103 Merge tag 'pci-v3.20-changes' of >> git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci >> git bisect bad 5b9b28a63f2e47dac5ff3a2503bfe3ade8796aa0 # 02:24 0- >> 5 Merge branch 'sched-core-for-linus' of >> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip >> git bisect good 30d46827c2744f56bb31460007f2d16455f10720 # 02:41 300+ >> 107 Merge tag 'regulator-v3.20' of >> git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator >> git bisect good 8308756f45a12e2ff4f7749c2694fc83cdef0be9 # 03:11 300+ >> 101 Merge branch 'locking-core-for-linus' of >> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip >> git bisect good 4ac30cf74b308fb01338e660d3471cd490a7958a # 03:36 300+ >> 25 perf tools: Do not use __perf_session__process_events() directly >> git bisect bad 9659e1eeee28f7025b6545934d644d19e9c6e603 # 03:48 0- >> 2 sched/deadline: Remove cpu_active_mask from cpudl_find() >> git bisect good 3847b272248a3a4ed70d20392cc0454917f7713b # 04:23 300+ >> 89 Merge branch 'sched/urgent' into sched/core >> git bisect good 4c195c8a1967ff8bee13a811518a99db04618ab7 # 04:59 300+ >> 118 Merge branch 'sched/urgent' into sched/core, to merge fixes before >> applying new patches >> git bisect bad 1019a359d3dc4b64d0e1e5a5efcb725d5e83994d # 05:10 39- >> 193 sched/deadline: Fix stale yield state >> git bisect good a7bebf488791aa1036f3e6629daf01d01f705dcb # 05:34 300+ >> 103 sched/deadline: Fix hrtick for a non-leftmost task >> # first bad commit: [1019a359d3dc4b64d0e1e5a5efcb725d5e83994d] >> sched/deadline: Fix stale yield state >> git bisect good a7bebf488791aa1036f3e6629daf01d01f705dcb # 05:55 900+ >> 314 sched/deadline: Fix hrtick for a non-leftmost task >> # extra tests with DEBUG_INFO >> git bisect good 1019a359d3dc4b64d0e1e5a5efcb725d5e83994d # 06:11 900+ >> 0 sched/deadline: Fix stale yield state >> # extra tests on HEAD of linux-devel/devel-xian-smoke-201502152116 >> git bisect bad 58e0e18cb8b1c3351cb70b3cf78d4e3785a181fe # 06:11 0- >> 4 0day head guard for 'devel-xian-smoke-201502152116' >> # extra tests on tree/branch linus/master >> git bisect bad a9724125ad014decf008d782e60447c811391326 # 06:11 0- >> 1 Merge tag 'tty-3.20-rc1' of >> git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty >> # extra tests with first bad commit reverted >> # extra tests on tree/branch linus/master >> git bisect bad a9724125ad014decf008d782e60447c811391326 # 06:16 0- >> 1 Merge tag 'tty-3.20-rc1' of >> git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty >> # extra tests on tree/branch next/master >> git bisect bad b8acf73194186a5cba86812eb4ba17b897f0e13e # 06:26 0- >> 35 Add linux-next specific files for 20150213 >> >> This script may reproduce the error. >> >> ---------------------------------------------------------------------------- >> #!/bin/bash >> >> kernel=$1 >> >> kvm=( >> qemu-system-x86_64 >> -cpu kvm64 >> -enable-kvm >> -kernel $kernel >> -m 320 >> -smp 1 >> -net nic,vlan=1,model=e1000 >> -net user,vlan=1 >> -boot order=nc >> -no-reboot >> -watchdog i6300esb >> -rtc base=localtime >> -serial stdio >> -display none >> -monitor null >> ) >> >> append=( >> hung_task_panic=1 >> earlyprintk=ttyS0,115200 >> rd.udev.log-priority=err >> systemd.log_target=journal >> systemd.log_level=warning >> debug >> apic=debug >> sysrq_always_enabled >> rcupdate.rcu_cpu_stall_timeout=100 >> panic=-1 >> softlockup_panic=1 >> nmi_watchdog=panic >> oops=panic >> load_ramdisk=2 >> prompt_ramdisk=0 >> console=ttyS0,115200 >> console=tty0 >> vga=normal >> root=/dev/ram0 >> rw >> drbd.minor_count=8 >> ) >> >> "${kvm[@]}" --append "${append[*]}" >> ---------------------------------------------------------------------------- > > Don't we have the following problem? > > current->state = TASK_INTERRUPTIBLE; > schedule() > deactivate_task() > dequeue_task_dl() > update_curr_dl() > start_dl_timer() > __dequeue_task_dl() > > ...timer fires... > > dl_task_timer() > enqueue_task_dl() /* queues on dl_rq */ > > ...try_to_wake_up() > > enqueue_dl_entity() > BUG_ON(on_dl_rq()) > > Compile-tested fix: > > kernel/sched/deadline.c | 4 ++++ > 1 file changed, 4 insertions(+) > diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c > index a027799..1b78598 100644 > --- a/kernel/sched/deadline.c > +++ b/kernel/sched/deadline.c > @@ -541,6 +541,10 @@ static enum hrtimer_restart dl_task_timer(struct hrtimer > *timer) > > sched_clock_tick(); > update_rq_clock(rq); > + if (!p->on_rq) { > + replenish_dl_entity(dl_se, dl_se); > + goto unlock; > + } > enqueue_task_dl(rq, p, ENQUEUE_REPLENISH); > if (dl_task(rq->curr)) > check_preempt_curr_dl(rq, p, 0);
We shouldn't enqueue migrating tasks. Please, try this one instead ;) [PATCH] sched/dl: Prevent enqueue of a sleeping task in dl_task_timer() A deadline task may be throttled and dequeued at the same time. This happens, when it becomes throttled in schedule(), which is called to go to sleep: current->state = TASK_INTERRUPTIBLE; schedule() deactivate_task() dequeue_task_dl() update_curr_dl() start_dl_timer() __dequeue_task_dl() prev->on_rq = 0; Later the timer fires, but the task is still dequeued: dl_task_timer() enqueue_task_dl() /* queues on dl_rq; on_rq remains 0 */ Someone wakes it up: try_to_wake_up() enqueue_dl_entity() BUG_ON(on_dl_rq()) Patch fixes this problem, it prevents queueing !on_rq tasks on dl_rq. Signed-off-by: Kirill Tkhai <ktk...@parallels.com> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c index a027799..784ba68 100644 --- a/kernel/sched/deadline.c +++ b/kernel/sched/deadline.c @@ -541,6 +541,10 @@ static enum hrtimer_restart dl_task_timer(struct hrtimer *timer) sched_clock_tick(); update_rq_clock(rq); + if (!task_on_rq_queued(p)) { + replenish_dl_entity(dl_se, dl_se); + goto unlock; + } enqueue_task_dl(rq, p, ENQUEUE_REPLENISH); if (dl_task(rq->curr)) check_preempt_curr_dl(rq, p, 0); -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/