If panic_on_oops is not set and an oops happens inside a workqueue kthread, the kernel kills this kthread. This patch fixes a recursive GPF which happens when wq_worker_sleeping() unconditionally accesses the NULL kthread->vfork_done pointer through kthread_data() -> to_kthread().
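
For context, kthread_data() resolves the worker struct through task->vfork_done. Roughly (a simplified sketch of kernel/kthread.c around the time of this patch, not part of the change itself):

  static inline struct kthread *to_kthread(struct task_struct *k)
  {
          WARN_ON(!(k->flags & PF_KTHREAD));
          /* for a live kthread, vfork_done points at kthread->exited */
          return container_of(k->vfork_done, struct kthread, exited);
  }

  void *kthread_data(struct task_struct *task)
  {
          return to_kthread(task)->data;
  }

Once vfork_done has been cleared, container_of() yields a bogus pointer and the ->data load faults. Because this happens on the do_exit() -> schedule() path, oops_end() calls do_exit() and schedules again, hitting wq_worker_sleeping() once more, which is the recursion visible in the trace below.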
The stack is the following:

 [<ffffffff81397f75>] dump_stack+0x68/0x93
 [<ffffffff8106954b>] ? do_exit+0x7ab/0xc10
 [<ffffffff8108fd73>] __schedule_bug+0x83/0xe0
 [<ffffffff81716d5a>] __schedule+0x7ea/0xba0
 [<ffffffff810c864f>] ? vprintk_default+0x1f/0x30
 [<ffffffff8116a63c>] ? printk+0x48/0x50
 [<ffffffff81717150>] schedule+0x40/0x90
 [<ffffffff8106976a>] do_exit+0x9ca/0xc10
 [<ffffffff810c8e3d>] ? kmsg_dump+0x11d/0x190
 [<ffffffff810c8d37>] ? kmsg_dump+0x17/0x190
 [<ffffffff81021ee9>] oops_end+0x99/0xd0
 [<ffffffff81052da5>] no_context+0x185/0x3e0
 [<ffffffff81053083>] __bad_area_nosemaphore+0x83/0x1c0
 [<ffffffff810c820e>] ? vprintk_emit+0x25e/0x530
 [<ffffffff810531d4>] bad_area_nosemaphore+0x14/0x20
 [<ffffffff8105355c>] __do_page_fault+0xac/0x570
 [<ffffffff810c66fe>] ? console_trylock+0x1e/0xe0
 [<ffffffff81002036>] ? trace_hardirqs_off_thunk+0x1a/0x1c
 [<ffffffff81053a2c>] do_page_fault+0xc/0x10
 [<ffffffff8171f812>] page_fault+0x22/0x30
 [<ffffffff81089bc3>] ? kthread_data+0x33/0x40
 [<ffffffff8108427e>] ? wq_worker_sleeping+0xe/0x80
 [<ffffffff817169eb>] __schedule+0x47b/0xba0
 [<ffffffff81717150>] schedule+0x40/0x90
 [<ffffffff8106957d>] do_exit+0x7dd/0xc10
 [<ffffffff81021ee9>] oops_end+0x99/0xd0

kthread->vfork_done is zeroed out on the following path:

  do_exit()
    exit_mm()
      mm_release()
        complete_vfork_done()

To fix the bug, dead tasks must be ignored.

Signed-off-by: Roman Pen <roman.peny...@profitbricks.com>
Cc: Andy Lutomirski <l...@kernel.org>
Cc: Oleg Nesterov <o...@redhat.com>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Thomas Gleixner <t...@linutronix.de>
Cc: Ingo Molnar <mi...@redhat.com>
Cc: Tejun Heo <t...@kernel.org>
Cc: linux-kernel@vger.kernel.org
---
v4:
  o Instead of the TASK_DEAD state, use the more generic PF_EXITING flag.
  o Do the same dead-task check in wq_worker_waking_up() as well.  With
    this we try to avoid the case where we are scheduled back to a task
    which is already in do_exit() and has set the PF_EXITING flag.

v3:
  o Minor comment and coding style fixes.

v2:
  o Put the task->state check directly into wq_worker_sleeping()
    instead of changing __schedule().

 kernel/workqueue.c | 24 ++++++++++++++++++++++--
 1 file changed, 22 insertions(+), 2 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 9dc7ac5101e0..23f2d764cebf 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -851,7 +851,17 @@ static void wake_up_worker(struct worker_pool *pool)
  */
 void wq_worker_waking_up(struct task_struct *task, int cpu)
 {
-        struct worker *worker = kthread_data(task);
+        struct worker *worker;
+
+        if (task->flags & PF_EXITING) {
+                /*
+                 * Careful here, t->vfork_done is zeroed out for
+                 * almost dead tasks, do not touch kthread_data().
+                 */
+                return;
+        }
+
+        worker = kthread_data(task);
 
         if (!(worker->flags & WORKER_NOT_RUNNING)) {
                 WARN_ON_ONCE(worker->pool->cpu != cpu);
@@ -875,9 +885,19 @@ void wq_worker_waking_up(struct task_struct *task, int cpu)
  */
 struct task_struct *wq_worker_sleeping(struct task_struct *task)
 {
-        struct worker *worker = kthread_data(task), *to_wakeup = NULL;
+        struct worker *worker, *to_wakeup = NULL;
         struct worker_pool *pool;
 
+        if (task->flags & PF_EXITING) {
+                /*
+                 * Careful here, t->vfork_done is zeroed out for
+                 * almost dead tasks, do not touch kthread_data().
+                 */
+                return NULL;
+        }
+
+        worker = kthread_data(task);
+
         /*
          * Rescuers, which may not have all the fields set up like normal
          * workers, also reach here, let's not access anything before
--
2.9.3