On Thu, Oct 05, 2017 at 08:01:46AM -0500, Josh Poimboeuf wrote:
> On Tue, Oct 03, 2017 at 09:54:31AM -0700, Linus Torvalds wrote:
> > On Tue, Oct 3, 2017 at 7:06 AM, Fengguang Wu <fengguang...@intel.com> wrote:
> > >
> > > This patch triggers a NULL-dereference bug at update_stack_state().
> > > Although its parent commit also has a NULL-dereference bug, however
> > > the call stack looks rather different. Both dmesg files are attached.
> > >
> > > It also triggers this warning, which is being discussed in another
> > > thread, so CC Josh. The full dmesg attached, too.
> > >
> > > Please press Enter to activate this console.
> > > [ 138.605622] WARNING: kernel stack regs at be299c9a in
> > > procd:340 has bad 'bp' value 000001be
> > > [ 138.605627] unwind stack type:0 next_sp: (null) mask:0x2
> > > graph_idx:0
> > > [ 138.605631] be299c9a: 299ceb00 (0x299ceb00)
> > > [ 138.605633] be299c9e: 2281f1be (0x2281f1be)
> > > [ 138.605634] be299ca2: 299cebb6 (0x299cebb6)
> > >
> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
> > >
> > > commit b09be676e0ff25bd6d2e7637e26d349f9109ad75
> > > locking/lockdep: Implement the 'crossrelease' feature
> >
> > Can we consider just reverting the crossrelease thing?
> >
> > The apparent stack corruption really worries me, and what worries me
> > most is that commit wasn't even supposed to change anything as far as
> > I can tell - it only adds infrastructure, no actual users that *set*
> > the cross-lock thing.
> >
> > So the fact that it actually seems to cause behavioural changes seems
> > to be _really_ scary, and indicates that the code is completely
> > broken.
> >
> > Or am I missing something?
>
> So I gave crossrelease a bad rap here. Going back and looking at the
> panics and stack dumps, what I thought was "stack corruption" was
> actually the GCC unaligned stack pointer thing.
>
> I suspect those commits were implicated in the bisections because they
> started doing more stack traces in general, revealing some existing
> 32-bit unwinder/GCC/frame pointer bugs in the process.
>
> So I just wanted to clarify that crossrelease seems to be innocent in
> all this. Sorry for the confusion!
Ok, I may have spoken too soon :-) There were so many issues here that
it's been hard for me to untangle them all.

There's one panic which seems different from the others:

  BUG: unable to handle kernel NULL pointer dereference at 00000020
  IP: iput+0x544/0x650
  *pde = 00000000
  Oops: 0000 [#1] PREEMPT SMP
  CPU: 0 PID: 29697 Comm: umount Not tainted 4.13.0-rc4-00169-gce07a941 #627
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.3-20161025_171302-gandalf 04/01/2014
  task: c0a0ba00 task.stack: c0a1e000
  EIP: iput+0x544/0x650
  EFLAGS: 00010246 CPU: 0
  EAX: 00000001 EBX: c0100218 ECX: 00000000 EDX: 00000000
  ESI: 00000000 EDI: 00000000 EBP: c0a1fdd8 ESP: c0a1fdc0
   DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
  CR0: 80050033 CR2: 00000020 CR3: 10a03000 CR4: 00000690
  Call Trace:
   dentry_unlink_inode+0x176/0x180
   ? preempt_count_sub+0x1c5/0x2e0
   __dentry_kill+0x207/0x330
   shrink_dentry_list+0x5df/0x610
   shrink_dcache_parent+0x65/0x80
   do_one_tree+0x13/0x40
   shrink_dcache_for_umount+0x84/0xe0
   generic_shutdown_super+0x3e/0x1b0
   kill_anon_super+0x11/0x20
   kernfs_kill_sb+0x6c/0x80
   sysfs_kill_sb+0x1a/0x30
   deactivate_locked_super+0x4c/0x80
   deactivate_super+0x100/0x110
   cleanup_mnt+0xc0/0xe0
   __cleanup_mnt+0x10/0x20
   task_work_run+0x7f/0xa0
   exit_to_usermode_loop+0x100/0x16f
   do_int80_syscall_32+0x27f/0x2e0
   entry_INT80_32+0x2f/0x2f
  EIP: 0xa7f34a69
  EFLAGS: 00000292 CPU: 0
  EAX: 00000000 EBX: 080960f0 ECX: a7f76ff4 EDX: 080960d0
  ESI: 080960d0 EDI: 080960f0 EBP: afe22258 ESP: afe22208
   DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b
  Code: b5 20 56 3c b7 0f 84 64 fe ff ff e9 6b fe ff ff 8d b4 26 00 00 00 00 8b 7b 1c c1 ee 03 31 c9 83 05 ac 56 3c b7 01 83 e6 01 89 f2 <8b> 47 20 89 45 ec b8 c0 1f 30 b7 c7 04 24 00 00 00 00 e8 15 89
  EIP: iput+0x544/0x650 SS:ESP: 0068:c0a1fdc0
  CR2: 0000000000000020
  ---[ end trace 0bfc95b7cf7c8ea4 ]---
  Kernel panic - not syncing: Fatal exception

And it was bisected to:

  ce07a9415f26 ("locking/lockdep: Make check_prev_add() able to handle external stack_trace")

That commit hadn't added the crossrelease feature yet, so it presumably
didn't trigger the extra unwinder issues.

Peter and I found some issues with that patch, and Peter came up with a
fix. It would be good to know if Peter's patch makes that panic go away.

I've rebased the fixes on top of the ce07a9415f26 commit and attached
them to this email.

Fengguang, if you're still listening, could you please rerun the tests
on top of ce07a9415f26, with the attached patches also applied?

-- 
Josh
From e7840ad76515f0b5061fcdd098b57b7c01b61482 Mon Sep 17 00:00:00 2001
Message-Id: <e7840ad76515f0b5061fcdd098b57b7c01b61482.1507215196.git.jpoim...@redhat.com>
From: Josh Poimboeuf <jpoim...@redhat.com>
Date: Thu, 5 Oct 2017 09:43:59 -0500
Subject: [PATCH 1/2] unwinder fixes

---
 arch/x86/kernel/unwind_frame.c | 33 ++++++++++++++++++++++++++++++---
 1 file changed, 30 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/unwind_frame.c b/arch/x86/kernel/unwind_frame.c
index b9389d72b2f7..0ecc42e34cc4 100644
--- a/arch/x86/kernel/unwind_frame.c
+++ b/arch/x86/kernel/unwind_frame.c
@@ -33,7 +33,7 @@ static void unwind_dump(struct unwind_state *state)
 	struct stack_info stack_info = {0};
 	unsigned long visit_mask = 0;
 
-	if (dumped_before)
+	if (IS_ENABLED(CONFIG_X86_32) || dumped_before)
 		return;
 
 	dumped_before = true;
@@ -42,7 +42,8 @@ static void unwind_dump(struct unwind_state *state)
 			state->stack_info.type, state->stack_info.next_sp,
 			state->stack_mask, state->graph_idx);
 
-	for (sp = state->orig_sp; sp; sp = PTR_ALIGN(stack_info.next_sp, sizeof(long))) {
+	for (sp = PTR_ALIGN(state->orig_sp, sizeof(long)); sp;
+	     sp = PTR_ALIGN(stack_info.next_sp, sizeof(long))) {
 		if (get_stack_info(sp, state->task, &stack_info, &visit_mask))
 			break;
 
@@ -84,6 +85,12 @@ static size_t regs_size(struct pt_regs *regs)
 	return sizeof(*regs);
 }
 
+#ifdef CONFIG_X86_32
+#define KERNEL_REGS_SIZE (sizeof(struct pt_regs) - 2*sizeof(long))
+#else
+#define KERNEL_REGS_SIZE (sizeof(struct pt_regs))
+#endif
+
 static bool in_entry_code(unsigned long ip)
 {
 	char *addr = (char *)ip;
@@ -183,6 +190,7 @@ static bool is_last_task_frame(struct unwind_state *state)
  * This determines if the frame pointer actually contains an encoded pointer to
  * pt_regs on the stack.  See ENCODE_FRAME_POINTER.
  */
+#ifdef CONFIG_X86_64
 static struct pt_regs *decode_frame_pointer(unsigned long *bp)
 {
 	unsigned long regs = (unsigned long)bp;
@@ -192,6 +200,17 @@ static struct pt_regs *decode_frame_pointer(unsigned long *bp)
 
 	return (struct pt_regs *)(regs & ~0x1);
 }
+#else
+static struct pt_regs *decode_frame_pointer(unsigned long *bp)
+{
+	unsigned long regs = (unsigned long)bp;
+
+	if (regs & 0x80000000)
+		return NULL;
+
+	return (struct pt_regs *)(regs | 0x80000000);
+}
+#endif
 
 static bool update_stack_state(struct unwind_state *state,
 			       unsigned long *next_bp)
@@ -211,7 +230,7 @@ static bool update_stack_state(struct unwind_state *state,
 	regs = decode_frame_pointer(next_bp);
 	if (regs) {
 		frame = (unsigned long *)regs;
-		len = regs_size(regs);
+		len = KERNEL_REGS_SIZE;
 		state->got_irq = true;
 	} else {
 		frame = next_bp;
@@ -235,6 +254,14 @@ static bool update_stack_state(struct unwind_state *state,
 	    frame < prev_frame_end)
 		return false;
 
+	/*
+	 * On 32-bit with user mode regs, make sure the last two regs are safe
+	 * to access:
+	 */
+	if (IS_ENABLED(CONFIG_X86_32) && regs && user_mode(regs) &&
+	    !on_stack(info, frame, len + 2*sizeof(long)))
+		return false;
+
 	/* Move state to the next frame: */
 	if (regs) {
 		state->regs = regs;
-- 
2.13.6
From 62105550632bfbd2e5e2f3768a37958a6872ec1e Mon Sep 17 00:00:00 2001
Message-Id: <62105550632bfbd2e5e2f3768a37958a6872ec1e.1507215196.git.jpoim...@redhat.com>
In-Reply-To: <e7840ad76515f0b5061fcdd098b57b7c01b61482.1507215196.git.jpoim...@redhat.com>
References: <e7840ad76515f0b5061fcdd098b57b7c01b61482.1507215196.git.jpoim...@redhat.com>
From: Peter Zijlstra <pet...@infradead.org>
Date: Thu, 5 Oct 2017 09:44:33 -0500
Subject: [PATCH 2/2] lockdep fixes

---
 kernel/locking/lockdep.c | 48 ++++++++++++++++++++----------------------------
 1 file changed, 20 insertions(+), 28 deletions(-)

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 841828ba35b9..6d540bdb24b3 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -1827,10 +1827,10 @@ check_prev_add(struct task_struct *curr, struct held_lock *prev,
 	       struct held_lock *next, int distance, struct stack_trace *trace,
 	       int (*save)(struct stack_trace *trace))
 {
+	struct lock_list *uninitialized_var(target_entry);
 	struct lock_list *entry;
-	int ret;
 	struct lock_list this;
-	struct lock_list *uninitialized_var(target_entry);
+	int ret;
 
 	/*
 	 * Prove that the new <prev> -> <next> dependency would not
@@ -1844,8 +1844,17 @@ check_prev_add(struct task_struct *curr, struct held_lock *prev,
 	this.class = hlock_class(next);
 	this.parent = NULL;
 	ret = check_noncircular(&this, hlock_class(prev), &target_entry);
-	if (unlikely(!ret))
+	if (unlikely(!ret)) {
+		if (!trace->entries) {
+			/*
+			 * If @save fails here, the printing might trigger
+			 * a WARN but because of the !nr_entries it should
+			 * not do bad things.
+			 */
+			save(trace);
+		}
 		return print_circular_bug(&this, target_entry, next, prev);
+	}
 	else if (unlikely(ret < 0))
 		return print_bfs_bug(ret);
 
@@ -1892,7 +1901,7 @@ check_prev_add(struct task_struct *curr, struct held_lock *prev,
 		return print_bfs_bug(ret);
 
 
-	if (save && !save(trace))
+	if (!trace->entries && !save(trace))
 		return 0;
 
 	/*
@@ -1912,20 +1921,6 @@ check_prev_add(struct task_struct *curr, struct held_lock *prev,
 	if (!ret)
 		return 0;
 
-	/*
-	 * Debugging printouts:
-	 */
-	if (verbose(hlock_class(prev)) || verbose(hlock_class(next))) {
-		graph_unlock();
-		printk("\n new dependency: ");
-		print_lock_name(hlock_class(prev));
-		printk(KERN_CONT " => ");
-		print_lock_name(hlock_class(next));
-		printk(KERN_CONT "\n");
-		dump_stack();
-		if (!graph_lock())
-			return 0;
-	}
 	return 2;
 }
 
@@ -1940,8 +1935,12 @@ check_prevs_add(struct task_struct *curr, struct held_lock *next)
 {
 	int depth = curr->lockdep_depth;
 	struct held_lock *hlock;
-	struct stack_trace trace;
-	int (*save)(struct stack_trace *trace) = save_trace;
+	struct stack_trace trace = {
+		.nr_entries = 0,
+		.max_entries = 0,
+		.entries = NULL,
+		.skip = 0,
+	};
 
 	/*
 	 * Debugging checks.
@@ -1967,18 +1966,11 @@ check_prevs_add(struct task_struct *curr, struct held_lock *next)
 		 */
 		if (hlock->read != 2 && hlock->check) {
 			int ret = check_prev_add(curr, hlock, next,
-						distance, &trace, save);
+						distance, &trace, save_trace);
 			if (!ret)
 				return 0;
 
 			/*
-			 * Stop saving stack_trace if save_trace() was
-			 * called at least once:
-			 */
-			if (save && ret == 2)
-				save = NULL;
-
-			/*
 			 * Stop after the first non-trylock entry,
 			 * as non-trylock entries have added their
 			 * own direct dependencies already, so this
-- 
2.13.6