On 12/15/2014 06:46 PM, Linus Torvalds wrote:
> I cleaned up the patch a bit, split it up into two to clarify it, and
> have committed it to my tree. I'm not marking the patches for stable,
> because while I'm convinced it's a bug, I'm also not sure why even if
> it triggers it doesn't eventually recover when the IO completes. So
> I'd mark them for stable only if they are actually confirmed to fix
> anything in the wild, and after they've gotten some testing in
> general. The patches *look* straightforward, they remove more lines
> than they add, and I think the code is more understandable too, but
> maybe I just screwed up. Whatever. Some care is warranted, but this is
> the first time I feel like I actually fixed something that matched at
> least one of your lockup symptoms.
>
> Anyway, it's there as
>
> 26178ec11ef3 ("x86: mm: consolidate VM_FAULT_RETRY handling")
> 7fb08eca4527 ("x86: mm: move mmap_sem unlock from mm_fault_error() to caller")

I guess you did "just screwed up"... I've started seeing this:

[ 240.190061] BUG: unable to handle kernel paging request at 00007f341768b000
[ 240.190061] IP: [<00007f341baf61fb>] 0x7f341baf61fb
[ 240.190061] PGD 12b3e4067 PUD 12b3e5067 PMD 29a700067 PTE 0
[ 240.190061] Oops: 0004 [#10] PREEMPT SMP
[ 240.190061] Dumping ftrace buffer:
[ 240.190061] (ftrace buffer empty)
[ 240.190061] Modules linked in:
[ 240.190061] CPU: 6 PID: 9691 Comm: trinity-c619 Tainted: G D 3.18.0-sasha-08443-g2b40f4a #1618
[ 240.190061] task: ffff88012b346000 ti: ffff88012b3d4000 task.ti: ffff88012b3d4000
[ 240.190061] RIP: 0033:[<00007f341baf61fb>] [<00007f341baf61fb>] 0x7f341baf61fb
[ 240.190061] RSP: 002b:00007fff39f045f8 EFLAGS: 00010206
[ 240.190061] RAX: 00007fff39f04600 RBX: 0000000000000363 RCX: 0000000000000200
[ 240.190061] RDX: 0000000000001000 RSI: 00007f341768b000 RDI: 00007fff39f04600
[ 240.190061] RBP: 00007fff39f05640 R08: 00007f341bdf20a8 R09: 00007f341bdf2100
[ 240.190061] R10: 0000000000000000 R11: 0000000000001000 R12: 0000000000001000
[ 240.190061] R13: 0000000000001000 R14: 0000000000362000 R15: 00007fff39f04600
[ 240.190061] FS: 00007f341bffb700(0000) GS:ffff8802da400000(0000) knlGS:0000000000000000
[ 240.190061] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 240.190061] CR2: 00007f341894801c CR3: 000000012b364000 CR4: 00000000000006a0
[ 240.190061] DR0: ffffffff81000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 240.190061] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 00000000000b0602
[ 240.190061]
[ 240.190061] RIP [<00007f341baf61fb>] 0x7f341baf61fb
[ 240.190061] RSP <00007fff39f045f8>
[ 240.190061] CR2: 00007f341768b000

Which was bisected down to:

26178ec11ef3 ("x86: mm: consolidate VM_FAULT_RETRY handling")


Thanks,
Sasha