On Fri, Oct 7, 2016 at 1:40 PM, Nikolay Borisov <ker...@kyup.com> wrote:
> Hello,
>
> I've encountered yet another cephfs crash:
>
> [990188.822271] BUG: unable to handle kernel NULL pointer dereference at 
> 000000000000001c
> [990188.822790] IP: [<ffffffff81130515>] __free_pages+0x5/0x30
> [990188.823090] PGD 180dd8f067 PUD 1bf2722067 PMD 0
> [990188.823506] Oops: 0002 [#1] SMP
> [990188.831274] CPU: 25 PID: 18418 Comm: php-fpm Tainted: G           O    
> 4.4.20-clouder2 #6
> [990188.831650] Hardware name: Supermicro X10DRi/X10DRi, BIOS 2.0 12/28/2015
> [990188.831876] task: ffff8822a3b7b700 ti: ffff88022427c000 task.ti: 
> ffff88022427c000
> [990188.832249] RIP: 0010:[<ffffffff81130515>]  [<ffffffff81130515>] 
> __free_pages+0x5/0x30
> [990188.832691] RSP: 0000:ffff88022427fda8  EFLAGS: 00010246
> [990188.832914] RAX: 00000000fffffe00 RBX: 0000000000000f3d RCX: 
> 00000000c0000100
> [990188.833292] RDX: 00000000000047f2 RSI: 0000000000000000 RDI: 
> 0000000000000000
> [990188.833670] RBP: ffff88022427fe50 R08: ffff88022427c000 R09: 
> 00038459d3aa3ee4
> [990188.834049] R10: 000000013b00e4b8 R11: 0000000000000000 R12: 
> 0000000000000000
> [990188.834429] R13: ffff8802c5189f88 R14: ffff881091270ca8 R15: 
> ffff88022427fe70
> [990188.838820] FS:  00007fc8ff5cb7c0(0000) GS:ffff881fffba0000(0000) 
> knlGS:0000000000000000
> [990188.839197] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [990188.839420] CR2: 000000000000001c CR3: 0000000405f7e000 CR4: 
> 00000000001406e0
> [990188.839797] Stack:
> [990188.840013]  ffffffffa044a1bc ffff880600000000 0000000000000000 
> ffff88022427fe70
> [990188.840639]  ffff8802c5189f88 ffff88189297b6a0 ffffffff00000f3d 
> ffff8810fffffe00
> [990188.841263]  ffff88022427fe98 00000000ffffffff 0000000000002000 
> ffff8802c5189c20
> [990188.841886] Call Trace:
> [990188.842115]  [<ffffffffa044a1bc>] ? ceph_read_iter+0x19c/0x5f0 [ceph]
> [990188.842345]  [<ffffffff81198c27>] __vfs_read+0xa7/0xd0
> [990188.842568]  [<ffffffff81199216>] vfs_read+0x86/0x130
> [990188.842792]  [<ffffffff81199fb6>] SyS_read+0x46/0xa0
> [990188.843018]  [<ffffffff81614f5b>] entry_SYSCALL_64_fastpath+0x16/0x6e
> [990188.843243] Code: e2 48 89 de ff d1 49 8b 0f 48 85 c9 75 e8 65 ff 0d 99 
> a7 ed 7e eb 85 66 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 
> <f0> ff 4f 1c 74 01 c3 55 85 f6 48 89 e5 74 07 e8 f7 f5 ff ff 5d
> [990188.847887] RIP  [<ffffffff81130515>] __free_pages+0x5/0x30
> [990188.848183]  RSP <ffff88022427fda8>
> [990188.848404] CR2: 000000000000001c
>
> The problem is that page(%RDI) being passed to __free_pages is NULL. Also
> retry_op is CHECK_EOF(1), so the page allocation didn't execute which leads
> to the null page. statret is : fffffe00 which seems to be -ERESTARTSYS.

Looks like this one exists upstream: -ERESTARTSYS is returned from
__ceph_do_getattr() if the process is killed while waiting for the
reply from the MDS.  At first sight it's just a busted error path,
but it could use more testing.  Zheng?
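
For anyone who wants to double-check the numbers in the oops, here is a
minimal sketch (not kernel code) that decodes the two values discussed above.
It assumes ERESTARTSYS is 512, as defined in the kernel's
include/linux/errno.h; the helper name as_signed32 is made up for this
example. The 0x1c offset is read off the faulting instruction in the Code:
line (f0 ff 4f 1c, i.e. lock decl 0x1c(%rdi)).

```python
# ERESTARTSYS is a kernel-internal errno (assumed to be 512, per
# include/linux/errno.h); Python's errno module does not export it.
ERESTARTSYS = 512


def as_signed32(value: int) -> int:
    """Interpret the low 32 bits of value as a two's-complement integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


# RAX in the oops holds 0x00000000fffffe00; as a 32-bit int this is statret.
statret = as_signed32(0xFFFFFE00)

# The faulting instruction decrements the word at 0x1c(%rdi); RDI is 0,
# so the access lands on the address reported in CR2.
rdi = 0x0
fault_addr = rdi + 0x1C

print(statret, statret == -ERESTARTSYS)
print(hex(fault_addr))
```

Both results line up with the report: statret decodes to -ERESTARTSYS, and a
NULL page in RDI puts the faulting access at the CR2 address.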

Thanks,

                Ilya
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com