Are you using Ubuntu 16.04 (guessing from your kernel version)? There was a
NUMA bug in early 4.4 kernels; try updating to the latest release in
the 4.4 series.
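
To see which kernel a node is actually running, and whether automatic NUMA balancing (the code path that shows up in the trace) is switched on, something along these lines might help. This is only a rough sketch: the helper names are made up for illustration, and it assumes a Linux host that exposes /proc/sys/kernel/numa_balancing. It only reports values and changes nothing.

```python
import platform
import re
from pathlib import Path
from typing import Optional, Tuple


def parse_kernel_release(release: str) -> Tuple[int, int, int]:
    """Extract (major, minor, patch) from a release string such as '4.4.8-1-pve'.

    Missing minor or patch components are reported as 0.
    """
    match = re.match(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor or 0), int(patch or 0)


def numa_balancing_enabled(
    path: str = "/proc/sys/kernel/numa_balancing",
) -> Optional[bool]:
    """Return True/False from the sysctl file, or None if it cannot be read."""
    try:
        return Path(path).read_text().strip() != "0"
    except OSError:
        # File missing (non-Linux host, or kernel built without NUMA balancing).
        return None


if __name__ == "__main__":
    release = platform.release()
    print("kernel release:", release, parse_kernel_release(release))
    print("automatic NUMA balancing enabled:", numa_balancing_enabled())
```

Comparing the reported version against the newest 4.4.x package your distribution offers shows whether an update is pending.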

 

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
VELARTIS Philipp Dürhammer
Sent: 01 December 2016 12:04
To: 'ceph-us...@ceph.com' <ceph-us...@ceph.com>
Subject: [ceph-users] osd crash

 

Hello!

 

Tonight I had an OSD crash. See the dump below. The OSD is also still mounted.
What's the cause? A bug? What should I do next?

 

Thank You!

 

Dec  1 00:31:30 ceph2 kernel: [17314369.493029] divide error: 0000 [#1] SMP
Dec  1 00:31:30 ceph2 kernel: [17314369.493062] Modules linked in: act_police cls_basic sch_ingress sch_htb vhost_net vhost macvtap macvlan 8021q garp mrp veth nfsv3 softdog ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables xt_mac ipt_REJECT nf_reject_ipv4 xt_NFLOG nfnetlink_log xt_physdev nf_conntrack_ipv4 nf_defrag_ipv4 xt_comment xt_tcpudp xt_addrtype xt_multiport xt_conntrack xt_set xt_mark ip_set_hash_net ip_set nfnetlink iptable_filter ip_tables x_tables nfsd auth_rpcgss nfs_acl nfs lockd grace fscache sunrpc ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi bonding xfs libcrc32c ipmi_ssif mxm_wmi x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd snd_pcm snd_timer snd soundcore pcspkr input_leds sb_edac shpchp edac_core mei_me ioatdma mei lpc_ich i2c_i801 ipmi_si 8250_fintek wmi ipmi_msghandler mac_hid nf_conntrack_ftp nf_conntrack autofs4 ses enclosure hid_generic usbmouse usbkbd usbhid hid ixgbe(O) vxlan ip6_udp_tunnel megaraid_sas udp_tunnel isci ahci libahci libsas igb(O) scsi_transport_sas dca ptp pps_core fjes
Dec  1 00:31:30 ceph2 kernel: [17314369.493708] CPU: 1 PID: 17291 Comm: ceph-osd Tainted: G           O    4.4.8-1-pve #1
Dec  1 00:31:30 ceph2 kernel: [17314369.493754] Hardware name: Thomas-Krenn.AG X9DR3-F/X9DR3-F, BIOS 3.0a 07/31/2013
Dec  1 00:31:30 ceph2 kernel: [17314369.493799] task: ffff881f6ff05280 ti: ffff880037c4c000 task.ti: ffff880037c4c000
Dec  1 00:31:30 ceph2 kernel: [17314369.493843] RIP: 0010:[<ffffffff810b58fd>]  [<ffffffff810b58fd>] task_numa_find_cpu+0x23d/0x710
Dec  1 00:31:30 ceph2 kernel: [17314369.493893] RSP: 0000:ffff880037c4fbd8  EFLAGS: 00010257
Dec  1 00:31:30 ceph2 kernel: [17314369.493919] RAX: 0000000000000000 RBX: ffff880037c4fc80 RCX: 0000000000000000
Dec  1 00:31:30 ceph2 kernel: [17314369.493962] RDX: 0000000000000000 RSI: ffff88103fa40000 RDI: ffff881033f50c00
Dec  1 00:31:30 ceph2 kernel: [17314369.494006] RBP: ffff880037c4fc48 R08: 0000000202046ea8 R09: 000000000000036b
Dec  1 00:31:30 ceph2 kernel: [17314369.494049] R10: 000000000000007c R11: 0000000000000540 R12: ffff88064fbd0000
Dec  1 00:31:30 ceph2 kernel: [17314369.494093] R13: 0000000000000250 R14: 0000000000000540 R15: 0000000000000009
Dec  1 00:31:30 ceph2 kernel: [17314369.494136] FS:  00007ff17dd6c700(0000) GS:ffff88103fa40000(0000) knlGS:0000000000000000
Dec  1 00:31:30 ceph2 kernel: [17314369.494182] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec  1 00:31:30 ceph2 kernel: [17314369.494209] CR2: 00007ff17dd6aff8 CR3: 0000001025e4b000 CR4: 00000000001426e0
Dec  1 00:31:30 ceph2 kernel: [17314369.494252] Stack:
Dec  1 00:31:30 ceph2 kernel: [17314369.494273]  ffff880037c4fbe8 ffffffff81038219 000000000000003f 0000000000017180
Dec  1 00:31:30 ceph2 kernel: [17314369.494323]  ffff881f6ff05280 0000000000017180 0000000000000251 ffffffffffffffe7
Dec  1 00:31:30 ceph2 kernel: [17314369.494374]  0000000000000251 ffff881f6ff05280 ffff880037c4fc80 00000000000000cb
Dec  1 00:31:30 ceph2 kernel: [17314369.494424] Call Trace:
Dec  1 00:31:30 ceph2 kernel: [17314369.494449]  [<ffffffff81038219>] ? sched_clock+0x9/0x10
Dec  1 00:31:30 ceph2 kernel: [17314369.494476]  [<ffffffff810b62b6>] task_numa_migrate+0x4e6/0xa00
Dec  1 00:31:30 ceph2 kernel: [17314369.494506]  [<ffffffff813fea6c>] ? copy_to_iter+0x7c/0x260
Dec  1 00:31:30 ceph2 kernel: [17314369.494534]  [<ffffffff810b6849>] numa_migrate_preferred+0x79/0x80
Dec  1 00:31:30 ceph2 kernel: [17314369.494563]  [<ffffffff810bb348>] task_numa_fault+0x848/0xd10
Dec  1 00:31:30 ceph2 kernel: [17314369.494591]  [<ffffffff810ba969>] ? should_numa_migrate_memory+0x59/0x130
Dec  1 00:31:30 ceph2 kernel: [17314369.494623]  [<ffffffff811c0314>] handle_mm_fault+0xc64/0x1a20
Dec  1 00:31:30 ceph2 kernel: [17314369.494654]  [<ffffffff8170c3f4>] ? SYSC_recvfrom+0x144/0x160
Dec  1 00:31:30 ceph2 kernel: [17314369.494684]  [<ffffffff8106b4ed>] __do_page_fault+0x19d/0x410
Dec  1 00:31:30 ceph2 kernel: [17314369.494713]  [<ffffffff81003360>] ? exit_to_usermode_loop+0xb0/0xd0
Dec  1 00:31:30 ceph2 kernel: [17314369.494742]  [<ffffffff8106b782>] do_page_fault+0x22/0x30
Dec  1 00:31:30 ceph2 kernel: [17314369.494771]  [<ffffffff8184ab38>] page_fault+0x28/0x30
Dec  1 00:31:30 ceph2 kernel: [17314369.494797] Code: 4d b0 4c 89 ef e8 b4 d0 ff ff 48 8b 4d b0 49 8b 85 b0 00 00 00 31 d2 48 0f af 81 d8 01 00 00 49 8b 4d 78 4c 8b 6b 78 48 83 c1 01 <48> f7 f1 48 8b 4b 20 49 89 c0 48 29 c1 4c 03 43 48 4c 39 75 d0
Dec  1 00:31:30 ceph2 kernel: [17314369.495005] RIP  [<ffffffff810b58fd>] task_numa_find_cpu+0x23d/0x710
Dec  1 00:31:30 ceph2 kernel: [17314369.495035]  RSP <ffff880037c4fbd8>
Dec  1 00:31:30 ceph2 kernel: [17314369.495347] ---[ end trace 7106c9a72840cc7d ]---

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
