Hi list,

We have some Ceph clients that reboot intermittently. We always see this 
stack dump in dmesg before the hosts reboot:

 Jan 10 06:52:33 xxxxxhostnamexxxxx kernel: [38386170.332063] ------------[ cut here ]------------
 Jan 10 06:52:33 xxxxxhostnamexxxxx kernel: [38386170.332067] WARNING: CPU: 11 PID: 229190 at net/ceph/osd_client.c:497 request_reinit+0x140/0x180 [libceph]
 May 10 06:52:33 s-jn4vh63.sys.az1.cust.ash.wd kernel: [38386170.332067] Modules linked in: joydev rbd libceph dns_resolver dell_rbu udp_diag unix_diag af_packet_diag netlink_diag nfsv3 nfs_acl nfs lockd grace fscache tcp_diag inet_diag uas usb_storage binfmt_misc nf_conntrack_netlink ip6table_mangle ip6table_raw xt_NFLOG xt_u32 nf_conntrack_ipv6 nf_defrag_ipv6 xt_LOG nf_conntrack_tftp nf_conntrack_ftp iptable_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat iptable_mangle xt_set xt_multiport xt_conntrack nf_conntrack ip_set_hash_netport ip_set_hash_ipport ip_set_hash_net ip_set_hash_ip nfnetlink_log ip_set nfnetlink ip6table_filter ip6_tables iptable_filter mpt3sas mpt2sas raid_class scsi_transport_sas mptctl mptbase drbg ansi_cprng dm_crypt loop bonding sunrpc vfat fat dm_mod skx_edac intel_powerclamp coretemp intel_rapl iosf_mbi kvm dell_smbios iTCO_wdt iTCO_vendor_support dell_wmi_descriptor irqbypass crc32_pclmul ghash_clmulni_intel dcdbas aesni_intel lrw gf128mul glue_helper ablk_helper cryptd pcspkr sg mgag200 i2c_algo_bit ttm drm_kms_helper syscopyarea sysfillrect ixgbe sysimgblt fb_sys_fops drm ptp pps_core mdio dca drm_panel_orientation_quirks i2c_i801 mei_me lpc_ich mei wmi ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter acpi_pad ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic ahci libahci libata crct10dif_pclmul crct10dif_common crc32c_intel megaraid_sas nfit libnvdimm [last unloaded: dell_rbu]
 Jan 10 06:52:33 xxxxxhostnamexxxxx kernel: [38386170.332094] CPU: 11 PID: 229190 Comm: kworker/11:1 Tainted: G W ------------ 3.10.0-1127.10.1.el7.x86_64 #1
 Jan 10 06:52:33 xxxxxhostnamexxxxx kernel: [38386170.332095] Hardware name: Dell Inc. PowerEdge R740xd/XXXX0, BIOS 2.8.2 08/27/2020
 Jan 10 06:52:33 xxxxxhostnamexxxxx kernel: [38386170.332098] Workqueue: events handle_timeout [libceph]
 Jan 10 06:52:33 xxxxxhostnamexxxxx kernel: [38386170.332099] Call Trace:
 Jan 10 06:52:33 xxxxxhostnamexxxxx kernel: [38386170.332100] [<ffffffffb957ffa5>] dump_stack+0x19/0x1b
 Jan 10 06:52:33 xxxxxhostnamexxxxx kernel: [38386170.332102] [<ffffffffb8e9bd18>] __warn+0xd8/0x100
 Jan 10 06:52:33 xxxxxhostnamexxxxx kernel: [38386170.332103] [<ffffffffb8e9be5d>] warn_slowpath_null+0x1d/0x20
 Jan 10 06:52:33 xxxxxhostnamexxxxx kernel: [38386170.332106] [<ffffffffc0bec600>] request_reinit+0x140/0x180 [libceph]
 Jan 10 06:52:33 xxxxxhostnamexxxxx kernel: [38386170.332110] [<ffffffffc0bf357a>] handle_timeout+0x3aa/0x770 [libceph]
 Jan 10 06:52:33 xxxxxhostnamexxxxx kernel: [38386170.332111] [<ffffffffb8ebe6bf>] process_one_work+0x17f/0x440
 Jan 10 06:52:33 xxxxxhostnamexxxxx kernel: [38386170.332113] [<ffffffffb8ebf7d6>] worker_thread+0x126/0x3c0
 Jan 10 06:52:33 xxxxxhostnamexxxxx kernel: [38386170.332114] [<ffffffffb8ebf6b0>] ? manage_workers.isra.26+0x2a0/0x2a0
 Jan 10 06:52:33 xxxxxhostnamexxxxx kernel: [38386170.332115] [<ffffffffb8ec6691>] kthread+0xd1/0xe0
 Jan 10 06:52:33 xxxxxhostnamexxxxx kernel: [38386170.332117] [<ffffffffb8ec65c0>] ? insert_kthread_work+0x40/0x40
 Jan 10 06:52:33 xxxxxhostnamexxxxx kernel: [38386170.332118] [<ffffffffb9592d1d>] ret_from_fork_nospec_begin+0x7/0x21
 Jan 10 06:52:33 xxxxxhostnamexxxxx kernel: [38386170.332119] [<ffffffffb8ec65c0>] ? insert_kthread_work+0x40/0x40
 Jan 10 06:52:33 xxxxxhostnamexxxxx kernel: [38386170.332120] ---[ end trace 5aeee0f10a265d18 ]---


[root@xxxxxhostnamexxxxx ~]# ceph --version
ceph version 14.2.22 (ca74598065096e6fcbd8433c8779a2be0c889351) nautilus (stable)
[root@xxxxxhostnamexxxxx ~]# uname -a
Linux xxxxxhostnamexxxxx 3.10.0-1160.25.1.el7.x86_64 #1 SMP Wed Apr 28 21:49:45 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

(Apologies for the formatting)

Any suggestions on how we should go about troubleshooting this?

Thank you in advance.

Jkr
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
