Dear Ceph users,
we are hitting a segfault during MDS startup (in the replay state), which is
leaving our filesystem inaccessible.
MDS log messages:
Oct 15 03:41:39.894584 mds1 ceph-mds: -472> 2019-10-15 00:40:30.201 7f3c08f49700 1 -- 192.168.8.195:6800/3181891717 <== osd.26 192.168.8.209:6821/2419345 3 ==== osd_op_reply(21 1.00000000 [getxattr] v0'0 uv0 ondisk = -61 ((61) No data available)) v8 ==== 154+0+0 (3715233608 0 0) 0x2776340 con 0x18bd500
Oct 15 03:41:39.894584 mds1 ceph-mds: -472> 2019-10-15 00:40:30.201 7f3c00589700 10 MDSIOContextBase::complete: 18C_IO_Inode_Fetched
Oct 15 03:41:39.894658 mds1 ceph-mds: -472> 2019-10-15 00:40:30.201 7f3c00589700 10 mds.0.cache.ino(0x100) _fetched got 0 and 544
Oct 15 03:41:39.894658 mds1 ceph-mds: -472> 2019-10-15 00:40:30.201 7f3c00589700 10 mds.0.cache.ino(0x100) magic is 'ceph fs volume v011' (expecting 'ceph fs volume v011')
Oct 15 03:41:39.894735 mds1 ceph-mds: -472> 2019-10-15 00:40:30.201 7f3c00589700 10 mds.0.cache.snaprealm(0x100 seq 1 0x1799c00) open_parents [1,head]
Oct 15 03:41:39.894735 mds1 ceph-mds: -472> 2019-10-15 00:40:30.201 7f3c00589700 10 mds.0.cache.ino(0x100) _fetched [inode 0x100 [...2,head] ~mds0/ auth v275131 snaprealm=0x1799c00 f(v0 1=1+0) n(v76166 rc2020-07-17 15:29:27.000000 b41838692297 -3184=-3168+-16)/n() (iversion lock) 0x18bf800]
Oct 15 03:41:39.894821 mds1 ceph-mds: -472> 2019-10-15 00:40:30.201 7f3c00589700 10 MDSIOContextBase::complete: 18C_IO_Inode_Fetched
Oct 15 03:41:39.894821 mds1 ceph-mds: -472> 2019-10-15 00:40:30.201 7f3c00589700 10 mds.0.cache.ino(0x1) _fetched got 0 and 482
Oct 15 03:41:39.894891 mds1 ceph-mds: -472> 2019-10-15 00:40:30.201 7f3c00589700 10 mds.0.cache.ino(0x1) magic is 'ceph fs volume v011' (expecting 'ceph fs volume v011')
Oct 15 03:41:39.894958 mds1 ceph-mds: -472> 2019-10-15 00:40:30.205 7f3c00589700 -1 *** Caught signal (Segmentation fault) **
 in thread 7f3c00589700 thread_name:fn_anonymous

 ceph version 13.2.6 (7b695f835b03642f85998b2ae7b6dd093d9fbce4) mimic (stable)
 1: (()+0x11390) [0x7f3c0e48a390]
 2: (operator<<(std::ostream&, SnapRealm const&)+0x42) [0x72cb92]
 3: (SnapRealm::merge_to(SnapRealm*)+0x308) [0x72f488]
 4: (CInode::decode_snap_blob(ceph::buffer::list&)+0x53) [0x6e1f63]
 5: (CInode::decode_store(ceph::buffer::list::iterator&)+0x76) [0x702b86]
 6: (CInode::_fetched(ceph::buffer::list&, ceph::buffer::list&, Context*)+0x1b2) [0x702da2]
 7: (MDSIOContextBase::complete(int)+0x119) [0x74fcc9]
 8: (Finisher::finisher_thread_entry()+0x12e) [0x7f3c0ebffece]
 9: (()+0x76ba) [0x7f3c0e4806ba]
 10: (clone()+0x6d) [0x7f3c0dca941d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
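In case it helps with interpreting the trace: the offsets for frames 2-6 fall inside the ceph-mds binary itself, so they should resolve to source lines. This is roughly how we would do it (a sketch; it assumes a 13.2.6 build with matching debug symbols installed, a non-PIE binary so the raw offsets apply directly, and the usual /usr/bin/ceph-mds path):

    # Resolve the in-binary frames (2-6) to function/source lines;
    # -f prints function names, -C demangles C++, -i expands inlined calls.
    addr2line -e /usr/bin/ceph-mds -fCi 0x72cb92 0x72f488 0x6e1f63 0x702b86 0x702da2

    # Or disassemble with interleaved source, as the NOTE suggests:
    objdump -rdS /usr/bin/ceph-mds > ceph-mds.asm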
Oct 15 03:41:39.895400 mds1 ceph-mds: --- logging levels ---
Oct 15 03:41:39.895473 mds1 ceph-mds: 0/ 5 none
Oct 15 03:41:39.895473 mds1 ceph-mds: 0/ 1 lockdep
Cluster status information:
  cluster:
    id:     b8205875-e56f-4280-9e52-6aab9c758586
    health: HEALTH_WARN
            1 filesystem is degraded
            1 nearfull osd(s)
            11 pool(s) nearfull

  services:
    mon: 3 daemons, quorum mon1,mon2,mon3
    mgr: mon1(active), standbys: mon2, mon3
    mds: fs_padrao-1/1/1 up {0=mds1=up:replay(laggy or crashed)}
    osd: 90 osds: 90 up, 90 in

  data:
    pools:   11 pools, 1984 pgs
    objects: 75.99 M objects, 285 TiB
    usage:   457 TiB used, 181 TiB / 639 TiB avail
    pgs:     1896 active+clean
             87   active+clean+scrubbing+deep+repair
             1    active+clean+scrubbing

  io:
    client: 89 KiB/s wr, 0 op/s rd, 3 op/s wr
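If more verbose logs would help, we can raise the MDS debug level and capture the replay attempt again (a sketch, assuming the centralized config introduced in Mimic; the level and the daemon name reflect our setup):

    # Raise MDS verbosity, then let the daemon attempt replay again
    ceph config set mds debug_mds 20
    ceph config set mds debug_journaler 20
    systemctl restart ceph-mds@mds1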
Has anyone seen anything like this?
Regards,
Arthur