We backported a wrong patch to 13.2.2. Downgrade ceph to 13.2.1, then run 'ceph mds repaired fido_fs:1'.
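For reference, a minimal sketch of those two steps on an apt-based MDS host might look like the following; the package version suffix and the systemd unit name are assumptions, only the `ceph mds repaired fido_fs:1` command itself comes from this reply:

    # on each MDS host, downgrade the MDS package to 13.2.1 (adjust the version suffix for your distro)
    apt install ceph-mds=13.2.1-1
    systemctl restart ceph-mds.target
    # once the daemons are back on 13.2.1, mark the damaged rank as repaired
    ceph mds repaired fido_fs:1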
Sorry for the trouble

Yan, Zheng

On Mon, Oct 29, 2018 at 7:48 AM Jon Morby <j...@fido.net> wrote:
>
> We accidentally found ourselves upgraded from 12.2.8 to 13.2.2 after a
> ceph-deploy install went awry (we were expecting it to upgrade to 12.2.9
> and not jump a major release without warning)
>
> Anyway .. as a result, we ended up with an mds journal error and 1 daemon
> reporting as damaged
>
> Having got nowhere trying to ask for help on irc, we've followed various
> forum posts and disaster recovery guides, we ended up resetting the journal
> which left the daemon as no longer “damaged” however we’re now seeing mds
> segfault whilst trying to replay
>
> https://pastebin.com/iSLdvu0b
>
> /build/ceph-13.2.2/src/mds/journal.cc: 1572: FAILED assert(g_conf->mds_wipe_sessions)
>
> ceph version 13.2.2 (02899bfda814146b021136e9d8e80eba494e1126) mimic (stable)
>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x7fad637f70f2]
>  2: (()+0x3162b7) [0x7fad637f72b7]
>  3: (EMetaBlob::replay(MDSRank*, LogSegment*, MDSlaveUpdate*)+0x5f4b) [0x7a7a6b]
>  4: (EUpdate::replay(MDSRank*)+0x39) [0x7a8fa9]
>  5: (MDLog::_replay_thread()+0x864) [0x752164]
>  6: (MDLog::ReplayThread::entry()+0xd) [0x4f021d]
>  7: (()+0x76ba) [0x7fad6305a6ba]
>  8: (clone()+0x6d) [0x7fad6288341d]
> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
> to interpret this.
>
> full logs
>
> https://pastebin.com/X5UG9vT2
>
> We’ve been unable to access the cephfs file system since all of this
> started …. attempts to mount fail with reports that “mds probably not
> available”
>
> Oct 28 23:47:02 mirrors kernel: [115602.911193] ceph: probably no mds
> server is up
>
> root@mds02:~# ceph -s
>   cluster:
>     id:     78d5bf7d-b074-47ab-8d73-bd4d99df98a5
>     health: HEALTH_WARN
>             1 filesystem is degraded
>             insufficient standby MDS daemons available
>             too many PGs per OSD (276 > max 250)
>
>   services:
>     mon: 3 daemons, quorum mon01,mon02,mon03
>     mgr: mon01(active), standbys: mon02, mon03
>     mds: fido_fs-2/2/1 up {0=mds01=up:resolve,1=mds02=up:replay(laggy or crashed)}
>     osd: 27 osds: 27 up, 27 in
>
>   data:
>     pools:   15 pools, 3168 pgs
>     objects: 16.97 M objects, 30 TiB
>     usage:   71 TiB used, 27 TiB / 98 TiB avail
>     pgs:     3168 active+clean
>
>   io:
>     client:   680 B/s rd, 1.1 MiB/s wr, 0 op/s rd, 345 op/s wr
>
> Before I just trash the entire fs and give up on ceph, does anyone have
> any suggestions as to how we can fix this?
>
> root@mds02:~# ceph versions
> {
>     "mon": {
>         "ceph version 13.2.2 (02899bfda814146b021136e9d8e80eba494e1126) mimic (stable)": 3
>     },
>     "mgr": {
>         "ceph version 13.2.2 (02899bfda814146b021136e9d8e80eba494e1126) mimic (stable)": 3
>     },
>     "osd": {
>         "ceph version 12.2.8 (ae699615bac534ea496ee965ac6192cb7e0e07c0) luminous (stable)": 27
>     },
>     "mds": {
>         "ceph version 13.2.2 (02899bfda814146b021136e9d8e80eba494e1126) mimic (stable)": 2
>     },
>     "overall": {
>         "ceph version 12.2.8 (ae699615bac534ea496ee965ac6192cb7e0e07c0) luminous (stable)": 27,
>         "ceph version 13.2.2 (02899bfda814146b021136e9d8e80eba494e1126) mimic (stable)": 8
>     }
> }
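(For context on the journal reset mentioned above: the standard CephFS disaster-recovery sequence from the Ceph documentation looks roughly like this; the --rank syntax is an assumption about the exact invocation used and is not shown in the thread.)

    # recover whatever dentries can still be read out of the damaged journal
    cephfs-journal-tool --rank=fido_fs:1 event recover_dentries summary
    # then wipe the journal so the MDS stops trying to replay it
    cephfs-journal-tool --rank=fido_fs:1 journal reset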
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com