fyi, downgrading to 13.2.1 doesn't seem to have fixed the issue either :(

--- end dump of recent events ---
2018-10-29 10:27:50.440 7feb58b43700 -1 *** Caught signal (Aborted) **
 in thread 7feb58b43700 thread_name:md_log_replay
 ceph version 13.2.1 (5533ecdc0fda920179d7ad84e0aa65a127b20d77) mimic (stable)
 1: (()+0x3ebf40) [0x55deff8e0f40]
 2: (()+0x11390) [0x7feb68246390]
 3: (gsignal()+0x38) [0x7feb67993428]
 4: (abort()+0x16a) [0x7feb6799502a]
 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x250) [0x7feb689a5630]
 6: (()+0x2e26a7) [0x7feb689a56a7]
 7: (EMetaBlob::replay(MDSRank*, LogSegment*, MDSlaveUpdate*)+0x5f4b) [0x55deff8ccc8b]
 8: (EUpdate::replay(MDSRank*)+0x39) [0x55deff8ce1c9]
 9: (MDLog::_replay_thread()+0x864) [0x55deff876974]
 10: (MDLog::ReplayThread::entry()+0xd) [0x55deff61a95d]
 11: (()+0x76ba) [0x7feb6823c6ba]
 12: (clone()+0x6d) [0x7feb67a6541d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- begin dump of recent events ---
     0> 2018-10-29 10:27:50.440 7feb58b43700 -1 *** Caught signal (Aborted) **
 in thread 7feb58b43700 thread_name:md_log_replay

 ceph version 13.2.1 (5533ecdc0fda920179d7ad84e0aa65a127b20d77) mimic (stable)
 1: (()+0x3ebf40) [0x55deff8e0f40]
 2: (()+0x11390) [0x7feb68246390]
 3: (gsignal()+0x38) [0x7feb67993428]
 4: (abort()+0x16a) [0x7feb6799502a]
 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x250) [0x7feb689a5630]
 6: (()+0x2e26a7) [0x7feb689a56a7]
 7: (EMetaBlob::replay(MDSRank*, LogSegment*, MDSlaveUpdate*)+0x5f4b) [0x55deff8ccc8b]
 8: (EUpdate::replay(MDSRank*)+0x39) [0x55deff8ce1c9]
 9: (MDLog::_replay_thread()+0x864) [0x55deff876974]
 10: (MDLog::ReplayThread::entry()+0xd) [0x55deff61a95d]
 11: (()+0x76ba) [0x7feb6823c6ba]
 12: (clone()+0x6d) [0x7feb67a6541d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 0 lockdep
   0/ 0 context
   0/ 0 crush
   3/ 3 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 0 buffer
   0/ 0 timer
   0/ 0 filer
   0/ 1 striper
   0/ 0 objecter
   0/ 0 rados
   0/ 0 rbd
   0/ 5 rbd_mirror
   0/ 5 rbd_replay
   0/ 0 journaler
   0/ 5 objectcacher
   0/ 0 client
   0/ 0 osd
   0/ 0 optracker
   0/ 0 objclass
   0/ 0 filestore
   0/ 0 journal
   0/ 0 ms
   0/ 0 mon
   0/ 0 monc
   0/ 0 paxos
   0/ 0 tp
   0/ 0 auth
   1/ 5 crypto
   0/ 0 finisher
   1/ 1 reserver
   0/ 0 heartbeatmap
   0/ 0 perfcounter
   0/ 0 rgw
   1/ 5 rgw_sync
   1/10 civetweb
   1/ 5 javaclient
   0/ 0 asok
   0/ 0 throttle
   0/ 0 refs
   1/ 5 xio
   1/ 5 compressor
   1/ 5 bluestore
   1/ 5 bluefs
   1/ 3 bdev
   1/ 5 kstore
   4/ 5 rocksdb
   4/ 5 leveldb
   4/ 5 memdb
   1/ 5 kinetic
   1/ 5 fuse
   1/ 5 mgr
   1/ 5 mgrc
   1/ 5 dpdk
   1/ 5 eventtrace
  99/99 (syslog threshold)
  -1/-1 (stderr threshold)
  max_recent     10000
  max_new         1000
  log_file /var/log/ceph/ceph-mds.mds04.log
--- end dump of recent events ---


----- On 29 Oct, 2018, at 09:25, Jon Morby <j...@fido.net> wrote:

> Hi
>
> Ideally we'd like to undo the whole accidental upgrade to 13.x and ensure that
> ceph-deploy doesn't do another major release upgrade without a lot of warnings.
>
> Either way, I'm currently getting errors that 13.2.1 isn't available / shaman
> is offline / etc.
>
> What's the best / recommended way of doing this downgrade across our estate?
>
> ----- On 29 Oct, 2018, at 08:19, Yan, Zheng <uker...@gmail.com> wrote:
>
>> We backported a wrong patch to 13.2.2. Downgrade ceph to 13.2.1, then run
>> 'ceph mds repaired fido_fs:1'.
>>
>> Sorry for the trouble
>> Yan, Zheng
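For anyone hitting the same thing, here is a rough sketch of what that downgrade plus repair could look like on Debian/Ubuntu hosts pulling packages from download.ceph.com rather than shaman. The package list, the "13.2.1-1xenial" version suffix and the mds01/mds02 host names are assumptions for illustration; check apt-cache madison for what your mirror actually offers. Note that in the `ceph versions` output further down only the mon/mgr/mds daemons are on 13.2.2 (the OSDs are still on 12.2.8 luminous), so only those hosts should need touching.

# Be explicit about the release when using ceph-deploy, so it cannot silently
# jump to a newer major version again (mds01/mds02 are illustrative host names):
ceph-deploy install --release mimic mds01 mds02

# Downgrade the MDS packages to 13.2.1 with apt; the version suffix is an
# assumption -- check what the repo really provides, and the rest of the ceph
# package set on the host may need pinning to the same version:
apt-cache madison ceph-mds
apt-get install --allow-downgrades ceph-mds=13.2.1-1xenial ceph-base=13.2.1-1xenial ceph-common=13.2.1-1xenial

# Restart the MDS and mark rank 1 of fido_fs repaired, as suggested above:
systemctl restart ceph-mds.target
ceph mds repaired fido_fs:1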
>> On Mon, Oct 29, 2018 at 7:48 AM Jon Morby <j...@fido.net> wrote:
>>
>>> We accidentally found ourselves upgraded from 12.2.8 to 13.2.2 after a
>>> ceph-deploy install went awry (we were expecting it to upgrade to 12.2.9, not
>>> jump a major release without warning).
>>>
>>> Anyway, as a result we ended up with an mds journal error and 1 daemon
>>> reporting as damaged.
>>>
>>> Having got nowhere trying to ask for help on irc, we followed various forum
>>> posts and disaster recovery guides and ended up resetting the journal, which
>>> left the daemon as no longer “damaged”; however, we’re now seeing the mds
>>> segfault whilst trying to replay:
>>>
>>> https://pastebin.com/iSLdvu0b
>>>
>>> /build/ceph-13.2.2/src/mds/journal.cc: 1572: FAILED assert(g_conf->mds_wipe_sessions)
>>>
>>> ceph version 13.2.2 (02899bfda814146b021136e9d8e80eba494e1126) mimic (stable)
>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x7fad637f70f2]
>>> 2: (()+0x3162b7) [0x7fad637f72b7]
>>> 3: (EMetaBlob::replay(MDSRank*, LogSegment*, MDSlaveUpdate*)+0x5f4b) [0x7a7a6b]
>>> 4: (EUpdate::replay(MDSRank*)+0x39) [0x7a8fa9]
>>> 5: (MDLog::_replay_thread()+0x864) [0x752164]
>>> 6: (MDLog::ReplayThread::entry()+0xd) [0x4f021d]
>>> 7: (()+0x76ba) [0x7fad6305a6ba]
>>> 8: (clone()+0x6d) [0x7fad6288341d]
>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>>>
>>> Full logs: https://pastebin.com/X5UG9vT2
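For context on the "resetting the journal" step mentioned above: the disaster recovery guides generally mean the cephfs-journal-tool / cephfs-table-tool sequence sketched below, run against the damaged rank. Rank 1 of fido_fs is assumed here (from the 'ceph mds repaired fido_fs:1' suggestion above), the backup path is illustrative, and the --rank form is the syntax the mimic-era tools expect. The failed assert on g_conf->mds_wipe_sessions during replay looks like the familiar mismatch between the sessionmap version recorded in the journal and the session table, which is what the session reset in those guides is meant to clear, but treat that as a guess rather than a diagnosis.

# Export a backup of the damaged rank's journal before anything destructive
# (output path is illustrative):
cephfs-journal-tool --rank=fido_fs:1 journal export /root/fido_fs.1.journal.bin

# Salvage what can be recovered from the journal into the metadata pool,
# then reset the journal and the session table:
cephfs-journal-tool --rank=fido_fs:1 event recover_dentries summary
cephfs-journal-tool --rank=fido_fs:1 journal reset
cephfs-table-tool all reset session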
>>> We’ve been unable to access the cephfs file system since all of this started;
>>> attempts to mount fail with reports that “mds probably not available”:
>>>
>>> Oct 28 23:47:02 mirrors kernel: [115602.911193] ceph: probably no mds server is up
>>>
>>> root@mds02:~# ceph -s
>>>   cluster:
>>>     id:     78d5bf7d-b074-47ab-8d73-bd4d99df98a5
>>>     health: HEALTH_WARN
>>>             1 filesystem is degraded
>>>             insufficient standby MDS daemons available
>>>             too many PGs per OSD (276 > max 250)
>>>
>>>   services:
>>>     mon: 3 daemons, quorum mon01,mon02,mon03
>>>     mgr: mon01(active), standbys: mon02, mon03
>>>     mds: fido_fs-2/2/1 up {0=mds01=up:resolve,1=mds02=up:replay(laggy or crashed)}
>>>     osd: 27 osds: 27 up, 27 in
>>>
>>>   data:
>>>     pools:   15 pools, 3168 pgs
>>>     objects: 16.97 M objects, 30 TiB
>>>     usage:   71 TiB used, 27 TiB / 98 TiB avail
>>>     pgs:     3168 active+clean
>>>
>>>   io:
>>>     client:  680 B/s rd, 1.1 MiB/s wr, 0 op/s rd, 345 op/s wr
>>>
>>> Before I just trash the entire fs and give up on ceph, does anyone have any
>>> suggestions as to how we can fix this?
>>>
>>> root@mds02:~# ceph versions
>>> {
>>>     "mon": {
>>>         "ceph version 13.2.2 (02899bfda814146b021136e9d8e80eba494e1126) mimic (stable)": 3
>>>     },
>>>     "mgr": {
>>>         "ceph version 13.2.2 (02899bfda814146b021136e9d8e80eba494e1126) mimic (stable)": 3
>>>     },
>>>     "osd": {
>>>         "ceph version 12.2.8 (ae699615bac534ea496ee965ac6192cb7e0e07c0) luminous (stable)": 27
>>>     },
>>>     "mds": {
>>>         "ceph version 13.2.2 (02899bfda814146b021136e9d8e80eba494e1126) mimic (stable)": 2
>>>     },
>>>     "overall": {
>>>         "ceph version 12.2.8 (ae699615bac534ea496ee965ac6192cb7e0e07c0) luminous (stable)": 27,
>>>         "ceph version 13.2.2 (02899bfda814146b021136e9d8e80eba494e1126) mimic (stable)": 8
>>>     }
>>> }

--
Jon Morby
FidoNet - the internet made simple!
10 - 16 Tiller Road, London, E14 8PX
tel: 0345 004 3050 / fax: 0345 004 3051

Need more rack space? Check out our Co-Lo offerings at http://www.fido.net/services/colo/
32 amp racks in London and Brighton
Linx ConneXions available at all Fido sites! https://www.fido.net/services/backbone/connexions/

PGP Key: 26DC B618 DE9E F9CB F8B7 1EFA 2A64 BA69 B3B5 AD3A - http://jonmorby.com/B3B5AD3A.asc
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com