CephFS is recoverable. Just set mds_wipe_sessions to 1. After the MDS recovers, set it back to 0 and flush the journal (ceph daemon mds.x flush journal).
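For reference, the sequence would look roughly like this (a sketch only, not tested against this cluster; mds.x stands for the affected daemon id, which from the status output below appears to be mds02 on rank 1, and the ceph.conf route may be needed since a daemon that keeps crashing in replay might not be reachable via injectargs):

    # on the host running the affected MDS, enable the option for replay,
    # e.g. in ceph.conf:
    #   [mds]
    #       mds_wipe_sessions = true
    # then restart the daemon so replay runs with it set
    systemctl restart ceph-mds@x

    # watch the rank go replay -> resolve/rejoin -> active
    ceph -s

    # once it has recovered, turn the option back off (remove the ceph.conf
    # line or inject 0 at runtime) and flush the journal via the admin socket
    ceph tell mds.x injectargs '--mds_wipe_sessions=0'
    ceph daemon mds.x flush journal

Flushing the journal afterwards should trim the offending events so they are not replayed again on a later restart.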
On Mon, Oct 29, 2018 at 7:13 PM Jon Morby (Fido) <j...@fido.net> wrote:
> I've experimented and whilst the downgrade looks to be working, you end up with errors regarding unsupported feature "mimic" amongst others
>
> 2018-10-29 10:51:20.652047 7f6f1b9f5080 -1 ERROR: on disk data includes unsupported features: compat={},rocompat={},incompat={10=mimic ondisk layou
>
> so I gave up on that idea
>
> In addition to the cephfs volume (which is basically just mirrors and some backups) we have a large rbd deployment using the same ceph cluster, and if we lose that we're screwed ... the cephfs volume was more an "experiment" to see how viable it would be as an NFS replacement
>
> There's 26TB of data on there, so I'd rather not have to go off and redownload it all .. but losing it isn't the end of the world (but it will piss off a few friends)
>
> Jon
>
> ----- On 29 Oct, 2018, at 09:54, Zheng Yan <uker...@gmail.com> wrote:
>
> On Mon, Oct 29, 2018 at 5:25 PM Jon Morby (Fido) <j...@fido.net> wrote:
>
>> Hi
>>
>> Ideally we'd like to undo the whole accidental upgrade to 13.x and ensure that ceph-deploy doesn't do another major release upgrade without a lot of warnings
>>
>> Either way, I'm currently getting errors that 13.2.1 isn't available / shaman is offline / etc
>>
>> What's the best / recommended way of doing this downgrade across our estate?
>
> You have already upgraded ceph-mon. I don't know if it can be safely downgraded (if I remember right, I corrupted the monitor's data when downgrading ceph-mon from mimic to luminous).
>
>> ----- On 29 Oct, 2018, at 08:19, Yan, Zheng <uker...@gmail.com> wrote:
>>
>> We backported a wrong patch to 13.2.2. Downgrade ceph to 13.2.1, then run 'ceph mds repaired fido_fs:1'.
>> Sorry for the trouble
>> Yan, Zheng
>>
>> On Mon, Oct 29, 2018 at 7:48 AM Jon Morby <j...@fido.net> wrote:
>>
>>> We accidentally found ourselves upgraded from 12.2.8 to 13.2.2 after a ceph-deploy install went awry (we were expecting it to upgrade to 12.2.9 and not jump a major release without warning)
>>>
>>> Anyway .. as a result, we ended up with an mds journal error and 1 daemon reporting as damaged
>>>
>>> Having got nowhere trying to ask for help on irc, we've followed various forum posts and disaster recovery guides; we ended up resetting the journal, which left the daemon as no longer “damaged”, however we're now seeing the mds segfault whilst trying to replay
>>>
>>> https://pastebin.com/iSLdvu0b
>>>
>>> /build/ceph-13.2.2/src/mds/journal.cc: 1572: FAILED assert(g_conf->mds_wipe_sessions)
>>>
>>> ceph version 13.2.2 (02899bfda814146b021136e9d8e80eba494e1126) mimic (stable)
>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x7fad637f70f2]
>>> 2: (()+0x3162b7) [0x7fad637f72b7]
>>> 3: (EMetaBlob::replay(MDSRank*, LogSegment*, MDSlaveUpdate*)+0x5f4b) [0x7a7a6b]
>>> 4: (EUpdate::replay(MDSRank*)+0x39) [0x7a8fa9]
>>> 5: (MDLog::_replay_thread()+0x864) [0x752164]
>>> 6: (MDLog::ReplayThread::entry()+0xd) [0x4f021d]
>>> 7: (()+0x76ba) [0x7fad6305a6ba]
>>> 8: (clone()+0x6d) [0x7fad6288341d]
>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>>>
>>> full logs
>>>
>>> https://pastebin.com/X5UG9vT2
>>>
>>> We've been unable to access the cephfs file system since all of this started ….
>>> attempts to mount fail with reports that “mds probably not available”
>>>
>>> Oct 28 23:47:02 mirrors kernel: [115602.911193] ceph: probably no mds server is up
>>>
>>> root@mds02:~# ceph -s
>>>   cluster:
>>>     id:     78d5bf7d-b074-47ab-8d73-bd4d99df98a5
>>>     health: HEALTH_WARN
>>>             1 filesystem is degraded
>>>             insufficient standby MDS daemons available
>>>             too many PGs per OSD (276 > max 250)
>>>
>>>   services:
>>>     mon: 3 daemons, quorum mon01,mon02,mon03
>>>     mgr: mon01(active), standbys: mon02, mon03
>>>     mds: fido_fs-2/2/1 up {0=mds01=up:resolve,1=mds02=up:replay(laggy or crashed)}
>>>     osd: 27 osds: 27 up, 27 in
>>>
>>>   data:
>>>     pools:   15 pools, 3168 pgs
>>>     objects: 16.97 M objects, 30 TiB
>>>     usage:   71 TiB used, 27 TiB / 98 TiB avail
>>>     pgs:     3168 active+clean
>>>
>>>   io:
>>>     client:   680 B/s rd, 1.1 MiB/s wr, 0 op/s rd, 345 op/s wr
>>>
>>> Before I just trash the entire fs and give up on ceph, does anyone have any suggestions as to how we can fix this?
>>>
>>> root@mds02:~# ceph versions
>>> {
>>>     "mon": {
>>>         "ceph version 13.2.2 (02899bfda814146b021136e9d8e80eba494e1126) mimic (stable)": 3
>>>     },
>>>     "mgr": {
>>>         "ceph version 13.2.2 (02899bfda814146b021136e9d8e80eba494e1126) mimic (stable)": 3
>>>     },
>>>     "osd": {
>>>         "ceph version 12.2.8 (ae699615bac534ea496ee965ac6192cb7e0e07c0) luminous (stable)": 27
>>>     },
>>>     "mds": {
>>>         "ceph version 13.2.2 (02899bfda814146b021136e9d8e80eba494e1126) mimic (stable)": 2
>>>     },
>>>     "overall": {
>>>         "ceph version 12.2.8 (ae699615bac534ea496ee965ac6192cb7e0e07c0) luminous (stable)": 27,
>>>         "ceph version 13.2.2 (02899bfda814146b021136e9d8e80eba494e1126) mimic (stable)": 8
>>>     }
>>> }
>
> --
> ------------------------------
> Jon Morby
> FidoNet - the internet made simple!
> 10 - 16 Tiller Road, London, E14 8PX
> tel: 0345 004 3050 / fax: 0345 004 3051
>
> Need more rack space?
> Check out our Co-Lo offerings at http://www.fido.net/services/colo/ - 32 amp racks in London and Brighton
> Linx ConneXions available at all Fido sites! https://www.fido.net/services/backbone/connexions/
> PGP Key: 26DC B618 DE9E F9CB F8B7 1EFA 2A64 BA69 B3B5 AD3A - http://jonmorby.com/B3B5AD3A.asc
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com