Hi,

Ideally we'd like to undo the whole accidental upgrade to 13.x, and to make sure that ceph-deploy can't do another major release upgrade without a lot of warnings.
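In the meantime, to stop a plain apt upgrade (or a ceph-deploy run, which ultimately just drives apt) from jumping releases again, I'm thinking of hard-pinning the ceph packages on every host to the 13.2.1 build Zheng suggests. A rough sketch only -- it assumes Ubuntu xenial, the download.ceph.com packaging, and that "13.2.1-*" is the right version pattern, so corrections welcome:

    # /etc/apt/preferences.d/ceph.pref -- deployed to every host
    #
    # Pin all ceph packages hard to the 13.2.1 point release. A priority
    # above 1000 tells apt to prefer this version even when the installed
    # version is newer, i.e. it permits the downgrade as well as blocking
    # any further upgrade past 13.2.1.
    Package: ceph* librados* librbd* libcephfs* radosgw python-rados python-rbd python-cephfs
    Pin: version 13.2.1-*
    Pin-Priority: 1001

(And going forward we'll pass an explicit --release to every ceph-deploy install rather than letting it pick one for us.)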
Either way, I'm currently getting errors that 13.2.1 isn't available / shaman is offline / etc. What's the best / recommended way of doing this downgrade across our estate? I've put a rough sketch of the steps I had pencilled in at the bottom of this mail, below the quoted thread -- please shout if it's wrong.

----- On 29 Oct, 2018, at 08:19, Yan, Zheng <uker...@gmail.com> wrote:

> We backported a wrong patch to 13.2.2. Downgrade ceph to 13.2.1, then run
> 'ceph mds repaired fido_fs:1'.
>
> Sorry for the trouble
>
> Yan, Zheng
>
> On Mon, Oct 29, 2018 at 7:48 AM Jon Morby <j...@fido.net> wrote:
>
>> We accidentally found ourselves upgraded from 12.2.8 to 13.2.2 after a
>> ceph-deploy install went awry (we were expecting it to upgrade to 12.2.9,
>> not jump a major release without warning).
>>
>> Anyway, as a result we ended up with an mds journal error and 1 daemon
>> reporting as damaged.
>>
>> Having got nowhere trying to ask for help on irc, we followed various
>> forum posts and disaster recovery guides and ended up resetting the
>> journal, which left the daemon no longer marked "damaged"; however, we're
>> now seeing the mds segfault whilst trying to replay:
>>
>> https://pastebin.com/iSLdvu0b
>>
>> /build/ceph-13.2.2/src/mds/journal.cc: 1572: FAILED assert(g_conf->mds_wipe_sessions)
>>
>> ceph version 13.2.2 (02899bfda814146b021136e9d8e80eba494e1126) mimic (stable)
>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x7fad637f70f2]
>> 2: (()+0x3162b7) [0x7fad637f72b7]
>> 3: (EMetaBlob::replay(MDSRank*, LogSegment*, MDSlaveUpdate*)+0x5f4b) [0x7a7a6b]
>> 4: (EUpdate::replay(MDSRank*)+0x39) [0x7a8fa9]
>> 5: (MDLog::_replay_thread()+0x864) [0x752164]
>> 6: (MDLog::ReplayThread::entry()+0xd) [0x4f021d]
>> 7: (()+0x76ba) [0x7fad6305a6ba]
>> 8: (clone()+0x6d) [0x7fad6288341d]
>> NOTE: a copy of the executable, or `objdump -rdS <executable>`, is needed to interpret this.
>>
>> Full logs: https://pastebin.com/X5UG9vT2
>>
>> We've been unable to access the cephfs file system since all of this
>> started; attempts to mount fail with reports that "mds probably not
>> available":
>>
>> Oct 28 23:47:02 mirrors kernel: [115602.911193] ceph: probably no mds server is up
>>
>> root@mds02:~# ceph -s
>>   cluster:
>>     id:     78d5bf7d-b074-47ab-8d73-bd4d99df98a5
>>     health: HEALTH_WARN
>>             1 filesystem is degraded
>>             insufficient standby MDS daemons available
>>             too many PGs per OSD (276 > max 250)
>>
>>   services:
>>     mon: 3 daemons, quorum mon01,mon02,mon03
>>     mgr: mon01(active), standbys: mon02, mon03
>>     mds: fido_fs-2/2/1 up {0=mds01=up:resolve,1=mds02=up:replay(laggy or crashed)}
>>     osd: 27 osds: 27 up, 27 in
>>
>>   data:
>>     pools:   15 pools, 3168 pgs
>>     objects: 16.97 M objects, 30 TiB
>>     usage:   71 TiB used, 27 TiB / 98 TiB avail
>>     pgs:     3168 active+clean
>>
>>   io:
>>     client: 680 B/s rd, 1.1 MiB/s wr, 0 op/s rd, 345 op/s wr
>>
>> Before I just trash the entire fs and give up on ceph, does anyone have
>> any suggestions as to how we can fix this?
>>
>> root@mds02:~# ceph versions
>> {
>>     "mon": {
>>         "ceph version 13.2.2 (02899bfda814146b021136e9d8e80eba494e1126) mimic (stable)": 3
>>     },
>>     "mgr": {
>>         "ceph version 13.2.2 (02899bfda814146b021136e9d8e80eba494e1126) mimic (stable)": 3
>>     },
>>     "osd": {
>>         "ceph version 12.2.8 (ae699615bac534ea496ee965ac6192cb7e0e07c0) luminous (stable)": 27
>>     },
>>     "mds": {
>>         "ceph version 13.2.2 (02899bfda814146b021136e9d8e80eba494e1126) mimic (stable)": 2
>>     },
>>     "overall": {
>>         "ceph version 12.2.8 (ae699615bac534ea496ee965ac6192cb7e0e07c0) luminous (stable)": 27,
>>         "ceph version 13.2.2 (02899bfda814146b021136e9d8e80eba494e1126) mimic (stable)": 8
>>     }
>> }
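For reference, this is roughly the per-host sequence I had pencilled in for the mon/mgr/mds boxes (the OSDs never left 12.2.8, so they stay put) -- a sketch only, one host at a time, and it assumes both that I can find a mirror still carrying 13.2.1 and that "13.2.1-1xenial" is the right version string for Ubuntu 16.04:

    #!/bin/bash
    # Downgrade one mon/mgr/mds host from 13.2.2 back to 13.2.1, then
    # wait for "ceph -s" to settle before moving on to the next host.
    set -euo pipefail

    VER="13.2.1-1xenial"   # assumed package version string for xenial

    # Stop the local ceph daemons before touching any packages
    systemctl stop ceph-mds.target ceph-mgr.target ceph-mon.target

    # Downgrade the installed packages to the pinned point release
    apt-get update
    apt-get install -y --allow-downgrades \
        ceph="${VER}" ceph-base="${VER}" ceph-common="${VER}" \
        ceph-mon="${VER}" ceph-mgr="${VER}" ceph-mds="${VER}" \
        librados2="${VER}" librbd1="${VER}" libcephfs2="${VER}" \
        python-rados="${VER}" python-rbd="${VER}" python-cephfs="${VER}"

    # Bring the daemons back and check cluster state before the next host
    systemctl start ceph-mon.target ceph-mgr.target ceph-mds.target
    ceph -s

Then, once "ceph versions" shows every mon/mgr/mds back on 13.2.1, run the 'ceph mds repaired fido_fs:1' that Zheng suggests. Does that look sane, or is there a better-trodden path?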
--
Jon Morby
FidoNet - the internet made simple!
10 - 16 Tiller Road, London, E14 8PX
tel: 0345 004 3050 / fax: 0345 004 3051
PGP Key: 26DC B618 DE9E F9CB F8B7 1EFA 2A64 BA69 B3B5 AD3A - http://jonmorby.com/B3B5AD3A.asc