We accidentally found ourselves upgraded from 12.2.8 to 13.2.2 after a 
ceph-deploy install went awry (we were expecting it to upgrade to 12.2.9, 
not jump a major release without warning).
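
For what it's worth, we think the jump could have been avoided by pinning the 
release explicitly, since ceph-deploy accepts a --release flag. A sketch using 
our MDS hosts:

    # pin to luminous instead of letting ceph-deploy pick the latest stable
    ceph-deploy install --release luminous mds01 mds02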

Anyway, as a result we ended up with an MDS journal error and one daemon 
reporting as damaged.

Having got nowhere asking for help on IRC, we followed various forum posts and 
disaster recovery guides and ended up resetting the journal. That left the 
daemon no longer flagged as “damaged”, but we’re now seeing the MDS segfault 
while trying to replay:
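
For reference, what we ran was roughly the documented disaster-recovery 
sequence; a sketch, assuming rank 0 of fido_fs (on mimic cephfs-journal-tool 
wants an explicit --rank):

    # back up the journal before touching anything
    cephfs-journal-tool --rank=fido_fs:0 journal export backup.bin
    # flush whatever metadata is recoverable back into the pool
    cephfs-journal-tool --rank=fido_fs:0 event recover_dentries summary
    # reset the journal -- the step that cleared the "damaged" flag for us
    cephfs-journal-tool --rank=fido_fs:0 journal reset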

https://pastebin.com/iSLdvu0b



/build/ceph-13.2.2/src/mds/journal.cc: 1572: FAILED assert(g_conf->mds_wipe_sessions)

 ceph version 13.2.2 (02899bfda814146b021136e9d8e80eba494e1126) mimic (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x7fad637f70f2]
 2: (()+0x3162b7) [0x7fad637f72b7]
 3: (EMetaBlob::replay(MDSRank*, LogSegment*, MDSlaveUpdate*)+0x5f4b) [0x7a7a6b]
 4: (EUpdate::replay(MDSRank*)+0x39) [0x7a8fa9]
 5: (MDLog::_replay_thread()+0x864) [0x752164]
 6: (MDLog::ReplayThread::entry()+0xd) [0x4f021d]
 7: (()+0x76ba) [0x7fad6305a6ba]
 8: (clone()+0x6d) [0x7fad6288341d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
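
If we're reading the assert correctly, replay is hitting stale session state 
and journal.cc only tolerates that when mds_wipe_sessions is set. Is the 
right move to also reset the session table, as the disaster recovery guide 
suggests:

    cephfs-table-tool all reset session

or to temporarily set the (scary-looking) option on the MDS, e.g. in ceph.conf:

    [mds]
        mds wipe sessions = true

We'd rather not guess with either of these.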


Full logs:

https://pastebin.com/X5UG9vT2


We’ve been unable to access the CephFS filesystem since all of this started; 
attempts to mount it fail with the kernel reporting that no MDS is available:

Oct 28 23:47:02 mirrors kernel: [115602.911193] ceph: probably no mds server is up
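
The mount attempt itself is nothing exotic; a sketch, assuming the kernel 
client, our mon host, and a hypothetical mount point and admin secret file:

    mount -t ceph mon01:6789:/ /mnt/cephfs -o name=admin,secretfile=/etc/ceph/admin.secret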


root@mds02:~# ceph -s
  cluster:
    id:     78d5bf7d-b074-47ab-8d73-bd4d99df98a5
    health: HEALTH_WARN
            1 filesystem is degraded
            insufficient standby MDS daemons available
            too many PGs per OSD (276 > max 250)

  services:
    mon: 3 daemons, quorum mon01,mon02,mon03
    mgr: mon01(active), standbys: mon02, mon03
    mds: fido_fs-2/2/1 up  {0=mds01=up:resolve,1=mds02=up:replay(laggy or crashed)}
    osd: 27 osds: 27 up, 27 in

  data:
    pools:   15 pools, 3168 pgs
    objects: 16.97 M objects, 30 TiB
    usage:   71 TiB used, 27 TiB / 98 TiB avail
    pgs:     3168 active+clean

  io:
    client:   680 B/s rd, 1.1 MiB/s wr, 0 op/s rd, 345 op/s wr
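
We read fido_fs-2/2/1 as two ranks up/in with max_mds now 1; presumably rank 1 
can't go away while it's stuck in replay. If dropping back to a single active 
MDS is part of the fix, we assume it's:

    ceph fs set fido_fs max_mds 1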


Before I just trash the entire filesystem and give up on Ceph, does anyone 
have any suggestions as to how we can fix this?

root@mds02:~# ceph versions
{
    "mon": {
        "ceph version 13.2.2 (02899bfda814146b021136e9d8e80eba494e1126) mimic (stable)": 3
    },
    "mgr": {
        "ceph version 13.2.2 (02899bfda814146b021136e9d8e80eba494e1126) mimic (stable)": 3
    },
    "osd": {
        "ceph version 12.2.8 (ae699615bac534ea496ee965ac6192cb7e0e07c0) luminous (stable)": 27
    },
    "mds": {
        "ceph version 13.2.2 (02899bfda814146b021136e9d8e80eba494e1126) mimic (stable)": 2
    },
    "overall": {
        "ceph version 12.2.8 (ae699615bac534ea496ee965ac6192cb7e0e07c0) luminous (stable)": 27,
        "ceph version 13.2.2 (02899bfda814146b021136e9d8e80eba494e1126) mimic (stable)": 8
    }
}
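
We're aware the OSDs are still on luminous. Presumably once (if) the MDS is 
back we should finish the upgrade properly; a sketch, with hypothetical OSD 
host names:

    ceph-deploy install --release mimic osd01 osd02 osd03
    # restart OSDs one host at a time, then, only once every OSD reports 13.2.2:
    ceph osd require-osd-release mimic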
