Hello,
This is my first real issue in the several months I've been running Ceph. Here's the
situation:

I've been running an Emperor cluster for several months, and all was good. Since I'm
on Ubuntu 13.10 and 0.72.2, I decided it was time to upgrade, starting with Ceph
itself. I moved to 0.80.4, which is the last version in the apt repository for
13.10. I upgraded the MONs, then the OSD servers, to 0.80.4; all went as
expected with no issues. The last thing I did was upgrade the MDS using the
same process, but now the MDS won't start. I've tried to manually start the
MDS with debugging on, and I have attached the log file. It complains that it's
waiting on "mds.0.20 need osdmap epoch 3602, have 3601".
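For reference, the per-node upgrade order I followed was roughly the sketch below. It is written as a dry-run (the `run` helper just prints each command) so it only traces the sequence; the `service ceph restart <type>` form assumes the sysvinit packaging on my nodes, and upstart-based installs would use `restart ceph-mon-all` and friends instead.

```shell
#!/bin/sh
# Dry-run trace of the upgrade order: MONs first, then OSDs, then the MDS.
# The run() helper prints commands instead of executing them, so this sketch
# is safe to read and trace without touching a live cluster.
run() { echo "+ $*"; }

# On each node: pull 0.80.4 from the 13.10 apt repository.
run sudo apt-get update
run sudo apt-get install -y ceph

# Restart only the daemon type running on that node, one node at a time,
# waiting for HEALTH_OK between nodes: mon -> osd -> mds.
for daemon in mon osd mds; do
    run sudo service ceph restart "$daemon"
done

# Confirm the new version and cluster state before moving on.
run ceph -v
run ceph health
```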
Anyway, I don't really use CephFS or RGW, so I don't strictly need the MDS, but I'd
like to have it. Can someone tell me how to fix it, or how to delete it so I can
start over when I do need it? Right now my cluster is HEALTH_WARN because of
it.
Thanks!
Brad
2014-09-10 15:48:13.830787 7fae3c48e7c0 0 ceph version 0.72.2 (a913ded2ff138aefb8cb84d347d72164099cfd60), process ceph-mds, pid 3166
2014-09-10 15:48:13.834336 7fae3c48e7c0 10 mds.-1.0 168 MDSCacheObject
2014-09-10 15:48:13.834349 7fae3c48e7c0 10 mds.-1.0 2168 CInode
2014-09-10 15:48:13.834355 7fae3c48e7c0 10 mds.-1.0 16 elist<>::item *7=112
2014-09-10 15:48:13.834359 7fae3c48e7c0 10 mds.-1.0 392 inode_t
2014-09-10 15:48:13.834361 7fae3c48e7c0 10 mds.-1.0 56 nest_info_t
2014-09-10 15:48:13.834364 7fae3c48e7c0 10 mds.-1.0 32 frag_info_t
2014-09-10 15:48:13.834370 7fae3c48e7c0 10 mds.-1.0 40 SimpleLock *5=200
2014-09-10 15:48:13.834373 7fae3c48e7c0 10 mds.-1.0 48 ScatterLock *3=144
2014-09-10 15:48:13.834377 7fae3c48e7c0 10 mds.-1.0 488 CDentry
2014-09-10 15:48:13.834379 7fae3c48e7c0 10 mds.-1.0 16 elist<>::item
2014-09-10 15:48:13.834383 7fae3c48e7c0 10 mds.-1.0 40 SimpleLock
2014-09-10 15:48:13.834385 7fae3c48e7c0 10 mds.-1.0 1024 CDir
2014-09-10 15:48:13.834387 7fae3c48e7c0 10 mds.-1.0 16 elist<>::item *2=32
2014-09-10 15:48:13.834390 7fae3c48e7c0 10 mds.-1.0 192 fnode_t
2014-09-10 15:48:13.834392 7fae3c48e7c0 10 mds.-1.0 56 nest_info_t *2
2014-09-10 15:48:13.834394 7fae3c48e7c0 10 mds.-1.0 32 frag_info_t *2
2014-09-10 15:48:13.834399 7fae3c48e7c0 10 mds.-1.0 168 Capability
2014-09-10 15:48:13.834402 7fae3c48e7c0 10 mds.-1.0 32 xlist<>::item *2=64
2014-09-10 15:48:13.835815 7fae3c486700 10 mds.-1.0 MDS::ms_get_authorizer type=mon
2014-09-10 15:48:13.836113 7fae37292700 5 mds.-1.0 ms_handle_connect on 156.74.237.50:6789/0
2014-09-10 15:48:13.839873 7fae3c48e7c0 10 mds.-1.0 beacon_send up:boot seq 1 (currently up:boot)
2014-09-10 15:48:13.840110 7fae3c48e7c0 10 mds.-1.0 create_logger
2014-09-10 15:48:13.867040 7fae37292700 5 mds.-1.0 handle_mds_map epoch 149 from mon.0
2014-09-10 15:48:13.867109 7fae37292700 10 mds.-1.0 my compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding}
2014-09-10 15:48:13.867122 7fae37292700 10 mds.-1.0 mdsmap compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding}
2014-09-10 15:48:13.867136 7fae37292700 10 mds.-1.-1 map says i am 156.74.237.56:6800/3166 mds.-1.-1 state down:dne
2014-09-10 15:48:13.867151 7fae37292700 10 mds.-1.-1 not in map yet
2014-09-10 15:48:14.164620 7fae37292700 5 mds.-1.-1 handle_mds_map epoch 150 from mon.0
2014-09-10 15:48:14.164706 7fae37292700 10 mds.-1.-1 my compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding}
2014-09-10 15:48:14.164716 7fae37292700 10 mds.-1.-1 mdsmap compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding}
2014-09-10 15:48:14.164727 7fae37292700 10 mds.-1.0 map says i am 156.74.237.56:6800/3166 mds.-1.0 state up:standby
2014-09-10 15:48:14.164739 7fae37292700 10 mds.-1.0 peer mds gid 5192121 removed from map
2014-09-10 15:48:14.164757 7fae37292700 1 mds.-1.0 handle_mds_map standby
2014-09-10 15:48:14.237027 7fae37292700 5 mds.-1.0 handle_mds_map epoch 151 from mon.0
2014-09-10 15:48:14.237060 7fae37292700 10 mds.-1.0 my compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding}
2014-09-10 15:48:14.237070 7fae37292700 10 mds.-1.0 mdsmap compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding}
2014-09-10 15:48:14.237079 7fae37292700 10 mds.0.20 map says i am 156.74.237.56:6800/3166 mds.0.20 state up:replay
2014-09-10 15:48:14.237091 7fae37292700 1 mds.0.20 handle_mds_map i am now mds.0.20
2014-09-10 15:48:14.237098 7fae37292700 1 mds.0.20 handle_mds_map state change up:standby --> up:replay
2014-09-10 15:48:14.237108 7fae37292700 1 mds.0.20 replay_start
2014-09-10 15:48:14.237124 7fae37292700 7 mds.0.cache set_recovery_set
2014-09-10 15:48:14.237133 7fae37292700 1 mds.0.20 recovery set is
2014-09-10 15:48:14.237137 7fae37292700 1 mds.0.20 need osdmap epoch 3602, have 3601
2014-09-10 15:48:14.237141 7fae37292700 1 mds.0.20 waiting for osdmap 3602 (which blacklists prior instance)
2014-09-10 15:48:14.237788 7fae37292700 2 mds.0.20 boot_start 1: opening inotable
2014-09-10 15:48:14.237801 7fae37292700 10 mds.0.inotable: load
2014-09-10 15:48:14.238016 7fae37292700 2 mds.0.20 boot_start 1: opening sessionmap
2014-09-10 15:48:14.238021 7fae37292700 10 mds.0.sessionmap load
2014-09-10 15:48:14.238102 7fae37292700 2 mds.0.20 boot_start 1: opening anchor table
2014-09-10 15:48:14.238114 7fae37292700 10 mds.0.anchortable: load
2014-09-10 15:48:14.238204 7fae37292700 2 mds.0.20 boot_start 1: opening snap table
2014-09-10 15:48:14.238208 7fae37292700 10 mds.0.snaptable: load
2014-09-10 15:48:14.238325 7fae37292700 2 mds.0.20 boot_start 1: opening mds log
2014-09-10 15:48:14.238336 7fae37292700 5 mds.0.log open discovering log bounds
2014-09-10 15:48:14.238369 7fae37292700 1 mds.0.journaler(ro) recover start
2014-09-10 15:48:14.238374 7fae37292700 1 mds.0.journaler(ro) read_head
2014-09-10 15:48:14.238842 7fae3418b700 10 mds.0.20 MDS::ms_get_authorizer type=osd
2014-09-10 15:48:14.238906 7fae33f89700 10 mds.0.20 MDS::ms_get_authorizer type=osd
2014-09-10 15:48:14.239026 7fae33e88700 10 mds.0.20 MDS::ms_get_authorizer type=osd
2014-09-10 15:48:14.239031 7fae3408a700 10 mds.0.20 MDS::ms_get_authorizer type=osd
2014-09-10 15:48:14.239166 7fae33d87700 10 mds.0.20 MDS::ms_get_authorizer type=osd
2014-09-10 15:48:14.239557 7fae37292700 5 mds.0.20 ms_handle_connect on 156.74.237.54:6812/2738
2014-09-10 15:48:14.239746 7fae37292700 5 mds.0.20 ms_handle_connect on 156.74.237.55:6812/2154
2014-09-10 15:48:14.239832 7fae37292700 5 mds.0.20 ms_handle_connect on 156.74.237.54:6818/2791
2014-09-10 15:48:14.240035 7fae37292700 5 mds.0.20 ms_handle_connect on 156.74.237.54:6800/2704
2014-09-10 15:48:14.240085 7fae37292700 5 mds.0.20 ms_handle_connect on 156.74.237.53:6818/3080
2014-09-10 15:48:14.241149 7fae37292700 10 mds.0.sessionmap dump
2014-09-10 15:48:14.241161 7fae37292700 10 mds.0.sessionmap _load_finish v 0, 0 sessions, 22 bytes
2014-09-10 15:48:14.241191 7fae37292700 10 mds.0.sessionmap dump
2014-09-10 15:48:14.241242 7fae37292700 10 mds.0.inotable: load_2 got 34 bytes
2014-09-10 15:48:14.241246 7fae37292700 10 mds.0.inotable: load_2 loaded v0
2014-09-10 15:48:14.241401 7fae37292700 10 mds.0.snaptable: load_2 got 46 bytes
2014-09-10 15:48:14.241405 7fae37292700 10 mds.0.snaptable: load_2 loaded v0
2014-09-10 15:48:14.241716 7fae37292700 1 mds.0.journaler(ro) _finish_read_head loghead(trim 4194304, expire 4194304, write 4204232). probing for end of log (from 4204232)...
2014-09-10 15:48:14.241734 7fae37292700 1 mds.0.journaler(ro) probing for end of the log
2014-09-10 15:48:14.242501 7fae33781700 10 mds.0.20 MDS::ms_get_authorizer type=osd
2014-09-10 15:48:14.242632 7fae33680700 10 mds.0.20 MDS::ms_get_authorizer type=osd
2014-09-10 15:48:14.243129 7fae37292700 5 mds.0.20 ms_handle_connect on 156.74.237.53:6801/2987
2014-09-10 15:48:14.243265 7fae37292700 5 mds.0.20 ms_handle_connect on 156.74.237.54:6815/2782
2014-09-10 15:48:14.244569 7fae37292700 1 mds.0.journaler(ro) _finish_probe_end write_pos = 4204446 (header had 4204232). recovered.
2014-09-10 15:48:14.245856 7fae37292700 10 mds.0.anchortable: load_2 got 34 bytes
2014-09-10 15:48:14.245874 7fae37292700 10 mds.0.anchortable: load_2 loaded v0
2014-09-10 15:48:14.245890 7fae37292700 2 mds.0.20 boot_start 2: loading/discovering base inodes
2014-09-10 15:48:14.245900 7fae37292700 0 mds.0.cache creating system inode with ino:100
2014-09-10 15:48:14.245991 7fae37292700 10 mds.0.cache.ino(100) fetch
2014-09-10 15:48:14.246191 7fae37292700 0 mds.0.cache creating system inode with ino:1
2014-09-10 15:48:14.246197 7fae37292700 10 mds.0.cache.ino(1) fetch
2014-09-10 15:48:14.246813 7fae3337d700 10 mds.0.20 MDS::ms_get_authorizer type=osd
2014-09-10 15:48:14.247422 7fae37292700 5 mds.0.20 ms_handle_connect on 156.74.237.55:6806/2110
2014-09-10 15:48:14.247820 7fae37292700 10 mds.0.cache.ino(1) _fetched got 0 and 440
2014-09-10 15:48:14.247831 7fae37292700 10 mds.0.cache.ino(1) magic is 'ceph fs volume v011' (expecting 'ceph fs volume v011')
2014-09-10 15:48:14.247853 7fae37292700 20 mds.0.cache.ino(1) decode_snap_blob snaprealm(1 seq 1 lc 0 cr 0 cps 1 snaps={} 0x3652f40)
2014-09-10 15:48:14.247868 7fae37292700 10 mds.0.cache.ino(1) _fetched [inode 1 [...2,head] / auth v1 snaprealm=0x3652f40 f(v0 1=0+1) n(v0 1=0+1) (iversion lock) 0x3744878]
2014-09-10 15:48:14.248845 7fae37292700 10 mds.0.cache.ino(100) _fetched got 0 and 440
2014-09-10 15:48:14.248854 7fae37292700 10 mds.0.cache.ino(100) magic is 'ceph fs volume v011' (expecting 'ceph fs volume v011')
2014-09-10 15:48:14.248883 7fae37292700 20 mds.0.cache.ino(100) decode_snap_blob snaprealm(100 seq 1 lc 0 cr 0 cps 1 snaps={} 0x3653600)
2014-09-10 15:48:14.248888 7fae37292700 10 mds.0.cache.ino(100) _fetched [inode 100 [...2,head] ~mds0/ auth v1 snaprealm=0x3653600 f(v0 11=1+10) n(v0 11=1+10) (iversion lock) 0x3744000]
2014-09-10 15:48:14.248905 7fae37292700 2 mds.0.20 boot_start 3: replaying mds log
2014-09-10 15:48:14.248909 7fae37292700 10 mds.0.log replay start, from 4194304 to 4204446
2014-09-10 15:48:14.248982 7fae3317b700 10 mds.0.log _replay_thread start
2014-09-10 15:48:14.249010 7fae3317b700 10 mds.0.journaler(ro) _is_readable read_buf.length() == 0, but need 4 for next entry; fetch_len is 41943040
2014-09-10 15:48:14.249019 7fae3317b700 10 mds.0.journaler(ro) _is_readable: not readable, returning false
2014-09-10 15:48:14.249021 7fae3317b700 10 mds.0.journaler(ro) _prefetch
2014-09-10 15:48:14.249023 7fae3317b700 10 mds.0.journaler(ro) _prefetch 41943040 requested_pos 4194304 < target 4204446 (46137344), prefetching 10142
2014-09-10 15:48:14.249031 7fae3317b700 10 mds.0.journaler(ro) _issue_read reading 4194304~10142, read pointers 4194304/4194304/4204446
2014-09-10 15:48:14.249134 7fae3317b700 10 mds.0.journaler(ro) wait_for_readable at 4194304 onreadable 0x362c370
2014-09-10 15:48:14.249142 7fae3317b700 10 mds.0.journaler(ro) _is_readable read_buf.length() == 0, but need 4 for next entry; fetch_len is 41943040
2014-09-10 15:48:14.249146 7fae3317b700 10 mds.0.journaler(ro) _is_readable: not readable, returning false
2014-09-10 15:48:14.250075 7fae37292700 10 mds.0.journaler(ro) _finish_read got 4194304~10142
2014-09-10 15:48:14.250086 7fae37292700 10 mds.0.journaler(ro) _is_readable read_buf.length() == 0, but need 4 for next entry; fetch_len is 41943040
2014-09-10 15:48:14.250092 7fae37292700 10 mds.0.journaler(ro) _is_readable: not readable, returning false
2014-09-10 15:48:14.250095 7fae37292700 10 mds.0.journaler(ro) _assimilate_prefetch 4194304~10142
2014-09-10 15:48:14.250097 7fae37292700 10 mds.0.journaler(ro) _assimilate_prefetch read_buf now 4194304~10142, read pointers 4194304/4204446/4204446
2014-09-10 15:48:14.250101 7fae37292700 10 mds.0.journaler(ro) _finish_read now readable (or at journal end)
2014-09-10 15:48:14.250113 7fae37292700 10 mds.0.journaler(ro) _prefetch
2014-09-10 15:48:14.250122 7fae3317b700 10 mds.0.journaler(ro) _prefetch
2014-09-10 15:48:14.250128 7fae3317b700 10 mds.0.journaler(ro) _prefetch
2014-09-10 15:48:14.250130 7fae3317b700 10 mds.0.journaler(ro) _prefetch
2014-09-10 15:48:14.250131 7fae3317b700 10 mds.0.journaler(ro) _prefetch
2014-09-10 15:48:14.250133 7fae3317b700 10 mds.0.journaler(ro) try_read_entry at 4194304 reading 4194304~4 (have 10142)
2014-09-10 15:48:14.250136 7fae3317b700 0 mds.0.journaler(ro) try_read_entry got 0 len entry at offset 4194304
2014-09-10 15:48:14.250139 7fae3317b700 10 mds.0.journaler(ro) _prefetch
2014-09-10 15:48:14.250144 7fae3317b700 0 mds.0.log _replay journaler got error -22, aborting
2014-09-10 15:48:14.250148 7fae3317b700 10 mds.0.journaler(ro) reread_head
2014-09-10 15:48:14.251486 7fae3317b700 10 mds.0.log standby_trim_segments
2014-09-10 15:48:14.251495 7fae3317b700 10 mds.0.log expire_pos=4194304
2014-09-10 15:48:14.251499 7fae3317b700 20 mds.0.log removed no segments!
2014-09-10 15:48:14.251506 7fae3317b700 10 mds.0.log _replay_thread kicking waiters
2014-09-10 15:48:14.251509 7fae3317b700 0 mds.0.20 boot_start encountered an error, failing
2014-09-10 15:48:14.251511 7fae3317b700 1 mds.0.20 suicide. wanted down:dne, now up:replay
2014-09-10 15:48:14.252751 7fae3317b700 10 mds.0.log _replay_thread finish
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com