Looking better... working on scrubbing.

HEALTH_ERR 1 pgs are stuck inactive for more than 300 seconds; 1 pgs incomplete; 12 pgs inconsistent; 2 pgs repair; 1 pgs stuck inactive; 1 pgs stuck unclean; 109 scrub errors; too few PGs per OSD (29 < min 30); mds rank 0 has failed; mds cluster is degraded; noout flag(s) set; no legacy OSD present but 'sortbitwise' flag is not set

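For anyone skimming the status line above, here is a minimal Python sketch that splits it into one item per line and pulls out the per-state PG counts. The helper names `split_health` and `pg_counts` are made up for illustration and are not part of any Ceph tool; the summary string is copied from the HEALTH_ERR output above.

```python
import re

# Health summary copied from the HEALTH_ERR line above (without the status word).
summary = (
    "1 pgs are stuck inactive for more than 300 seconds; 1 pgs incomplete; "
    "12 pgs inconsistent; 2 pgs repair; 1 pgs stuck inactive; 1 pgs stuck unclean; "
    "109 scrub errors; too few PGs per OSD (29 < min 30); mds rank 0 has failed; "
    "mds cluster is degraded; noout flag(s) set; "
    "no legacy OSD present but 'sortbitwise' flag is not set"
)


def split_health(text):
    """Split a semicolon-separated health summary into trimmed items."""
    return [item.strip() for item in text.split(";") if item.strip()]


def pg_counts(items):
    """Map PG state -> count for items shaped like '<N> pgs <state>'."""
    counts = {}
    for item in items:
        match = re.match(r"^(\d+) pgs (.+)$", item)
        if match:
            counts[match.group(2)] = int(match.group(1))
    return counts


items = split_health(summary)
counts = pg_counts(items)

for item in items:
    print(item)
```

This only reshapes text that is already in the status output; it does not query the cluster.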
Now PG 1.28. Looking at all the old OSDs, dead or alive, the only one with a DIR_* directory for it is osd.4. This appears to be the metadata pool! 21M of metadata can be quite a bit of stuff, so I would like to rescue it. But I am not able to start this OSD, and exporting through ceph-objectstore-tool crashes, even with --skip-journal-replay and with --skip-mount-omap (a different failure). As I mentioned in an earlier email, the exception message thrown is bogus...

# ceph-objectstore-tool --op export --pgid 1.28 --data-path /var/lib/ceph/osd/ceph-4 --journal-path /var/lib/ceph/osd/ceph-4/journal --file ~/1.28.export
terminate called after throwing an instance of 'std::domain_error'
  what(): coll_t::decode(): don't know how to decode version 1
*** Caught signal (Aborted) **
 in thread 7f812e7fb940 thread_name:ceph-objectstor
 ceph version 10.2.9 (2ee413f77150c0f375ff6f10edd6c8f9c7d060d0)
 1: (()+0x996a57) [0x55dee175fa57]
 2: (()+0x110c0) [0x7f812d0050c0]
 3: (gsignal()+0xcf) [0x7f812b438fcf]
 4: (abort()+0x16a) [0x7f812b43a3fa]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x15d) [0x7f812bd1fb3d]
 6: (()+0x5ebb6) [0x7f812bd1dbb6]
 7: (()+0x5ec01) [0x7f812bd1dc01]
 8: (()+0x5ee19) [0x7f812bd1de19]
 9: (coll_t::decode(ceph::buffer::list::iterator&)+0x21e) [0x55dee143001e]
 10: (DBObjectMap::_Header::decode(ceph::buffer::list::iterator&)+0x125) [0x55dee156d5f5]
 11: (DBObjectMap::check(std::ostream&, bool)+0x279) [0x55dee1562bb9]
 12: (DBObjectMap::init(bool)+0x288) [0x55dee1561eb8]
 13: (FileStore::mount()+0x2525) [0x55dee1498eb5]
 14: (main()+0x28c0) [0x55dee10c9400]
 15: (__libc_start_main()+0xf1) [0x7f812b4262b1]
 16: (()+0x34f747) [0x55dee1118747]
Aborted

# ceph-objectstore-tool --op export --pgid 1.28 --data-path /var/lib/ceph/osd/ceph-4 --journal-path /var/lib/ceph/osd/ceph-4/journal --file ~/1.28.export --skip-journal-replay
terminate called after throwing an instance of 'std::domain_error'
  what(): coll_t::decode(): don't know how to decode version 1
*** Caught signal (Aborted) **
 in thread 7fa6d087b940 thread_name:ceph-objectstor
 ceph version 10.2.9 (2ee413f77150c0f375ff6f10edd6c8f9c7d060d0)
 1: (()+0x996a57) [0x55abd356aa57]
 2: (()+0x110c0) [0x7fa6cf0850c0]
 3: (gsignal()+0xcf) [0x7fa6cd4b8fcf]
 4: (abort()+0x16a) [0x7fa6cd4ba3fa]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x15d) [0x7fa6cdd9fb3d]
 6: (()+0x5ebb6) [0x7fa6cdd9dbb6]
 7: (()+0x5ec01) [0x7fa6cdd9dc01]
 8: (()+0x5ee19) [0x7fa6cdd9de19]
 9: (coll_t::decode(ceph::buffer::list::iterator&)+0x21e) [0x55abd323b01e]
 10: (DBObjectMap::_Header::decode(ceph::buffer::list::iterator&)+0x125) [0x55abd33785f5]
 11: (DBObjectMap::check(std::ostream&, bool)+0x279) [0x55abd336dbb9]
 12: (DBObjectMap::init(bool)+0x288) [0x55abd336ceb8]
 13: (FileStore::mount()+0x2525) [0x55abd32a3eb5]
 14: (main()+0x28c0) [0x55abd2ed4400]
 15: (__libc_start_main()+0xf1) [0x7fa6cd4a62b1]
 16: (()+0x34f747) [0x55abd2f23747]
Aborted

# ceph-objectstore-tool --op export --pgid 1.28 --data-path /var/lib/ceph/osd/ceph-4 --journal-path /var/lib/ceph/osd/ceph-4/journal --file ~/1.28.export --skip-mount-omap
ceph-objectstore-tool: /usr/include/boost/smart_ptr/scoped_ptr.hpp:99: T* boost::scoped_ptr<T>::operator->() const [with T = ObjectMap]: Assertion `px != 0' failed.
*** Caught signal (Aborted) **
 in thread 7f14345c5940 thread_name:ceph-objectstor
 ceph version 10.2.9 (2ee413f77150c0f375ff6f10edd6c8f9c7d060d0)
 1: (()+0x996a57) [0x5575b50a9a57]
 2: (()+0x110c0) [0x7f1432dcf0c0]
 3: (gsignal()+0xcf) [0x7f1431202fcf]
 4: (abort()+0x16a) [0x7f14312043fa]
 5: (()+0x2be37) [0x7f14311fbe37]
 6: (()+0x2bee2) [0x7f14311fbee2]
 7: (()+0x2fa19c) [0x5575b4a0d19c]
 8: (FileStore::omap_get_values(coll_t const&, ghobject_t const&, std::set<std::string, std::less<std::string>, std::allocator<std::string> > const&, std::map<std::string, ceph::buffer::list, std::less<std::string>, std::allocator<std::pair<std::string const, ceph::buffer::list> > >*)+0x6c2) [0x5575b4dc9322]
 9: (PG::peek_map_epoch(ObjectStore*, spg_t, unsigned int*, ceph::buffer::list*)+0x235) [0x5575b4ab3135]
 10: (main()+0x5bd6) [0x5575b4a16716]
 11: (__libc_start_main()+0xf1) [0x7f14311f02b1]
 12: (()+0x34f747) [0x5575b4a62747]

When trying to bring up osd.4 we get the following message. It looks very similar to the crash in the first two attempts above.

 ceph version 10.2.9 (2ee413f77150c0f375ff6f10edd6c8f9c7d060d0)
 1: (()+0x960e57) [0x5565e564ae57]
 2: (()+0x110c0) [0x7f34aa17e0c0]
 3: (gsignal()+0xcf) [0x7f34a81c4fcf]
 4: (abort()+0x16a) [0x7f34a81c63fa]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x15d) [0x7f34a8aabb3d]
 6: (()+0x5ebb6) [0x7f34a8aa9bb6]
 7: (()+0x5ec01) [0x7f34a8aa9c01]
 8: (()+0x5ee19) [0x7f34a8aa9e19]
 9: (coll_t::decode(ceph::buffer::list::iterator&)+0x21e) [0x5565e531933e]
 10: (DBObjectMap::_Header::decode(ceph::buffer::list::iterator&)+0x125) [0x5565e54c02f5]
 11: (DBObjectMap::check(std::ostream&, bool)+0x279) [0x5565e54b58b9]
 12: (DBObjectMap::init(bool)+0x288) [0x5565e54b4bb8]
 13: (FileStore::mount()+0x2525) [0x5565e53e0185]
 14: (OSD::init()+0x27d) [0x5565e50797ed]
 15: (main()+0x2a64) [0x5565e4fe05d4]
 16: (__libc_start_main()+0xf1) [0x7f34a81b22b1]
 17: (()+0x341117) [0x5565e502b117]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 rbd_mirror
   0/ 5 rbd_replay
   0/ 5 rbd_replay
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   0/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 journal
   0/ 5 ms
   1/ 5 mon
   0/10 monc
   1/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/10 civetweb
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
   0/ 0 refs
   1/ 5 xio
   1/ 5 compressor
   1/ 5 newstore
   1/ 5 bluestore
   1/ 5 bluefs
   1/ 3 bdev
   1/ 5 kstore
   4/ 5 rocksdb
   4/ 5 leveldb
   1/ 5 kinetic
   1/ 5 fuse
  -2/-2 (syslog threshold)
  -1/-1 (stderr threshold)
  max_recent 10000
  max_new 1000
  log_file /var/log/ceph/ceph-osd.4.log
--- end dump of recent events ---
2017-09-15 18:54:46.429439 7fc5fbc867c0 0 set uid:gid to 1001:1001 (ceph:ceph)
2017-09-15 18:54:46.429451 7fc5fbc867c0 0 ceph version 10.2.9 (2ee413f77150c0f375ff6f10edd6c8f9c7d060d0), process ceph-osd, pid 22671
2017-09-15 18:54:46.430384 7fc5fbc867c0 0 pidfile_write: ignore empty --pid-file
2017-09-15 18:54:46.439836 7fc5fbc867c0 0 filestore(/var/lib/ceph/osd/ceph-4) backend xfs (magic 0x58465342)
2017-09-15 18:54:46.440234 7fc5fbc867c0 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-4) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
2017-09-15 18:54:46.440238 7fc5fbc867c0 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-4) detect_features: SEEK_DATA/SEEK_HOLE is disabled via 'filestore seek data hole' config option
2017-09-15 18:54:46.440250 7fc5fbc867c0 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-4) detect_features: splice is supported
2017-09-15 18:54:46.474005 7fc5fbc867c0 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-4) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
2017-09-15 18:54:46.474186 7fc5fbc867c0 0 xfsfilestorebackend(/var/lib/ceph/osd/ceph-4) detect_feature: extsize is disabled by conf
2017-09-15 18:54:46.475050 7fc5fbc867c0 1 leveldb: Recovering log #31539
2017-09-15 18:54:46.617922 7fc5fbc867c0 1 leveldb: Delete type=3 #31538
2017-09-15 18:54:46.617978 7fc5fbc867c0 1 leveldb: Delete type=0 #31539
2017-09-15 18:54:56.846756 7fc5fbc867c0 -1 *** Caught signal (Aborted) **
 in thread 7fc5fbc867c0 thread_name:ceph-osd
 ceph version 10.2.9 (2ee413f77150c0f375ff6f10edd6c8f9c7d060d0)
 1: (()+0x960e57) [0x558935ed9e57]
 2: (()+0x110c0) [0x7fc5faaf40c0]
 3: (gsignal()+0xcf) [0x7fc5f8b3afcf]
 4: (abort()+0x16a) [0x7fc5f8b3c3fa]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x15d) [0x7fc5f9421b3d]
 6: (()+0x5ebb6) [0x7fc5f941fbb6]
 7: (()+0x5ec01) [0x7fc5f941fc01]
 8: (()+0x5ee19) [0x7fc5f941fe19]
 9: (coll_t::decode(ceph::buffer::list::iterator&)+0x21e) [0x558935ba833e]
 10: (DBObjectMap::_Header::decode(ceph::buffer::list::iterator&)+0x125) [0x558935d4f2f5]
 11: (DBObjectMap::check(std::ostream&, bool)+0x279) [0x558935d448b9]
 12: (DBObjectMap::init(bool)+0x288) [0x558935d43bb8]
 13: (FileStore::mount()+0x2525) [0x558935c6f185]
 14: (OSD::init()+0x27d) [0x5589359087ed]
 15: (main()+0x2a64) [0x55893586f5d4]
 16: (__libc_start_main()+0xf1) [0x7fc5f8b282b1]
 17: (()+0x341117) [0x5589358ba117]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- begin dump of recent events ---
-53> 2017-09-15 18:54:46.423990 7fc5fbc867c0 5 asok(0x558941004000) register_command perfcounters_dump hook 0x558940f4c030
-52> 2017-09-15 18:54:46.424007 7fc5fbc867c0 5 asok(0x558941004000) register_command 1 hook 0x558940f4c030
-51> 2017-09-15 18:54:46.424011 7fc5fbc867c0 5 asok(0x558941004000) register_command perf dump hook 0x558940f4c030
-50> 2017-09-15 18:54:46.424015 7fc5fbc867c0 5 asok(0x558941004000) register_command perfcounters_schema hook 0x558940f4c030
-49> 2017-09-15 18:54:46.424018 7fc5fbc867c0 5 asok(0x558941004000) register_command 2 hook 0x558940f4c030
-48> 2017-09-15 18:54:46.424021 7fc5fbc867c0 5 asok(0x558941004000) register_command perf schema hook 0x558940f4c030
-47> 2017-09-15 18:54:46.424024 7fc5fbc867c0 5 asok(0x558941004000) register_command perf reset hook 0x558940f4c030
-46> 2017-09-15 18:54:46.424027 7fc5fbc867c0 5 asok(0x558941004000) register_command config show hook 0x558940f4c030
-45> 2017-09-15 18:54:46.424031 7fc5fbc867c0 5 asok(0x558941004000) register_command config set hook 0x558940f4c030
-44> 2017-09-15 18:54:46.424034 7fc5fbc867c0 5 asok(0x558941004000) register_command config get hook 0x558940f4c030
-43> 2017-09-15 18:54:46.424037 7fc5fbc867c0 5 asok(0x558941004000) register_command config diff hook 0x558940f4c030
-42> 2017-09-15 18:54:46.424040 7fc5fbc867c0 5 asok(0x558941004000) register_command log flush hook 0x558940f4c030
-41> 2017-09-15 18:54:46.424043 7fc5fbc867c0 5 asok(0x558941004000) register_command log dump hook 0x558940f4c030
-40> 2017-09-15 18:54:46.424047 7fc5fbc867c0 5 asok(0x558941004000) register_command log reopen hook 0x558940f4c030
-39> 2017-09-15 18:54:46.429439 7fc5fbc867c0 0 set uid:gid to 1001:1001 (ceph:ceph)
-38> 2017-09-15 18:54:46.429451 7fc5fbc867c0 0 ceph version 10.2.9 (2ee413f77150c0f375ff6f10edd6c8f9c7d060d0), process ceph-osd, pid 22671
-37> 2017-09-15 18:54:46.430326 7fc5fbc867c0 1 -- 192.168.1.31:0/0 learned my addr 192.168.1.31:0/0
-36> 2017-09-15 18:54:46.430333 7fc5fbc867c0 1 accepter.accepter.bind my_inst.addr is 192.168.1.31:6819/22671 need_addr=0
-35> 2017-09-15 18:54:46.430346 7fc5fbc867c0 1 -- 192.168.2.31:0/0 learned my addr 192.168.2.31:0/0
-34> 2017-09-15 18:54:46.430350 7fc5fbc867c0 1 accepter.accepter.bind my_inst.addr is 192.168.2.31:6818/22671 need_addr=0
-33> 2017-09-15 18:54:46.430360 7fc5fbc867c0 1 -- 192.168.2.31:0/0 learned my addr 192.168.2.31:0/0
-32> 2017-09-15 18:54:46.430363 7fc5fbc867c0 1 accepter.accepter.bind my_inst.addr is 192.168.2.31:6819/22671 need_addr=0
-31> 2017-09-15 18:54:46.430378 7fc5fbc867c0 1 -- 192.168.1.31:0/0 learned my addr 192.168.1.31:0/0
-30> 2017-09-15 18:54:46.430381 7fc5fbc867c0 1 accepter.accepter.bind my_inst.addr is 192.168.1.31:6823/22671 need_addr=0
-29> 2017-09-15 18:54:46.430384 7fc5fbc867c0 0 pidfile_write: ignore empty --pid-file
-28> 2017-09-15 18:54:46.432245 7fc5fbc867c0 5 asok(0x558941004000) init /var/run/ceph/ceph-osd.4.asok
-27> 2017-09-15 18:54:46.432253 7fc5fbc867c0 5 asok(0x558941004000) bind_and_listen /var/run/ceph/ceph-osd.4.asok
-26> 2017-09-15 18:54:46.432316 7fc5fbc867c0 5 asok(0x558941004000) register_command 0 hook 0x558940f480d0
-25> 2017-09-15 18:54:46.432322 7fc5fbc867c0 5 asok(0x558941004000) register_command version hook 0x558940f480d0
-24> 2017-09-15 18:54:46.432326 7fc5fbc867c0 5 asok(0x558941004000) register_command git_version hook 0x558940f480d0
-23> 2017-09-15 18:54:46.432329 7fc5fbc867c0 5 asok(0x558941004000) register_command help hook 0x558940f4c1e0
-22> 2017-09-15 18:54:46.432333 7fc5fbc867c0 5 asok(0x558941004000) register_command get_command_descriptions hook 0x558940f4c1f0
-21> 2017-09-15 18:54:46.432359 7fc5f543e700 5 asok(0x558941004000) entry start
-20> 2017-09-15 18:54:46.432381 7fc5fbc867c0 10 monclient(hunting): build_initial_monmap
-19> 2017-09-15 18:54:46.439452 7fc5fbc867c0 5 adding auth protocol: none
-18> 2017-09-15 18:54:46.439462 7fc5fbc867c0 5 adding auth protocol: none
-17> 2017-09-15 18:54:46.439608 7fc5fbc867c0 5 asok(0x558941004000) register_command objecter_requests hook 0x558940f4c2b0
-16> 2017-09-15 18:54:46.439678 7fc5fbc867c0 1 -- 192.168.1.31:6819/22671 messenger.start
-15> 2017-09-15 18:54:46.439700 7fc5fbc867c0 1 -- :/0 messenger.start
-14> 2017-09-15 18:54:46.439713 7fc5fbc867c0 1 -- 192.168.1.31:6823/22671 messenger.start
-13> 2017-09-15 18:54:46.439726 7fc5fbc867c0 1 -- 192.168.2.31:6819/22671 messenger.start
-12> 2017-09-15 18:54:46.439738 7fc5fbc867c0 1 -- 192.168.2.31:6818/22671 messenger.start
-11> 2017-09-15 18:54:46.439750 7fc5fbc867c0 1 -- :/0 messenger.start
-10> 2017-09-15 18:54:46.439791 7fc5fbc867c0 2 osd.4 0 mounting /var/lib/ceph/osd/ceph-4 /var/lib/ceph/osd/ceph-4/journal
-9> 2017-09-15 18:54:46.439836 7fc5fbc867c0 0 filestore(/var/lib/ceph/osd/ceph-4) backend xfs (magic 0x58465342)
-8> 2017-09-15 18:54:46.440234 7fc5fbc867c0 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-4) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
-7> 2017-09-15 18:54:46.440238 7fc5fbc867c0 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-4) detect_features: SEEK_DATA/SEEK_HOLE is disabled via 'filestore seek data hole' config option
-6> 2017-09-15 18:54:46.440250 7fc5fbc867c0 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-4) detect_features: splice is supported
-5> 2017-09-15 18:54:46.474005 7fc5fbc867c0 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-4) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
-4> 2017-09-15 18:54:46.474186 7fc5fbc867c0 0 xfsfilestorebackend(/var/lib/ceph/osd/ceph-4) detect_feature: extsize is disabled by conf
-3> 2017-09-15 18:54:46.475050 7fc5fbc867c0 1 leveldb: Recovering log #31539
-2> 2017-09-15 18:54:46.617922 7fc5fbc867c0 1 leveldb: Delete type=3 #31538
-1> 2017-09-15 18:54:46.617978 7fc5fbc867c0 1 leveldb: Delete type=0 #31539
0> 2017-09-15 18:54:56.846756 7fc5fbc867c0 -1 *** Caught signal (Aborted) **
 in thread 7fc5fbc867c0 thread_name:ceph-osd
 ceph version 10.2.9 (2ee413f77150c0f375ff6f10edd6c8f9c7d060d0)
 1: (()+0x960e57) [0x558935ed9e57]
 2: (()+0x110c0) [0x7fc5faaf40c0]
 3: (gsignal()+0xcf) [0x7fc5f8b3afcf]
 4: (abort()+0x16a) [0x7fc5f8b3c3fa]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x15d) [0x7fc5f9421b3d]
 6: (()+0x5ebb6) [0x7fc5f941fbb6]
 7: (()+0x5ec01) [0x7fc5f941fc01]
 8: (()+0x5ee19) [0x7fc5f941fe19]
 9: (coll_t::decode(ceph::buffer::list::iterator&)+0x21e) [0x558935ba833e]
 10: (DBObjectMap::_Header::decode(ceph::buffer::list::iterator&)+0x125) [0x558935d4f2f5]
 11: (DBObjectMap::check(std::ostream&, bool)+0x279) [0x558935d448b9]
 12: (DBObjectMap::init(bool)+0x288) [0x558935d43bb8]
 13: (FileStore::mount()+0x2525) [0x558935c6f185]
 14: (OSD::init()+0x27d) [0x5589359087ed]
 15: (main()+0x2a64) [0x55893586f5d4]
 16: (__libc_start_main()+0xf1) [0x7fc5f8b282b1]
 17: (()+0x341117) [0x5589358ba117]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 rbd_mirror
   0/ 5 rbd_replay
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   0/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 journal
   0/ 5 ms
   1/ 5 mon
   0/10 monc
   1/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/10 civetweb
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
   0/ 0 refs
   1/ 5 xio
   1/ 5 compressor
   1/ 5 newstore
   1/ 5 bluestore
   1/ 5 bluefs
   1/ 3 bdev
   1/ 5 kstore
   4/ 5 rocksdb
   4/ 5 leveldb
   1/ 5 kinetic
   1/ 5 fuse
  -2/-2 (syslog threshold)
  -1/-1 (stderr threshold)
  max_recent 10000
  max_new 1000
  log_file /var/log/ceph/ceph-osd.4.log
--- end dump of recent events ---

What can I do to save that PG1.28? Please let me know if you need more information. So close!... =)

Regards,
Hong