Hi,

I'm seeing lots of issues with my Ceph installation. The cluster health is degraded and many of the OSDs are down.

# ceph -v
ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43)

# ceph health
HEALTH_ERR 2002 pgs degraded; 14 pgs down; 180 pgs inconsistent; 14 pgs peering; 1 pgs stale; 2002 pgs stuck degraded; 14 pgs stuck inactive; 1 pgs stuck stale; 2320 pgs stuck unclean; 2002 pgs stuck undersized; 2002 pgs undersized; 100 requests are blocked > 32 sec; recovery 38033332/531925830 objects degraded (7.150%); recovery 48881596/531925830 objects misplaced (9.190%); 12623 scrub errors; 11/320 in osds are down; noout flag(s) set

The log for one of the down OSDs shows:

    -5> 2016-02-05 19:10:45.294873 7fd4d58e4700  1 -- 10.31.0.3:6835/157558 --> 10.31.0.5:0/3796 -- osd_ping(ping_reply e144138 stamp 2016-02-05 19:10:45.286934) v2 -- ?+0 0x4359a00 con 0x2bc9ac60
    -4> 2016-02-05 19:10:45.294915 7fd4d70e7700  1 -- 10.31.0.67:6835/157558 --> 10.31.0.5:0/3796 -- osd_ping(ping_reply e144138 stamp 2016-02-05 19:10:45.286934) v2 -- ?+0 0x27e21800 con 0x2bacd700
    -3> 2016-02-05 19:10:45.341383 7fd4e2ea8700  0 filestore(/var/lib/ceph/osd/ceph-299)  error (39) Directory not empty not handled on operation 0x12c88178 (6494115.0.1, or op 1, counting from 0)
    -2> 2016-02-05 19:10:45.341477 7fd4e2ea8700  0 filestore(/var/lib/ceph/osd/ceph-299) ENOTEMPTY suggests garbage data in osd data dir
    -1> 2016-02-05 19:10:45.341493 7fd4e2ea8700  0 filestore(/var/lib/ceph/osd/ceph-299)  transaction dump:
{
    "ops": [
        {
            "op_num": 0,
            "op_name": "remove",
            "collection": "70.532s3_head",
            "oid": "532\/\/head\/\/70\/18446744073709551615\/3"
        },
        {
            "op_num": 1,
            "op_name": "rmcoll",
            "collection": "70.532s3_head"
        }
    ]
}
     0> 2016-02-05 19:10:45.343794 7fd4e2ea8700 -1 os/FileStore.cc: In function 'unsigned int FileStore::_do_transaction(ObjectStore::Transaction&, uint64_t, int, ThreadPool::TPHandle*)' thread 7fd4e2ea8700 time 2016-02-05 19:10:45.341673
os/FileStore.cc: 2757: FAILED assert(0 == "unexpected error")

 ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0xbc60eb]
 2: (FileStore::_do_transaction(ObjectStore::Transaction&, unsigned long, int, ThreadPool::TPHandle*)+0xa52) [0x923d12]
 3: (FileStore::_do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long, ThreadPool::TPHandle*)+0x64) [0x92a3a4]
 4: (FileStore::_do_op(FileStore::OpSequencer*, ThreadPool::TPHandle&)+0x16a) [0x92a52a]
 5: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa5e) [0xbb6b4e]
 6: (ThreadPool::WorkThread::entry()+0x10) [0xbb7bf0]
 7: (()+0x8182) [0x7fd4ef916182]
 8: (clone()+0x6d) [0x7fd4ede8147d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 rbd_replay
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   0/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 keyvaluestore
   1/ 3 journal
   0/ 5 ms
   1/ 5 mon
   0/10 monc
   1/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/10 civetweb
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
   0/ 0 refs
   1/ 5 xio
  -2/-2 (syslog threshold)
  -1/-1 (stderr threshold)
  max_recent     10000
  max_new         1000
  log_file /var/log/ceph/ceph-osd.299.log
--- end dump of recent events ---
2016-02-05 19:10:45.441428 7fd4e2ea8700 -1 *** Caught signal (Aborted) **
 in thread 7fd4e2ea8700

 ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43)
 1: /usr/bin/ceph-osd() [0xacd7ba]
 2: (()+0x10340) [0x7fd4ef91e340]
 3: (gsignal()+0x39) [0x7fd4eddbdcc9]
 4: (abort()+0x148) [0x7fd4eddc10d8]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7fd4ee6c8535]
 6: (()+0x5e6d6) [0x7fd4ee6c66d6]
 7: (()+0x5e703) [0x7fd4ee6c6703]
 8: (()+0x5e922) [0x7fd4ee6c6922]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x278) [0xbc62d8]
 10: (FileStore::_do_transaction(ObjectStore::Transaction&, unsigned long, int, ThreadPool::TPHandle*)+0xa52) [0x923d12]
 11: (FileStore::_do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long, ThreadPool::TPHandle*)+0x64) [0x92a3a4]
 12: (FileStore::_do_op(FileStore::OpSequencer*, ThreadPool::TPHandle&)+0x16a) [0x92a52a]
 13: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa5e) [0xbb6b4e]
 14: (ThreadPool::WorkThread::entry()+0x10) [0xbb7bf0]
 15: (()+0x8182) [0x7fd4ef916182]
 16: (clone()+0x6d) [0x7fd4ede8147d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- begin dump of recent events ---
    -4> 2016-02-05 19:10:45.355813 7fd4d58e4700  1 -- 10.31.0.3:6835/157558 <== osd.1 10.31.0.101:0/197780 23431 ==== osd_ping(ping e144138 stamp 2016-02-05 19:10:45.344020) v2 ==== 47+0+0 (1893056775 0 0) 0x36782a00 con 0x2c6c8580
    -3> 2016-02-05 19:10:45.355853 7fd4d58e4700  1 -- 10.31.0.3:6835/157558 --> 10.31.0.101:0/197780 -- osd_ping(ping_reply e144138 stamp 2016-02-05 19:10:45.344020) v2 -- ?+0 0x29702800 con 0x2c6c8580
    -2> 2016-02-05 19:10:45.356076 7fd4d70e7700  1 -- 10.31.0.67:6835/157558 <== osd.1 10.31.0.101:0/197780 23431 ==== osd_ping(ping e144138 stamp 2016-02-05 19:10:45.344020) v2 ==== 47+0+0 (1893056775 0 0) 0x2cf84200 con 0x2bc9c260
    -1> 2016-02-05 19:10:45.356627 7fd4d70e7700  1 -- 10.31.0.67:6835/157558 --> 10.31.0.101:0/197780 -- osd_ping(ping_reply e144138 stamp 2016-02-05 19:10:45.344020) v2 -- ?+0 0x2f5cae00 con 0x2bc9c260
     0> 2016-02-05 19:10:45.441428 7fd4e2ea8700 -1 *** Caught signal (Aborted) **
 in thread 7fd4e2ea8700

 ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43)
 1: /usr/bin/ceph-osd() [0xacd7ba]
 2: (()+0x10340) [0x7fd4ef91e340]
 3: (gsignal()+0x39) [0x7fd4eddbdcc9]
 4: (abort()+0x148) [0x7fd4eddc10d8]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7fd4ee6c8535]
 6: (()+0x5e6d6) [0x7fd4ee6c66d6]
 7: (()+0x5e703) [0x7fd4ee6c6703]
 8: (()+0x5e922) [0x7fd4ee6c6922]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x278) [0xbc62d8]
 10: (FileStore::_do_transaction(ObjectStore::Transaction&, unsigned long, int, ThreadPool::TPHandle*)+0xa52) [0x923d12]
 11: (FileStore::_do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long, ThreadPool::TPHandle*)+0x64) [0x92a3a4]
 12: (FileStore::_do_op(FileStore::OpSequencer*, ThreadPool::TPHandle&)+0x16a) [0x92a52a]
 13: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa5e) [0xbb6b4e]
 14: (ThreadPool::WorkThread::entry()+0x10) [0xbb7bf0]
 15: (()+0x8182) [0x7fd4ef916182]
 16: (clone()+0x6d) [0x7fd4ede8147d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 rbd_replay
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   0/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 keyvaluestore
   1/ 3 journal
   0/ 5 ms
   1/ 5 mon
   0/10 monc
   1/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/10 civetweb
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
   0/ 0 refs
   1/ 5 xio
  -2/-2 (syslog threshold)
  -1/-1 (stderr threshold)
  max_recent     10000
  max_new         1000
  log_file /var/log/ceph/ceph-osd.299.log
-------------------------

The logs on the other down OSDs look similar. Would the procedure described in http://tracker.ceph.com/issues/12428 be the best way to repair these OSDs?

Thanks,
Jeff

--
Jeffrey McDonald, PhD
Assistant Director for HPC Operations
Minnesota Supercomputing Institute
University of Minnesota Twin Cities
599 Walter Library           email: jeffrey.mcdon...@msi.umn.edu
117 Pleasant St SE           phone: +1 612 625-6905
Minneapolis, MN 55455        fax:   +1 612 624-8861
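
For anyone who wants to check whether other OSD logs carry the same signature as the osd.299 log above, here is a minimal sketch in Python. It only matches the two lines quoted in that log (the "error (39) Directory not empty" filestore message and the "unexpected error" assert). The function name scan_osd_log and the glob pattern /var/log/ceph/ceph-osd.*.log are my assumptions, the latter inferred from the single log_file path shown above; neither is part of Ceph.

```python
import re
from collections import Counter

# Patterns copied from the osd.299 log excerpt; names are illustrative only.
ENOTEMPTY_RE = re.compile(
    r"filestore\((?P<path>[^)]+)\)\s+error \(39\) Directory not empty"
)
ASSERT_RE = re.compile(r'FAILED assert\(0 == "unexpected error"\)')


def scan_osd_log(text):
    """Return (Counter of OSD data dirs reporting ENOTEMPTY, assert seen?)."""
    dirs = Counter(m.group("path") for m in ENOTEMPTY_RE.finditer(text))
    return dirs, bool(ASSERT_RE.search(text))


if __name__ == "__main__":
    import glob

    # Assumed log location; adjust if your logs live elsewhere.
    for path in sorted(glob.glob("/var/log/ceph/ceph-osd.*.log")):
        with open(path, errors="replace") as fh:
            dirs, asserted = scan_osd_log(fh.read())
        if dirs or asserted:
            print(path, dict(dirs), "assert" if asserted else "no assert")
```

This only reports which logs contain the signature; it does not touch the OSD data directories or attempt any repair.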