http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-pg/
:)

On Friday, November 6, 2015, Philipp Schwaha <phil...@schwaha.net> wrote:

> Hi,
>
> I have an issue with my (small) ceph cluster after an osd failed.
> ceph -s reports the following:
>     cluster 2752438a-a33e-4df4-b9ec-beae32d00aad
>      health HEALTH_WARN
>             31 pgs down
>             31 pgs peering
>             31 pgs stuck inactive
>             31 pgs stuck unclean
>      monmap e1: 1 mons at {0=192.168.19.13:6789/0}
>             election epoch 1, quorum 0 0
>      osdmap e138: 3 osds: 2 up, 2 in
>       pgmap v77979: 64 pgs, 1 pools, 844 GB data, 211 kobjects
>             1290 GB used, 8021 GB / 9315 GB avail
>                   33 active+clean
>                   31 down+peering
>
> I am now unable to map the rbd image; the command will just time out.
> The log is at the end of the message.
>
> Is there a way to recover the osd / the ceph cluster from this?
>
> thanks in advance
>     Philipp
>
>     -2> 2015-10-30 01:04:59.689116 7f4bb741e700  1 heartbeat_map is_healthy 'OSD::osd_tp thread 0x7f4ba13cd700' had timed out after 15
>     -1> 2015-10-30 01:04:59.689140 7f4bb741e700  1 heartbeat_map is_healthy 'OSD::osd_tp thread 0x7f4ba13cd700' had suicide timed out after 150
>      0> 2015-10-30 01:04:59.906546 7f4bb741e700 -1 common/HeartbeatMap.cc: In function 'bool ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, const char*, time_t)' thread 7f4bb741e700 time 2015-10-30 01:04:59.689176
> common/HeartbeatMap.cc: 79: FAILED assert(0 == "hit suicide timeout")
>
>  ceph version 0.94.3 (95cefea9fd9ab740263bf8bb4796fd864d9afe2b)
>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x77) [0xb12457]
>  2: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, char const*, long)+0x119) [0xa47179]
>  3: (ceph::HeartbeatMap::is_healthy()+0xd6) [0xa47b76]
>  4: (ceph::HeartbeatMap::check_touch_file()+0x18) [0xa48258]
>  5: (CephContextServiceThread::entry()+0x164) [0xb21974]
>  6: (()+0x76f5) [0x7f4bbdb0c6f5]
>  7: (__clone()+0x6d) [0x7f4bbc09cedd]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>
> --- logging levels ---
>    0/ 5 none
>    0/ 1 lockdep
>    0/ 1 context
>    1/ 1 crush
>    1/ 5 mds
>    1/ 5 mds_balancer
>    1/ 5 mds_locker
>    1/ 5 mds_log
>    1/ 5 mds_log_expire
>    1/ 5 mds_migrator
>    0/ 1 buffer
>    0/ 1 timer
>    0/ 1 filer
>    0/ 1 striper
>    0/ 1 objecter
>    0/ 5 rados
>    0/ 5 rbd
>    0/ 5 rbd_replay
>    0/ 5 journaler
>    0/ 5 objectcacher
>    0/ 5 client
>    0/ 5 osd
>    0/ 5 optracker
>    0/ 5 objclass
>    1/ 3 filestore
>    1/ 3 keyvaluestore
>    1/ 3 journal
>    0/ 5 ms
>    1/ 5 mon
>    0/10 monc
>    1/ 5 paxos
>    0/ 5 tp
>    1/ 5 auth
>    1/ 5 crypto
>    1/ 1 finisher
>    1/ 5 heartbeatmap
>    1/ 5 perfcounter
>    1/ 5 rgw
>    1/10 civetweb
>    1/ 5 javaclient
>    1/ 5 asok
>    1/ 1 throttle
>    0/ 0 refs
>    1/ 5 xio
>   -2/-2 (syslog threshold)
>   -1/-1 (stderr threshold)
>   max_recent     10000
>   max_new         1000
>   log_file /var/log/ceph/ceph-osd.2.log
> --- end dump of recent events ---
> 2015-10-30 01:05:00.193324 7f4bb741e700 -1 *** Caught signal (Aborted) **
>  in thread 7f4bb741e700
>
>  ceph version 0.94.3 (95cefea9fd9ab740263bf8bb4796fd864d9afe2b)
>  1: /usr/bin/ceph-osd() [0xa11c84]
>  2: (()+0x10690) [0x7f4bbdb15690]
>  3: (gsignal()+0x37) [0x7f4bbbfe63c7]
>  4: (abort()+0x16a) [0x7f4bbbfe77fa]
>  5: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7f4bbc8c7d45]
>  6: (()+0x5dda7) [0x7f4bbc8c5da7]
>  7: (()+0x5ddf2) [0x7f4bbc8c5df2]
>  8: (()+0x5e008) [0x7f4bbc8c6008]
>  9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x252) [0xb12632]
>  10: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, char const*, long)+0x119) [0xa47179]
>  11: (ceph::HeartbeatMap::is_healthy()+0xd6) [0xa47b76]
>  12: (ceph::HeartbeatMap::check_touch_file()+0x18) [0xa48258]
>  13: (CephContextServiceThread::entry()+0x164) [0xb21974]
>  14: (()+0x76f5) [0x7f4bbdb0c6f5]
>  15: (__clone()+0x6d) [0x7f4bbc09cedd]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>
> --- begin dump of recent events ---
>      0> 2015-10-30 01:05:00.193324 7f4bb741e700 -1 *** Caught signal (Aborted) **
>  in thread 7f4bb741e700
>
>  ceph version 0.94.3 (95cefea9fd9ab740263bf8bb4796fd864d9afe2b)
>  1: /usr/bin/ceph-osd() [0xa11c84]
>  2: (()+0x10690) [0x7f4bbdb15690]
>  3: (gsignal()+0x37) [0x7f4bbbfe63c7]
>  4: (abort()+0x16a) [0x7f4bbbfe77fa]
>  5: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7f4bbc8c7d45]
>  6: (()+0x5dda7) [0x7f4bbc8c5da7]
>  7: (()+0x5ddf2) [0x7f4bbc8c5df2]
>  8: (()+0x5e008) [0x7f4bbc8c6008]
>  9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x252) [0xb12632]
>  10: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, char const*, long)+0x119) [0xa47179]
>  11: (ceph::HeartbeatMap::is_healthy()+0xd6) [0xa47b76]
>  12: (ceph::HeartbeatMap::check_touch_file()+0x18) [0xa48258]
>  13: (CephContextServiceThread::entry()+0x164) [0xb21974]
>  14: (()+0x76f5) [0x7f4bbdb0c6f5]
>  15: (__clone()+0x6d) [0x7f4bbc09cedd]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>
> --- logging levels ---
>    0/ 5 none
>    0/ 1 lockdep
>    0/ 1 context
>    1/ 1 crush
>    1/ 5 mds
>    1/ 5 mds_balancer
>    1/ 5 mds_locker
>    1/ 5 mds_log
>    1/ 5 mds_log_expire
>    1/ 5 mds_migrator
>    0/ 1 buffer
>    0/ 1 timer
>    0/ 1 filer
>    0/ 1 striper
>    0/ 1 objecter
>    0/ 5 rados
>    0/ 5 rbd
>    0/ 5 rbd_replay
>    0/ 5 journaler
>    0/ 5 objectcacher
>    0/ 5 client
>    0/ 5 osd
>    0/ 5 optracker
>    0/ 5 objclass
>    1/ 3 filestore
>    1/ 3 keyvaluestore
>    1/ 3 journal
>    0/ 5 ms
>    1/ 5 mon
>    0/10 monc
>    1/ 5 paxos
>    0/ 5 tp
>    1/ 5 auth
>    1/ 5 crypto
>    1/ 1 finisher
>    1/ 5 heartbeatmap
>    1/ 5 perfcounter
>    1/ 5 rgw
>    1/10 civetweb
>    1/ 5 javaclient
>    1/ 5 asok
>    1/ 1 throttle
>    0/ 0 refs
>    1/ 5 xio
>   -2/-2 (syslog threshold)
>   -1/-1 (stderr threshold)
>   max_recent     10000
>   max_new         1000
>   log_file /var/log/ceph/ceph-osd.2.log
> --- end dump of recent events ---
> 2015-10-30 01:07:00.920675 7f0ed0d067c0  0 ceph version 0.94.3 (95cefea9fd9ab740263bf8bb4796fd864d9afe2b), process ceph-osd, pid 14210
> 2015-10-30 01:07:01.096259 7f0ed0d067c0  0 filestore(/var/lib/ceph/osd/ceph-2) backend btrfs (magic 0x9123683e)
> 2015-10-30 01:07:01.099472 7f0ed0d067c0  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_features: FIEMAP ioctl is supported and appears to work
> 2015-10-30 01:07:01.099511 7f0ed0d067c0  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
> 2015-10-30 01:07:02.681342 7f0ed0d067c0  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
> 2015-10-30 01:07:02.682285 7f0ed0d067c0  0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_feature: CLONE_RANGE ioctl is supported
> 2015-10-30 01:07:04.508905 7f0ed0d067c0  0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_feature: SNAP_CREATE is supported
> 2015-10-30 01:07:04.509418 7f0ed0d067c0  0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_feature: SNAP_DESTROY is supported
> 2015-10-30 01:07:04.518728 7f0ed0d067c0  0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_feature: START_SYNC is supported (transid 8343)
> 2015-10-30 01:07:05.524109 7f0ed0d067c0  0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_feature: WAIT_SYNC is supported
> 2015-10-30 01:07:05.705014 7f0ed0d067c0  0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_feature: SNAP_CREATE_V2 is supported
> 2015-10-30 01:07:06.051275 7f0ed0d067c0  0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-2) rollback_to: error removing old current subvol: (1) Operation not permitted
> 2015-10-30 01:07:07.655679 7f0ed0d067c0 -1 filestore(/var/lib/ceph/osd/ceph-2) mount initial op seq is 0; something is wrong
> 2015-10-30 01:07:07.655801 7f0ed0d067c0 -1 osd.2 0 OSD:init: unable to mount object store
> 2015-10-30 01:07:07.655821 7f0ed0d067c0 -1  ** ERROR: osd init failed: (22) Invalid argument
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
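
As a quick cross-check of the quoted `ceph -s` summary, here is a minimal Python sketch that parses the PG state lines and counts the PGs that are not active. The helper names `pg_states` and `inactive_pgs` are made up for this illustration and are not part of any Ceph tooling; the sample text is copied from the status output quoted above, and the parsing assumes that plain-text layout.

```python
import re

# PG summary lines copied from the quoted `ceph -s` output.
STATUS = """\
pgmap v77979: 64 pgs, 1 pools, 844 GB data, 211 kobjects
      1290 GB used, 8021 GB / 9315 GB avail
            33 active+clean
            31 down+peering
"""


def pg_states(status_text):
    """Return (total_pgs, {state: count}) parsed from `ceph -s` text."""
    total = int(re.search(r"(\d+) pgs,", status_text).group(1))
    states = {}
    for line in status_text.splitlines():
        # State lines look like "<count> <state>[+<state>...]".
        m = re.fullmatch(r"\s*(\d+)\s+([a-z_]+(?:\+[a-z_]+)*)\s*", line)
        if m:
            states[m.group(2)] = int(m.group(1))
    return total, states


def inactive_pgs(states):
    """Count PGs whose combined state does not include 'active'."""
    return sum(n for s, n in states.items() if "active" not in s.split("+"))


total, states = pg_states(STATUS)
print(total, states, inactive_pgs(states))
```

For the quoted status this reports 64 PGs in total, of which 31 (the down+peering ones) are not active, matching the 31 stuck inactive PGs in the health warning.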