http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-pg/

:)
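
Before working through that page, it can help to read the quoted status as numbers. The following is only a minimal sketch, not something from the Ceph docs: `STATUS` and `pg_states` are names made up for this illustration, and the text is copied from the `ceph -s` output quoted below. It tallies the PG states and shows that 31 of the 64 PGs are not active+clean.

```python
import re

# Summary lines copied from the quoted `ceph -s` output (assumed unchanged).
STATUS = """\
      pgmap v77979: 64 pgs, 1 pools, 844 GB data, 211 kobjects
            1290 GB used, 8021 GB / 9315 GB avail
                  33 active+clean
                  31 down+peering
"""


def pg_states(status_text):
    """Return (total_pgs, {state: count}) parsed from `ceph -s` text."""
    total = int(re.search(r"(\d+) pgs,", status_text).group(1))
    states = {}
    for line in status_text.splitlines():
        # State lines look like "<count> <state>[+<state>...]" and nothing else.
        m = re.fullmatch(r"\s*(\d+) ([a-z_]+(?:\+[a-z_]+)*)", line)
        if m:
            states[m.group(2)] = int(m.group(1))
    return total, states


total, states = pg_states(STATUS)
# Everything that is not active+clean needs attention.
unhealthy = {s: n for s, n in states.items() if s != "active+clean"}
print(total, states, unhealthy)
```

For the PGs that are not active+clean, the page above goes through `ceph health detail`, `ceph pg dump_stuck inactive` and `ceph pg <pgid> query` to find out which OSD they are waiting for.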

On Friday, November 6, 2015, Philipp Schwaha <phil...@schwaha.net> wrote:

> Hi,
>
> I have an issue with my (small) ceph cluster after an osd failed.
> ceph -s reports the following:
>     cluster 2752438a-a33e-4df4-b9ec-beae32d00aad
>      health HEALTH_WARN
>             31 pgs down
>             31 pgs peering
>             31 pgs stuck inactive
>             31 pgs stuck unclean
>      monmap e1: 1 mons at {0=192.168.19.13:6789/0}
>             election epoch 1, quorum 0 0
>      osdmap e138: 3 osds: 2 up, 2 in
>       pgmap v77979: 64 pgs, 1 pools, 844 GB data, 211 kobjects
>             1290 GB used, 8021 GB / 9315 GB avail
>                   33 active+clean
>                   31 down+peering
>
> I am now unable to map the rbd image; the command will just time out.
> The log is at the end of the message.
>
> Is there a way to recover the osd / the ceph cluster from this?
>
> thanks in advance
>         Philipp
>
>
>
>     -2> 2015-10-30 01:04:59.689116 7f4bb741e700  1 heartbeat_map
> is_healthy 'OSD::osd_tp thread 0x7f4ba13cd700' had timed out after 15
>     -1> 2015-10-30 01:04:59.689140 7f4bb741e700  1 heartbeat_map
> is_healthy 'OSD::osd_tp thread 0x7f4ba13cd700' had suicide timed out
> after 150
>      0> 2015-10-30 01:04:59.906546 7f4bb741e700 -1
> common/HeartbeatMap.cc: In function 'bool
> ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, const char*,
> time_t)' thread 7f4bb741e700 time 2015-10-30 01:04:59.689176
> common/HeartbeatMap.cc: 79: FAILED assert(0 == "hit suicide timeout")
>
>  ceph version 0.94.3 (95cefea9fd9ab740263bf8bb4796fd864d9afe2b)
>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x77) [0xb12457]
>  2: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, char const*,
> long)+0x119) [0xa47179]
>  3: (ceph::HeartbeatMap::is_healthy()+0xd6) [0xa47b76]
>  4: (ceph::HeartbeatMap::check_touch_file()+0x18) [0xa48258]
>  5: (CephContextServiceThread::entry()+0x164) [0xb21974]
>  6: (()+0x76f5) [0x7f4bbdb0c6f5]
>  7: (__clone()+0x6d) [0x7f4bbc09cedd]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> needed to interpret this.
>
> --- logging levels ---
>    0/ 5 none
>    0/ 1 lockdep
>    0/ 1 context
>    1/ 1 crush
>    1/ 5 mds
>    1/ 5 mds_balancer
>    1/ 5 mds_locker
>    1/ 5 mds_log
>    1/ 5 mds_log_expire
>    1/ 5 mds_migrator
>    0/ 1 buffer
>    0/ 1 timer
>    0/ 1 filer
>    0/ 1 striper
>    0/ 1 objecter
>    0/ 5 rados
>    0/ 5 rbd
>    0/ 5 rbd_replay
>    0/ 5 journaler
>    0/ 5 objectcacher
>    0/ 5 client
>    0/ 5 osd
>    0/ 5 optracker
>    0/ 5 objclass
>    1/ 3 filestore
>    1/ 3 keyvaluestore
>    1/ 3 journal
>    0/ 5 ms
>    1/ 5 mon
>    0/10 monc
>    1/ 5 paxos
>    0/ 5 tp
>    1/ 5 auth
>    1/ 5 crypto
>    1/ 1 finisher
>    1/ 5 heartbeatmap
>    1/ 5 perfcounter
>    1/ 5 rgw
>    1/10 civetweb
>    1/ 5 javaclient
>    1/ 5 asok
>    1/ 1 throttle
>    0/ 0 refs
>    1/ 5 xio
>   -2/-2 (syslog threshold)
>   -1/-1 (stderr threshold)
>   max_recent     10000
>   max_new         1000
>   log_file /var/log/ceph/ceph-osd.2.log
> --- end dump of recent events ---
> 2015-10-30 01:05:00.193324 7f4bb741e700 -1 *** Caught signal (Aborted) **
>  in thread 7f4bb741e700
>
>  ceph version 0.94.3 (95cefea9fd9ab740263bf8bb4796fd864d9afe2b)
>  1: /usr/bin/ceph-osd() [0xa11c84]
>  2: (()+0x10690) [0x7f4bbdb15690]
>  3: (gsignal()+0x37) [0x7f4bbbfe63c7]
>  4: (abort()+0x16a) [0x7f4bbbfe77fa]
>  5: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7f4bbc8c7d45]
>  6: (()+0x5dda7) [0x7f4bbc8c5da7]
>  7: (()+0x5ddf2) [0x7f4bbc8c5df2]
>  8: (()+0x5e008) [0x7f4bbc8c6008]
>  9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x252) [0xb12632]
>  10: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, char const*,
> long)+0x119) [0xa47179]
>  11: (ceph::HeartbeatMap::is_healthy()+0xd6) [0xa47b76]
>  12: (ceph::HeartbeatMap::check_touch_file()+0x18) [0xa48258]
>  13: (CephContextServiceThread::entry()+0x164) [0xb21974]
>  14: (()+0x76f5) [0x7f4bbdb0c6f5]
>  15: (__clone()+0x6d) [0x7f4bbc09cedd]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> needed to interpret this.
>
> --- begin dump of recent events ---
>      0> 2015-10-30 01:05:00.193324 7f4bb741e700 -1 *** Caught signal
> (Aborted) **
>  in thread 7f4bb741e700
>
>  ceph version 0.94.3 (95cefea9fd9ab740263bf8bb4796fd864d9afe2b)
>  1: /usr/bin/ceph-osd() [0xa11c84]
>  2: (()+0x10690) [0x7f4bbdb15690]
>  3: (gsignal()+0x37) [0x7f4bbbfe63c7]
>  4: (abort()+0x16a) [0x7f4bbbfe77fa]
>  5: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7f4bbc8c7d45]
>  6: (()+0x5dda7) [0x7f4bbc8c5da7]
>  7: (()+0x5ddf2) [0x7f4bbc8c5df2]
>  8: (()+0x5e008) [0x7f4bbc8c6008]
>  9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x252) [0xb12632]
>  10: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, char const*,
> long)+0x119) [0xa47179]
>  11: (ceph::HeartbeatMap::is_healthy()+0xd6) [0xa47b76]
>  12: (ceph::HeartbeatMap::check_touch_file()+0x18) [0xa48258]
>  13: (CephContextServiceThread::entry()+0x164) [0xb21974]
>  14: (()+0x76f5) [0x7f4bbdb0c6f5]
>  15: (__clone()+0x6d) [0x7f4bbc09cedd]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> needed to interpret this.
>
> --- logging levels ---
>    0/ 5 none
>    0/ 1 lockdep
>    0/ 1 context
>    1/ 1 crush
>    1/ 5 mds
>    1/ 5 mds_balancer
>    1/ 5 mds_locker
>    1/ 5 mds_log
>    1/ 5 mds_log_expire
>    1/ 5 mds_migrator
>    0/ 1 buffer
>    0/ 1 timer
>    0/ 1 filer
>    0/ 1 striper
>    0/ 1 objecter
>    0/ 5 rados
>    0/ 5 rbd
>    0/ 5 rbd_replay
>    0/ 5 journaler
>    0/ 5 objectcacher
>    0/ 5 client
>    0/ 5 osd
>    0/ 5 optracker
>    0/ 5 objclass
>    1/ 3 filestore
>    1/ 3 keyvaluestore
>    1/ 3 journal
>    0/ 5 ms
>    1/ 5 mon
>    0/10 monc
>    1/ 5 paxos
>    0/ 5 tp
>    1/ 5 auth
>    1/ 5 crypto
>    1/ 1 finisher
>    1/ 5 heartbeatmap
>    1/ 5 perfcounter
>    1/ 5 rgw
>    1/10 civetweb
>    1/ 5 javaclient
>    1/ 5 asok
>    1/ 1 throttle
>    0/ 0 refs
>    1/ 5 xio
>   -2/-2 (syslog threshold)
>   -1/-1 (stderr threshold)
>   max_recent     10000
>   max_new         1000
>   log_file /var/log/ceph/ceph-osd.2.log
> --- end dump of recent events ---
> 2015-10-30 01:07:00.920675 7f0ed0d067c0  0 ceph version 0.94.3
> (95cefea9fd9ab740263bf8bb4796fd864d9afe2b), process ceph-osd, pid 14210
> 2015-10-30 01:07:01.096259 7f0ed0d067c0  0
> filestore(/var/lib/ceph/osd/ceph-2) backend btrfs (magic 0x9123683e)
> 2015-10-30 01:07:01.099472 7f0ed0d067c0  0
> genericfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_features:
> FIEMAP ioctl is supported and appears to work
> 2015-10-30 01:07:01.099511 7f0ed0d067c0  0
> genericfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_features:
> FIEMAP ioctl is disabled via 'filestore fiemap' config option
> 2015-10-30 01:07:02.681342 7f0ed0d067c0  0
> genericfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_features:
> syncfs(2) syscall fully supported (by glibc and kernel)
> 2015-10-30 01:07:02.682285 7f0ed0d067c0  0
> btrfsfilestorebackend(/var/lib/ceph/osd/ceph-2)
> detect_feature: CLONE_RANGE ioctl is supported
> 2015-10-30 01:07:04.508905 7f0ed0d067c0  0
> btrfsfilestorebackend(/var/lib/ceph/osd/ceph-2)
> detect_feature: SNAP_CREATE is supported
> 2015-10-30 01:07:04.509418 7f0ed0d067c0  0
> btrfsfilestorebackend(/var/lib/ceph/osd/ceph-2)
> detect_feature: SNAP_DESTROY is supported
> 2015-10-30 01:07:04.518728 7f0ed0d067c0  0
> btrfsfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_feature:
> START_SYNC is supported (transid 8343)
> 2015-10-30 01:07:05.524109 7f0ed0d067c0  0
> btrfsfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_feature:
> WAIT_SYNC is supported
> 2015-10-30 01:07:05.705014 7f0ed0d067c0  0
> btrfsfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_feature:
> SNAP_CREATE_V2 is supported
> 2015-10-30 01:07:06.051275 7f0ed0d067c0  0
> btrfsfilestorebackend(/var/lib/ceph/osd/ceph-2) rollback_to: error
> removing old current subvol: (1) Operation not permitted
> 2015-10-30 01:07:07.655679 7f0ed0d067c0 -1
> filestore(/var/lib/ceph/osd/ceph-2) mount initial op seq is 0; something
> is wrong
> 2015-10-30 01:07:07.655801 7f0ed0d067c0 -1 osd.2 0 OSD:init: unable to
> mount object store
> 2015-10-30 01:07:07.655821 7f0ed0d067c0 -1  ** ERROR: osd init
> failed: (22) Invalid argument
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>