The 5 OSDs that are down have all been marked down for being unresponsive. They are getting marked down faster than they can complete recovery and backfill, so the number of degraded PGs is growing over time.

root@ceph0c:~# ceph -w
    cluster 1604ec7a-6ceb-42fc-8c68-0a7896c4e120
     health HEALTH_WARN 49 pgs backfill; 926 pgs degraded; 252 pgs down; 30 pgs incomplete; 291 pgs peering; 1 pgs recovery_wait; 175 pgs stale; 255 pgs stuck inactive; 175 pgs stuck stale; 1234 pgs stuck unclean; 66 requests are blocked > 32 sec; recovery 6820014/38055556 objects degraded (17.921%); 4/16 in osds are down; noout flag(s) set
     monmap e2: 2 mons at {ceph0c=10.193.0.6:6789/0,ceph1c=10.193.0.7:6789/0}, election epoch 238, quorum 0,1 ceph0c,ceph1c
     osdmap e38673: 16 osds: 12 up, 16 in
            flags noout
      pgmap v7325233: 2560 pgs, 17 pools, 14090 GB data, 18581 kobjects
            28456 GB used, 31132 GB / 59588 GB avail
            6820014/38055556 objects degraded (17.921%)
                   1 stale+active+clean+scrubbing+deep
                  15 active
                1247 active+clean
                   1 active+recovery_wait
                  45 stale+active+clean
                  39 peering
                  29 stale+active+degraded+wait_backfill
                 252 down+peering
                 827 active+degraded
                  50 stale+active+degraded
                  20 stale+active+degraded+remapped+wait_backfill
                  30 stale+incomplete
                   4 active+clean+scrubbing+deep

Here's a snippet of ceph.log for one of these OSDs:
2014-05-07 09:22:46.747036 mon.0 10.193.0.6:6789/0 39981 : [INF] osd.3 marked down after no pg stats for 901.212859seconds
2014-05-07 09:47:17.930251 mon.0 10.193.0.6:6789/0 40561 : [INF] osd.3 10.193.0.6:6812/2830 boot
2014-05-07 09:47:16.914519 osd.3 10.193.0.6:6812/2830 823 : [WRN] map e38649 wrongly marked me down
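
To see how often this happens per OSD, the same pattern can be pulled out of ceph.log with a small script. This is only a rough sketch: the regular expressions are inferred from the three log lines above rather than from any documented log format, and the names `parse_events` and `downtime_seconds` are made up for illustration.

```python
import re
from datetime import datetime

# Sample entries copied from the ceph.log snippet above.
SAMPLE = """\
2014-05-07 09:22:46.747036 mon.0 10.193.0.6:6789/0 39981 : [INF] osd.3 marked down after no pg stats for 901.212859seconds
2014-05-07 09:47:17.930251 mon.0 10.193.0.6:6789/0 40561 : [INF] osd.3 10.193.0.6:6812/2830 boot
2014-05-07 09:47:16.914519 osd.3 10.193.0.6:6812/2830 823 : [WRN] map e38649 wrongly marked me down
"""

# Assumed layout: timestamp, source daemon, address, sequence number, level, message.
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) "
    r"(?P<src>\S+) \S+ \d+ : \[(?P<level>\w+)\] (?P<msg>.*)$"
)
DOWN_RE = re.compile(r"^(osd\.\d+) marked down after no pg stats for ([\d.]+)\s*seconds")
BOOT_RE = re.compile(r"^(osd\.\d+) \S+ boot$")


def parse_events(text):
    """Return (timestamp, osd, kind, detail) tuples, sorted by timestamp."""
    events = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip anything that does not look like a log entry
        ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S.%f")
        msg = m.group("msg")
        down = DOWN_RE.match(msg)
        boot = BOOT_RE.match(msg)
        if down:
            # detail = seconds without pg stats, as reported by the monitor
            events.append((ts, down.group(1), "down", float(down.group(2))))
        elif boot:
            events.append((ts, boot.group(1), "boot", None))
        elif "wrongly marked me down" in msg:
            # here the OSD itself is the log source
            events.append((ts, m.group("src"), "wrongly_down", None))
    return sorted(events, key=lambda e: e[0])


def downtime_seconds(events, osd):
    """Seconds between each 'down' event and the next 'boot' for one OSD."""
    gaps, last_down = [], None
    for ts, name, kind, _detail in events:
        if name != osd:
            continue
        if kind == "down":
            last_down = ts
        elif kind == "boot" and last_down is not None:
            gaps.append((ts - last_down).total_seconds())
            last_down = None
    return gaps


if __name__ == "__main__":
    evts = parse_events(SAMPLE)
    for ts, osd, kind, detail in evts:
        print(ts, osd, kind, detail if detail is not None else "")
    print("osd.3 downtime (s):", downtime_seconds(evts, "osd.3"))
```

Feeding the full ceph.log to `parse_events` instead of `SAMPLE` should list every down/boot cycle, which makes it easier to compare the five affected OSDs against the healthy ones.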

root@ceph0c:~# uname -a
Linux ceph0c 3.5.0-46-generic #70~precise1-Ubuntu SMP Thu Jan 9 23:55:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
root@ceph0c:~# lsb_release -a
No LSB modules are available.
Distributor ID:    Ubuntu
Description:    Ubuntu 12.04.4 LTS
Release:    12.04
Codename:    precise
root@ceph0c:~# ceph -v
ceph version 0.72.2 (a913ded2ff138aefb8cb84d347d72164099cfd60)


Any ideas what I can do to make these OSDs stop dying after 15 minutes?




--

*Craig Lewis*
Senior Systems Engineer
Office +1.714.602.1309
Email cle...@centraldesktop.com <mailto:cle...@centraldesktop.com>

