The 5 OSDs that are down have all been kicked out for being
unresponsive. They are getting kicked out faster than they can
complete recovery and backfill, so the number of degraded PGs keeps
growing over time.
root@ceph0c:~# ceph -w
    cluster 1604ec7a-6ceb-42fc-8c68-0a7896c4e120
     health HEALTH_WARN 49 pgs backfill; 926 pgs degraded; 252 pgs down; 30 pgs incomplete; 291 pgs peering; 1 pgs recovery_wait; 175 pgs stale; 255 pgs stuck inactive; 175 pgs stuck stale; 1234 pgs stuck unclean; 66 requests are blocked > 32 sec; recovery 6820014/38055556 objects degraded (17.921%); 4/16 in osds are down; noout flag(s) set
     monmap e2: 2 mons at {ceph0c=10.193.0.6:6789/0,ceph1c=10.193.0.7:6789/0}, election epoch 238, quorum 0,1 ceph0c,ceph1c
     osdmap e38673: 16 osds: 12 up, 16 in
            flags noout
      pgmap v7325233: 2560 pgs, 17 pools, 14090 GB data, 18581 kobjects
            28456 GB used, 31132 GB / 59588 GB avail
            6820014/38055556 objects degraded (17.921%)
                   1 stale+active+clean+scrubbing+deep
                  15 active
                1247 active+clean
                   1 active+recovery_wait
                  45 stale+active+clean
                  39 peering
                  29 stale+active+degraded+wait_backfill
                 252 down+peering
                 827 active+degraded
                  50 stale+active+degraded
                  20 stale+active+degraded+remapped+wait_backfill
                  30 stale+incomplete
                   4 active+clean+scrubbing+deep
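As a cross-check on the status above, a small Python sketch (not a Ceph tool) can tally the pgmap state lines. Each PG is counted once per `+`-separated state token, so the per-token totals can be compared with the HEALTH_WARN summary. The names `PGMAP_STATES` and `tally_states` are hypothetical; the numbers are copied from the output above.

```python
import re
from collections import Counter

# PG state lines copied from the pgmap section of the `ceph -w` output:
# each line is "<count> <state1+state2+...>".
PGMAP_STATES = """\
1 stale+active+clean+scrubbing+deep
15 active
1247 active+clean
1 active+recovery_wait
45 stale+active+clean
39 peering
29 stale+active+degraded+wait_backfill
252 down+peering
827 active+degraded
50 stale+active+degraded
20 stale+active+degraded+remapped+wait_backfill
30 stale+incomplete
4 active+clean+scrubbing+deep
"""

# Object counts copied from the "objects degraded" line.
DEGRADED_OBJECTS, TOTAL_OBJECTS = 6820014, 38055556


def tally_states(text: str) -> tuple[int, Counter]:
    """Return (total PGs, PG count per state token).

    A PG in "stale+active+degraded" counts once toward each of
    "stale", "active" and "degraded".
    """
    total = 0
    per_token: Counter = Counter()
    for line in text.splitlines():
        m = re.fullmatch(r"\s*(\d+)\s+(\S+)\s*", line)
        if not m:
            continue  # skip anything that is not "<count> <state>"
        count, state = int(m.group(1)), m.group(2)
        total += count
        for token in state.split("+"):
            per_token[token] += count
    return total, per_token


total, per_token = tally_states(PGMAP_STATES)
print(total)  # 2560
for token in ("degraded", "down", "peering", "stale", "incomplete", "wait_backfill"):
    print(token, per_token[token])
print(f"{DEGRADED_OBJECTS / TOTAL_OBJECTS:.3%}")  # 17.921%
```

With this snapshot the totals line up with the health line: 926 degraded, 252 down, 291 peering (39 + 252), 175 stale, 30 incomplete and 49 waiting for backfill, out of 2560 PGs. The "stuck" counts are not reproduced, since they depend on how long each PG has been in its state rather than on the snapshot alone.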
Here's a snippet of ceph.log for one of these OSDs:
2014-05-07 09:22:46.747036 mon.0 10.193.0.6:6789/0 39981 : [INF] osd.3 marked down after no pg stats for 901.212859seconds
2014-05-07 09:47:17.930251 mon.0 10.193.0.6:6789/0 40561 : [INF] osd.3 10.193.0.6:6812/2830 boot
2014-05-07 09:47:16.914519 osd.3 10.193.0.6:6812/2830 823 : [WRN] map e38649 wrongly marked me down
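The log lines above can be checked the same way. This hypothetical Python sketch pulls out the "marked down after no pg stats" event and the matching `boot` line for each OSD, then compares the reported gap with 900 seconds. The 900 s figure is an assumption (taken to be the default of the monitor option `mon osd report timeout`, which would match the 15-minute pattern); the value actually configured on these monitors should be checked. `down_events` and `REPORT_TIMEOUT_S` are illustrative names only.

```python
import re
from datetime import datetime

# The ceph.log lines quoted above, each on a single line.
LOG = """\
2014-05-07 09:22:46.747036 mon.0 10.193.0.6:6789/0 39981 : [INF] osd.3 marked down after no pg stats for 901.212859seconds
2014-05-07 09:47:17.930251 mon.0 10.193.0.6:6789/0 40561 : [INF] osd.3 10.193.0.6:6812/2830 boot
2014-05-07 09:47:16.914519 osd.3 10.193.0.6:6812/2830 823 : [WRN] map e38649 wrongly marked me down
"""

# ASSUMPTION: 900 s is the default monitor report timeout; verify locally.
REPORT_TIMEOUT_S = 900.0

TS_FMT = "%Y-%m-%d %H:%M:%S.%f"
MARKED_DOWN = re.compile(
    r"^(\S+ \S+) .*\[INF\] (osd\.\d+) "
    r"marked down after no pg stats for ([\d.]+)\s*seconds"
)
BOOT = re.compile(r"^(\S+ \S+) .*\[INF\] (osd\.\d+) \S+ boot$")


def down_events(log: str):
    """Yield (osd, marked_down_at, stats_gap_s, first_boot_after or None)."""
    downs, boots = [], {}
    for line in log.splitlines():
        if m := MARKED_DOWN.match(line):
            when = datetime.strptime(m.group(1), TS_FMT)
            downs.append((m.group(2), when, float(m.group(3))))
        elif m := BOOT.match(line):
            when = datetime.strptime(m.group(1), TS_FMT)
            boots.setdefault(m.group(2), []).append(when)
    for osd, down_at, gap in downs:
        # Pair each "marked down" event with the OSD's next boot, if any.
        later = [t for t in boots.get(osd, []) if t > down_at]
        yield osd, down_at, gap, (min(later) if later else None)


for osd, down_at, gap, boot_at in down_events(LOG):
    over = gap - REPORT_TIMEOUT_S
    print(f"{osd}: no pg stats for {gap:.0f} s ({gap / 60:.1f} min, "
          f"{over:.1f} s past the assumed timeout)")
    if boot_at is not None:
        minutes = (boot_at - down_at).total_seconds() / 60
        print(f"  booted again {minutes:.1f} min after being marked down")
```

For osd.3 this reports a gap of about 901 s, roughly 1.2 s past the assumed timeout, and a reboot about 24.5 minutes after it was marked down.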
root@ceph0c:~# uname -a
Linux ceph0c 3.5.0-46-generic #70~precise1-Ubuntu SMP Thu Jan 9 23:55:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
root@ceph0c:~# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 12.04.4 LTS
Release: 12.04
Codename: precise
root@ceph0c:~# ceph -v
ceph version 0.72.2 (a913ded2ff138aefb8cb84d347d72164099cfd60)
Any ideas what I can do to stop these OSDs from dying after 15 minutes?
--
*Craig Lewis*
Senior Systems Engineer
Office +1.714.602.1309
Email cle...@centraldesktop.com <mailto:cle...@centraldesktop.com>
*Central Desktop. Work together in ways you never thought possible.*
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com