On Wednesday, March 13, 2013 at 5:52 AM, Ansgar Jazdzewski wrote:
> hi,
>  
> I added 10 new OSDs to my cluster; after the growth finished, I got:
>  
> ##########
> # ceph -s
> health HEALTH_WARN 217 pgs stuck unclean
> monmap e4: 2 mons at {a=10.100.217.3:6789/0,b=10.100.217.4:6789/0}, election 
> epoch 4, quorum 0,1 a,b
> osdmap e1480: 14 osds: 14 up, 14 in
> pgmap v8690731: 776 pgs: 559 active+clean, 217 active+remapped; 341 GB data, 
> 685 GB used, 15390 GB / 16075 GB avail
> mdsmap e312: 1/1/1 up {0=d=up:active}, 3 up:standby
> ##########
>  
> During the growth some VMs were online, using RBD. Is that the reason for the 
> warning?
Nope, it's not because you were using the cluster. The "unclean" PGs here are 
the ones in the "active+remapped" state. That's actually two states: "active", 
which is good, because it means they're serving reads and writes; and 
"remapped", which means that for some reason the current set of OSDs handling 
them isn't the set that CRUSH thinks should be handling them. Given your 
cluster expansion, that probably means your CRUSH map and rules aren't 
behaving themselves and are failing to assign the right number of replicas to 
those PGs. You can check this by looking at the PG dump: a remapped PG's "up" 
set (what CRUSH wants) won't match its "acting" set (who is actually serving 
it); see the sketch below. If you search for "ceph active remapped" it looks 
to me like you'll get some useful results; you might also just be able to 
enable the CRUSH tunables 
(http://ceph.com/docs/master/rados/operations/crush-map/#tunables).
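A minimal way to do that check from the shell (the PG id 2.3f below is just a 
placeholder, and the grep is only illustrative):

# list just the PGs that are stuck unclean
ceph pg dump_stuck unclean

# or pull the full dump and keep the remapped ones; compare the
# "up" column (what CRUSH wants) against "acting" (who is serving it)
ceph pg dump | grep remapped

# dig into a single PG in detail
ceph pg 2.3f query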
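And if you do go the tunables route, here's a rough sketch of the 
decompile/edit/recompile cycle from that doc page (file names are arbitrary, 
the tunable values shown are the ones the linked docs suggest, and the usual 
caveat applies that older kernel clients may not understand the new tunables):

# grab and decompile the current CRUSH map
ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt

# then add/adjust the tunable lines near the top of crushmap.txt, e.g.:
#   tunable choose_local_tries 0
#   tunable choose_local_fallback_tries 0
#   tunable choose_total_tries 50

# recompile and inject it back into the cluster
crushtool -c crushmap.txt -o crushmap.new
ceph osd setcrushmap -i crushmap.new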

John, this is becoming a more common problem; we should generate some more 
targeted documentation around it. :)

-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com


_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com