Hi Chris,

According to your ceph osd tree capture, although the OSD reweight is set to 1, the OSD CRUSH weight (2nd column) is set to 0. You need to assign the OSD a CRUSH weight so that it can be selected by CRUSH:

ceph osd crush reweight osd.30 x.y   (where 1.0 = 1 TB)
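For example, a minimal sketch assuming osd.30 sits on a ~3 TB drive like your other OSDs (they all carry CRUSH weights of 2.72/2.73 in your tree; substitute the actual capacity of the disk behind osd.30):

ceph osd crush reweight osd.30 2.72     # ~3 TB drive, matching the other OSDs
ceph osd tree | grep 'osd.30'           # the weight column should now show 2.72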
Only when this is done will you see if it joins.

JC

> On 2 Apr 2015, at 19:36, Chris Kitzmiller <ckitzmil...@hampshire.edu> wrote:
>
> I have a cluster running 0.80.9 on Ubuntu 14.04. A couple nights ago I lost
> two disks from a pool with size=2. :(
>
> I replaced the two failed OSDs and I now have two PGs which are marked as
> incomplete in an otherwise healthy cluster. Following this page
> ( https://ceph.com/community/incomplete-pgs-oh-my/ ) I was able to set up
> another node and install Giant 0.87.1, mount one of my failed OSD drives, and
> successfully export the two PGs. I set up another OSD on my new node,
> weighted it to zero, and imported the two PGs.
>
> I'm still stuck though. It seems as though the new OSD just doesn't want to
> share with the other OSDs. Is there any way for me to ask an OSD which PGs it
> has (rather than ask the MON which OSDs a PG is on) to verify that my import
> was good? Help!
>
> 0 and 15 were the OSDs I lost. 30 is the new OSD. I've currently got
> size = 2, min_size = 1.
>
> root@storage1:~# ceph pg dump | grep incomplete | column -t
> dumped all in format plain
> 3.102  0  0  0  0  0  0  0  incomplete  2015-04-02 20:49:32.529594  0'0  15730:21  [0,15]  0   [0,15]  0   13985'53107  2015-03-29 21:17:15.568125  13985'49195  2015-03-24 18:38:08.244769
> 3.c7   0  0  0  0  0  0  0  incomplete  2015-04-02 20:49:32.968841  0'0  15730:17  [15,0]  15  [15,0]  15  13985'54076  2015-03-31 19:14:22.721695  13985'54076  2015-03-31 19:14:22.721695
>
> root@storage1:~# ceph health detail
> HEALTH_WARN 2 pgs incomplete; 2 pgs stuck inactive; 2 pgs stuck unclean;
> 1 requests are blocked > 32 sec; 1 osds have slow requests
> pg 3.c7 is stuck inactive since forever, current state incomplete, last acting [15,0]
> pg 3.102 is stuck inactive since forever, current state incomplete, last acting [0,15]
> pg 3.c7 is stuck unclean since forever, current state incomplete, last acting [15,0]
> pg 3.102 is stuck unclean since forever, current state incomplete, last acting [0,15]
> pg 3.102 is incomplete, acting [0,15]
> pg 3.c7 is incomplete, acting [15,0]
> 1 ops are blocked > 8388.61 sec
> 1 ops are blocked > 8388.61 sec on osd.15
> 1 osds have slow requests
>
> root@storage1:~# ceph osd tree
> # id  weight  type name                      up/down  reweight
> -1    81.65   root default
> -2    81.65     host storage1
> -3    13.63       journal storage1-journal1
> 1     2.72          osd.1                    up       1
> 4     2.72          osd.4                    up       1
> 2     2.73          osd.2                    up       1
> 3     2.73          osd.3                    up       1
> 0     2.73          osd.0                    up       1
> -4    13.61       journal storage1-journal2
> 5     2.72          osd.5                    up       1
> 6     2.72          osd.6                    up       1
> 8     2.72          osd.8                    up       1
> 9     2.72          osd.9                    up       1
> 7     2.73          osd.7                    up       1
> -5    13.6        journal storage1-journal3
> 11    2.72          osd.11                   up       1
> 12    2.72          osd.12                   up       1
> 13    2.72          osd.13                   up       1
> 14    2.72          osd.14                   up       1
> 10    2.72          osd.10                   up       1
> -6    13.61       journal storage1-journal4
> 16    2.72          osd.16                   up       1
> 17    2.72          osd.17                   up       1
> 18    2.72          osd.18                   up       1
> 19    2.72          osd.19                   up       1
> 15    2.73          osd.15                   up       1
> -7    13.6        journal storage1-journal5
> 20    2.72          osd.20                   up       1
> 21    2.72          osd.21                   up       1
> 22    2.72          osd.22                   up       1
> 23    2.72          osd.23                   up       1
> 24    2.72          osd.24                   up       1
> -8    13.6        journal storage1-journal6
> 25    2.72          osd.25                   up       1
> 26    2.72          osd.26                   up       1
> 27    2.72          osd.27                   up       1
> 28    2.72          osd.28                   up       1
> 29    2.72          osd.29                   up       1
> -9    0           host ithome
> 30    0             osd.30                   up       1
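P.S. On your question about asking an OSD which PGs it holds: you can do that offline with the same objectstore tool you used for the export (the binary is named ceph_objectstore_tool in Giant). A sketch, assuming the default data and journal paths for osd.30 and that the OSD daemon is stopped first:

ceph_objectstore_tool --data-path /var/lib/ceph/osd/ceph-30 \
    --journal-path /var/lib/ceph/osd/ceph-30/journal \
    --op list-pgs

If 3.102 and 3.c7 appear in the output, your import is on disk and the remaining problem is getting CRUSH to map the PGs there.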
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com