Hi, I have a situation where I moved the interfaces that my ceph public
network runs over (only the interfaces, not the IPs, etc.). This was done to
increase available bandwidth, but it backfired catastrophically: my monitors
all failed and somehow became corrupted, and I was unable to repair them. So
I rebuilt the monitors in the hope that I could add the existing OSDs back in
and recover the cluster.
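
For reference, the mon rebuild on each host looked roughly like this (shown
for hypd01; it is the standard manual bootstrap reconstructed from memory, so
treat the exact flags and paths as approximate):

# Rough sketch of the mon rebuild (from memory, exact flags approximate).
# New mon. keyring, created once and then copied to the other two hosts:
ceph-authtool --create-keyring /tmp/ceph.mon.keyring \
    --gen-key -n mon. --cap mon 'allow *'
ceph-authtool /tmp/ceph.mon.keyring \
    --import-keyring /etc/ceph/ceph.client.admin.keyring

# Fresh monmap with all three mons and the cluster fsid shown in ceph -s:
monmaptool --create --fsid ac486394-802a-49d3-a92c-a103268ea189 \
    --add hypd01 10.100.100.11:6789 \
    --add hypd02 10.100.100.12:6789 \
    --add hypd03 10.100.100.13:6789 \
    /tmp/monmap

# Wipe the corrupted store, recreate it, and start the daemon (upstart):
rm -rf /var/lib/ceph/mon/ceph-hypd01
mkdir -p /var/lib/ceph/mon/ceph-hypd01
ceph-mon --mkfs -i hypd01 --monmap /tmp/monmap --keyring /tmp/ceph.mon.keyring
start ceph-mon id=hypd01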

There are three hosts. Each has a monitor and six OSDs. Each OSD is a
spinning-disk partition with its journal on an SSD partition on the same
host. From what I can tell, all the data on the OSD disks is intact, but even
after (what I think was) adding all the OSDs back into the CRUSH map, etc.,
the cluster doesn't seem to be "seeing" the partitions, and I'm at a loss for
how to troubleshoot it further.
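
Re-adding the OSDs went roughly like this (a sketch for osd.0 on hypd01,
repeated for osd.1 through osd.17; this is the stock manual procedure with
the default paths and caps, so again treat the details as approximate):

# Rough sketch of re-adding one existing OSD (osd.0 on hypd01).
# The data dir at /var/lib/ceph/osd/ceph-0 was left untouched.

# Recreate the host bucket and hang it off the default root:
ceph osd crush add-bucket hypd01 host
ceph osd crush move hypd01 root=default

# Allocate the osd id, reusing the uuid stored in the data dir
# (assumes the ids come back in the original 0-17 order):
ceph osd create $(cat /var/lib/ceph/osd/ceph-0/fsid)

# Re-register the OSD's existing key with the new mons:
ceph auth add osd.0 osd 'allow *' mon 'allow rwx' \
    -i /var/lib/ceph/osd/ceph-0/keyring

# Put it back in the CRUSH map with weight 1 under its host:
ceph osd crush add osd.0 1.0 host=hypd01

# Start the daemon (upstart on trusty):
start ceph-osd id=0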

Hosts are all Ubuntu Trusty running the 0.80.7 Ceph packages.

dgeist# ceph -s
    cluster ac486394-802a-49d3-a92c-a103268ea189
     health HEALTH_WARN 4288 pgs stuck inactive; 4288 pgs stuck unclean; 18/18 
in osds are down
     monmap e1: 3 mons at 
{hypd01=10.100.100.11:6789/0,hypd02=10.100.100.12:6789/0,hypd03=10.100.100.13:6789/0},
 election epoch 40, quorum 0,1,2 hypd01,hypd02,hypd03
     osdmap e65: 18 osds: 0 up, 18 in
      pgmap v66: 4288 pgs, 4 pools, 0 bytes data, 0 objects
            0 kB used, 0 kB / 0 kB avail
                4288 creating

dgeist# ceph osd tree
# id    weight  type name       up/down reweight
-1      18      root default
-2      6               host hypd01
0       1                       osd.0   down    1       
1       1                       osd.1   down    1       
2       1                       osd.2   down    1       
3       1                       osd.3   down    1       
4       1                       osd.4   down    1       
5       1                       osd.5   down    1       
-3      6               host hypd02
6       1                       osd.6   down    1       
7       1                       osd.7   down    1       
8       1                       osd.8   down    1       
9       1                       osd.9   down    1       
10      1                       osd.10  down    1       
11      1                       osd.11  down    1       
-4      6               host hypd03
12      1                       osd.12  down    1       
13      1                       osd.13  down    1       
14      1                       osd.14  down    1       
15      1                       osd.15  down    1       
16      1                       osd.16  down    1       
17      1                       osd.17  down    1
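
In case it is useful, this is what I was planning to poke at next on a single
OSD (osd.0 on hypd01), assuming the stock paths; pointers on whether these
are even the right things to be looking at would be very welcome:

# Planned checks on one OSD (stock paths assumed).

# Does the OSD's recorded cluster fsid match the rebuilt mons?
cat /var/lib/ceph/osd/ceph-0/ceph_fsid
ceph -s | grep cluster

# Anything obvious in the OSD log when it tries to start?
tail -n 100 /var/log/ceph/ceph-osd.0.log

# Run the daemon in the foreground and watch it try to join:
ceph-osd -i 0 -d

# If it stays up, ask it over the admin socket what state it thinks it is in:
ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok status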


Thanks in advance for any thoughts on how to recover this.

Dan

Dan Geist dan(@)polter.net
(33.942973, -84.312472)
http://www.polter.net


_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
