Hi,

I am running a 3-node Deis cluster with Ceph as the underlying filesystem, so Ceph runs inside Docker containers on three separate servers. I rebooted all three nodes almost at once. After the reboot, the Ceph monitors refuse to connect to each other.

Symptoms are:
- no quorum is formed,
- the ceph admin socket file does not exist,
- only the following appears in the ceph log:

Dec 14 16:38:44 deis-1 sh[933]: 2014-12-14 08:38:44.265419 7f5cec71f700 0 -- :/1000021 >> 10.132.183.191:6789/0 pipe(0x7f5ce40296a0 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7f5ce4029930).fault
Dec 14 16:38:44 deis-1 sh[933]: 2014-12-14 08:38:44.265419 7f5cec71f700 0 -- :/1000021 >> 10.132.183.192:6789/0 pipe(0x7f5ce40296a0 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7f5ce4029930).fault
Dec 14 16:38:50 deis-1 sh[933]: 2014-12-14 08:38:50.267398 7f5cec71f700 0 -- :/1000021 >> 10.132.183.190:6789/0 pipe(0x7f5cd40030e0 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7f5cd4003370).fault
...keep repeating...

This is *my /etc/ceph/ceph.conf file*:
[global]
fsid = cc368515-9dc6-48e2-9526-58ac4cbb3ec9
mon initial members = deis-3
auth cluster required = cephx
auth service required = cephx
auth client required = cephx
osd pool default size = 3
osd pool default min_size = 1
osd pool default pg_num = 128
osd pool default pgp_num = 128
osd recovery delay start = 15
log file = /dev/stdout

[mon.deis-3]
host = deis-3
mon addr = 10.132.183.190:6789

[mon.deis-1]
host = deis-1
mon addr = 10.132.183.191:6789

[mon.deis-2]
host = deis-2
mon addr = 10.132.183.192:6789

[client.radosgw.gateway]
host = deis-store-gateway
keyring = /etc/ceph/ceph.client.radosgw.keyring
rgw socket path = /var/run/ceph/ceph.radosgw.gateway.fastcgi.sock
log file = /dev/stdout
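
To double-check which monitor addresses this file actually declares, the [mon.*] sections can be read programmatically. This is only a rough sketch: `monitor_addresses` is a made-up helper name, and it assumes the option is spelled `mon addr` with a plain host:port value, as in the file above.

```python
import configparser


def monitor_addresses(conf_text):
    """Return {section: (host, port)} for each [mon.*] section with a 'mon addr'."""
    # interpolation=None so that '%' characters in values are taken literally.
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(conf_text)
    mons = {}
    for section in parser.sections():
        # Assumption: the key is written 'mon addr' (with a space), as in this post.
        if section.startswith("mon.") and parser.has_option(section, "mon addr"):
            host, _, port = parser.get(section, "mon addr").rpartition(":")
            mons[section] = (host, int(port))
    return mons


if __name__ == "__main__":
    # Path taken from the post; adjust if the file lives elsewhere.
    with open("/etc/ceph/ceph.conf") as fh:
        for name, (host, port) in sorted(monitor_addresses(fh.read()).items()):
            print(f"{name}: {host}:{port}")
```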

*iptables rules on the Docker host:*
core@deis-3 ~ $ sudo iptables --list
Chain INPUT (policy DROP)
target     prot opt source destination
Firewall-INPUT  all  --  anywhere anywhere

Chain FORWARD (policy DROP)
target     prot opt source destination
ACCEPT     tcp  --  anywhere 172.17.0.2           tcp dpt:http
ACCEPT     tcp  --  anywhere 172.17.0.2           tcp dpt:https
ACCEPT     tcp  --  anywhere 172.17.0.2           tcp dpt:2222
ACCEPT     all  --  anywhere anywhere             ctstate RELATED,ESTABLISHED
ACCEPT     all  --  anywhere anywhere
ACCEPT     all  --  anywhere anywhere
Firewall-INPUT  all  --  anywhere anywhere

Chain OUTPUT (policy ACCEPT)
target     prot opt source destination

Chain Firewall-INPUT (2 references)
target     prot opt source destination
ACCEPT     all  --  anywhere anywhere
ACCEPT     icmp --  anywhere anywhere             icmp echo-reply
ACCEPT     icmp --  anywhere anywhere             icmp destination-unreachable
ACCEPT     icmp --  anywhere anywhere             icmp time-exceeded
ACCEPT     icmp --  anywhere anywhere             icmp echo-request
ACCEPT     all  --  anywhere anywhere             ctstate RELATED,ESTABLISHED
ACCEPT     all  --  10.132.183.190 anywhere
ACCEPT     all  --  10.132.183.192 anywhere
ACCEPT     all  --  10.132.183.191 anywhere
ACCEPT     all  --  anywhere anywhere
ACCEPT     tcp  --  anywhere anywhere             ctstate NEW multiport dports ssh,2222,http,https
LOG        all  --  anywhere anywhere             LOG level warning
REJECT     all  --  anywhere anywhere             reject-with icmp-host-prohibited


All private IPs are pingable from within the ceph monitor container. What could I do next to troubleshoot this issue?
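
Ping only exercises ICMP, so one further check would be whether TCP port 6789 on each monitor address accepts connections at all. Below is a minimal sketch of such a check, assuming Python is available wherever it is run; `tcp_reachable` and `MONITORS` are illustrative names, not part of Ceph, and the addresses are copied from the ceph.conf above.

```python
import socket

# Monitor addresses copied from the [mon.*] sections of the ceph.conf in this post.
MONITORS = {
    "mon.deis-3": ("10.132.183.190", 6789),
    "mon.deis-1": ("10.132.183.191", 6789),
    "mon.deis-2": ("10.132.183.192", 6789),
}


def tcp_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts alike.
        return False


if __name__ == "__main__":
    for name, (host, port) in sorted(MONITORS.items()):
        state = "open" if tcp_reachable(host, port) else "not reachable"
        print(f"{name} {host}:{port} -> {state}")
```

A "not reachable" result would only say that nothing accepted the connection on that port; it does not by itself distinguish a monitor that is not listening from traffic being dropped along the way.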

Thanks a lot!

- Jimmy Chu
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
