Hi Karan,

I resolved it the same way you did. It appears a network partition caused
the MON to die.

I'm running 0.72.1.

It would be nice if redeploying wasn't the solution, but if it's simply
cleaner to do so, then I will continue along that route.
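For reference, the clean remove-and-re-add route looks roughly like the
following on my setup. This is a sketch, not a verified recipe: the upstart
service names and the /var/lib/ceph/mon data directory path are the defaults
on my Ubuntu hosts and may differ on yours.

```shell
# On the broken host, stop the monitor daemon (upstart-managed here)
stop ceph-mon id=ceph-mon-2003

# From a healthy node, remove it from the monmap
ceph mon remove ceph-mon-2003

# Back on ceph-mon-2003: wipe the stale monitor data directory
# (default path; adjust if your cluster uses a different layout)
rm -rf /var/lib/ceph/mon/ceph-ceph-mon-2003

# Rebuild the mon store from the current monmap and the mon. keyring
ceph mon getmap -o /tmp/monmap
ceph auth get mon. -o /tmp/keyring
ceph-mon -i ceph-mon-2003 --mkfs --monmap /tmp/monmap --keyring /tmp/keyring

# Start it again; it should now bind port 6789 and rejoin quorum
start ceph-mon id=ceph-mon-2003
```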

I think what's more troubling is that when this occurred we lost all
connectivity to the Ceph cluster.
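In case it helps anyone hitting the same symptom: before restarting anything,
you can check which port the stray mon actually bound and what it thinks its
state is. The admin socket path below is the default; yours may differ.

```shell
# See which port the local ceph-mon process actually bound
netstat -tlnp | grep ceph-mon

# Ask the daemon for its own view of the monmap and quorum state
ceph --admin-daemon /var/run/ceph/ceph-mon.ceph-mon-2003.asok mon_status
```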


On Wed, Feb 5, 2014 at 1:11 AM, Karan Singh <ksi...@csc.fi> wrote:

> Hi Greg
>
>
> I have seen this problem before in my cluster.
>
>
>
>    - What ceph version are you running?
>    - Did you make any change recently in the cluster that could have
>    resulted in this problem?
>
>
> You identified it correctly; the only problem is that ceph-mon-2003 is
> listening on the wrong port. It should listen on port 6789 like the other
> two monitors. I resolved this by cleanly removing the affected monitor node
> and adding it back to the cluster.
>
>
> Regards
>
> Karan
>
> ------------------------------
> *From: *"Greg Poirier" <greg.poir...@opower.com>
> *To: *ceph-users@lists.ceph.com
> *Sent: *Tuesday, 4 February, 2014 10:50:21 PM
> *Subject: *[ceph-users] Ceph MON can no longer join quorum
>
>
> I have a MON that at some point lost connectivity to the rest of the
> cluster and now cannot rejoin.
>
> Each time I restart it, it looks like it's attempting to create a new MON
> and join the cluster, but the rest of the cluster rejects it, because the
> new one isn't in the monmap.
>
> I don't know why it suddenly decided it needed to be a new MON.
>
> I am not really sure where to start.
>
> root@ceph-mon-2003:/var/log/ceph# ceph -s
>     cluster 4167d5f2-2b9e-4bde-a653-f24af68a45f8
>      health HEALTH_ERR 1 pgs inconsistent; 2 pgs peering; 126 pgs stale; 2
> pgs stuck inactive; 126 pgs stuck stale; 2 pgs stuck unclean; 10 requests
> are blocked > 32 sec; 1 scrub errors; 1 mons down, quorum 0,1
> ceph-mon-2001,ceph-mon-2002
>      monmap e2: 3 mons at {ceph-mon-2001=
> 10.30.66.13:6789/0,ceph-mon-2002=10.30.66.14:6789/0,ceph-mon-2003=10.30.66.15:6800/0},
> election epoch 12964, quorum 0,1 ceph-mon-2001,ceph-mon-2002
>
> Notice ceph-mon-2003:6800
>
> If I try to start ceph-mon-all, it will be listening on some other port...
>
> root@ceph-mon-2003:/var/log/ceph# start ceph-mon-all
> ceph-mon-all start/running
> root@ceph-mon-2003:/var/log/ceph# ps -ef | grep ceph
> root      6930     1 31 15:49 ?        00:00:00 /usr/bin/ceph-mon
> --cluster=ceph -i ceph-mon-2003 -f
> root      6931     1  3 15:49 ?        00:00:00 python
> /usr/sbin/ceph-create-keys --cluster=ceph -i ceph-mon-2003
>
> root@ceph-mon-2003:/var/log/ceph# ceph -s
> 2014-02-04 15:49:56.854866 7f9cf422d700  0 -- :/1007028 >>
> 10.30.66.15:6789/0 pipe(0x7f9cf0021370 sd=3 :0 s=1 pgs=0 cs=0 l=1
> c=0x7f9cf00215d0).fault
>     cluster 4167d5f2-2b9e-4bde-a653-f24af68a45f8
>      health HEALTH_ERR 1 pgs inconsistent; 2 pgs peering; 126 pgs stale; 2
> pgs stuck inactive; 126 pgs stuck stale; 2 pgs stuck unclean; 10 requests
> are blocked > 32 sec; 1 scrub errors; 1 mons down, quorum 0,1
> ceph-mon-2001,ceph-mon-2002
>      monmap e2: 3 mons at {ceph-mon-2001=
> 10.30.66.13:6789/0,ceph-mon-2002=10.30.66.14:6789/0,ceph-mon-2003=10.30.66.15:6800/0},
> election epoch 12964, quorum 0,1 ceph-mon-2001,ceph-mon-2002
>
> Suggestions?
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
