I see, maybe you want to look at these instructions. I don’t know if you are running Rook, but the point about keeping the container alive by using `sleep` is important. Then you can get into the container with `exec` and do what you need to.

https://rook.io/docs/rook/v1.4/ceph-disaster-recovery.html#restoring-mon-quorum
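You said you are on ceph-ansible rather than Rook, so that page won’t apply verbatim, but the same trick works with a plain docker-managed mon. A rough sketch of what I mean, typed from memory rather than tested; the unit name, image name and mon IDs below are placeholders you’ll need to adjust, and the paths assume the default cluster name "ceph":

    # On the surviving mon host: stop the managed mon container so nothing
    # restarts ceph-mon underneath you (the unit name may differ on your setup).
    systemctl stop ceph-mon@<surviving-mon-id>

    # Start a throwaway container from the same image, with the mon data and
    # config mounted, but with the entrypoint replaced by sleep so no daemon runs.
    docker run -d --name mon-rescue \
      --entrypoint sleep \
      -v /var/lib/ceph:/var/lib/ceph \
      -v /etc/ceph:/etc/ceph \
      <your-mon-image> infinity

    # Get a shell inside it.
    docker exec -it mon-rescue /bin/bash

    # Inside the container: extract the monmap from the surviving mon and inspect it.
    ceph-mon -i <surviving-mon-id> --extract-monmap /tmp/monmap
    monmaptool --print /tmp/monmap

    # Drop the dead mon from the map and inject the result back.
    monmaptool /tmp/monmap --rm <dead-mon-id>
    ceph-mon -i <surviving-mon-id> --inject-monmap /tmp/monmap

Then exit, `docker rm -f mon-rescue`, start the real mon unit again, and the surviving mon should be able to form quorum on its own; after that you can re-add the other monitor as described below. The mon ID is normally the short hostname of the node, and if you ran the commands as root it is worth checking file ownership under /var/lib/ceph/mon afterwards.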
> On Oct 12, 2020, at 4:16 PM, Gaël THEROND <gael.ther...@bitswalk.com> wrote:
>
> Hi Brian!
>
> Thanks a lot for your quick answer, that was fast!
>
> Yes, I’ve read this doc, yet I can’t perform the appropriate commands as my
> OSDs are up and running.
>
> As my mon is a container, if I try to use ceph-mon --extract it won’t work,
> because the mon process is running, and if I stop it the container will be
> restarted and I’ll be kicked out of it.
>
> I can’t retrieve anything from ceph mon getmap as the quorum isn’t forming.
>
> Yep, I know that I would need three nodes, and a third node recently became
> available for this lab.
>
> Unfortunately it’s a lab cluster, so one of my colleagues just took the
> third node for testing purposes... I told you, a series of unfortunate
> events :-)
>
> I can’t get rid of the cluster as I can’t lose the OSD data.
>
> G.
>
> On Tue, Oct 13, 2020 at 00:01, Brian Topping <brian.topp...@gmail.com> wrote:
> Hi there!
>
> This isn’t a difficult problem to fix. For purposes of clarity, the monmap
> is just a part of the monitor database. You generally have all the details
> correct, though.
>
> Have you looked at the process in
> https://docs.ceph.com/en/latest/rados/troubleshooting/troubleshooting-mon/#recovering-a-monitor-s-broken-monmap ?
>
> Please do make sure you are working on the copy of the monitor database
> with the newest epoch. After removing the other monitors and getting your
> cluster back online, you can re-add monitors at will.
>
> Also note that a quorum is defined as "one-half the total number of nodes
> plus one". In your case, quorum is defined by both nodes! Taking either one
> down would cause this problem. So you need an odd number of nodes to be
> able to take a node down, for instance in a rolling upgrade.
>
> Hope that helps!
>
> Brian
>
>> On Oct 12, 2020, at 3:54 PM, Gaël THEROND <gael.ther...@bitswalk.com> wrote:
>>
>> Hi everyone,
>>
>> Because of unfortunate events, I have a container-based Ceph cluster
>> (Nautilus) in bad shape.
>>
>> It is a lab cluster made of only 2 nodes as control plane (I know it’s
>> bad :-)); each of these nodes runs a mon, a mgr and a rados-gw
>> containerized ceph_daemon.
>>
>> They were installed using ceph-ansible, if that is relevant for anyone.
>>
>> However, while I was performing an upgrade on the first node, the second
>> went down too (electrical power outage).
>>
>> As soon as I saw that, I stopped every running process on the upgrading
>> node.
>>
>> For now, if I try to restart my second node, the cluster isn’t available,
>> as the quorum is looking for two nodes.
>>
>> The container starts and the node elects itself as the master, but all
>> ceph commands are stuck forever, which is perfectly normal as the quorum
>> is still waiting for one more member to complete the election process, etc.
>>
>> So, my question is: as I can’t (to my knowledge) extract the monmap in
>> this intermediary state, and as my first node will still be considered a
>> known mon and will try to join back if started properly, can I just copy
>> the /etc/ceph.conf and /var/lib/mon/<host>/keyring from the last living
>> node (the second one) and put everything in its own place on the first
>> node? My mon keys were the same for both mons initially, and if I’m not
>> making any mistakes, my first node, being blank, will try to create a
>> default store, join the existing cluster and retrieve the appropriate
>> monmap from the remaining node, right?
>>
>> If not, is there a process to save/extract the monmap when using a
>> container-based Ceph? I can perfectly well exec on the remaining node if
>> it makes any difference.
>>
>> Thanks a lot!
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io