Hi all,

Here is our story; perhaps some day it will help someone. Bear in mind that
English is not my native language, so sorry if I make mistakes.

Our system is: Ceph 0.87.2 (Giant), with 5 OSD servers (116 OSDs of 1 TB each
in total) and 3 monitors.

After a nightmare, we have provisionally "corrected" our Ceph monitor
problems. But first, some additional info and a timeline (dates are in
dd/mm/yyyy format).

At the beginning, we had 3 working monitors (MON01, MON02 and MON03) and we
were happy.

Wednesday 05/06/2019:
After a UPS outage on the B line, we found that the ceph-mon process on MON03
did not start cleanly: after starting ceph-mon, ceph-create-keys could not
contact the daemon. We kept working with a quorum of 2 monitors and still had
access to the Ceph storage.

Thursday 06/06/2019:
We had the "good" idea of adding a new monitor to the monitor cluster... this
was our first mistake. After the "ceph-deploy mon mon.mon04" command, the new
monitor was activated (4 monitors in the cluster) but... only 2 monitors had
data (mon01 and mon02), and that means no quorum. With no quorum, mon04 could
not join the monitor cluster. We lost the "ceph" commands because no monitor
could hold quorum, so no ceph-related command worked.
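
As a rough illustration of why 2 monitors out of 4 is not enough: monitors
need a strict majority of the monitors listed in the monmap. The function
names below are made up for this example; they are not Ceph commands.

```shell
# Sketch only: majority arithmetic for a monitor cluster.
# quorum_needed TOTAL prints the votes required among TOTAL monitors,
# i.e. floor(TOTAL / 2) + 1.
quorum_needed() {
    echo $(( $1 / 2 + 1 ))
}

# has_quorum TOTAL UP succeeds when UP voting monitors reach a majority.
has_quorum() {
    [ "$2" -ge "$(quorum_needed "$1")" ]
}
```

With 3 monitors in the monmap and 2 up, "has_quorum 3 2" succeeds. With 4 in
the monmap and only 2 able to vote, "has_quorum 4 2" fails, which matches what
we saw after adding mon04.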

Fortunately, the storage "worked" and active OpenStack instances were not
affected (we do not know why it worked, but it did). At this point, we
restarted mon02 and mon04 a few times. I do not remember the order, but our
priority was to recover monitor quorum :(  After the mon02 restart, it showed
the same behaviour as mon03: ceph-create-keys could not contact the daemon.

We left the cluster "working" with mon01 in electing state and mon04 waiting
to be added to the cluster.

Friday 07/06/2019:
We prepared a new monitor machine (mon05) to join the monitor cluster. Our
idea was: "If we build mon05 and add it to the monitor cluster, this could
work, as 3 monitors up will make quorum..."

We ran "ceph-mon -i mon05 --mkfs --monmap /root/monmap-mon04-original
--keyring /root/keyring" with data extracted from mon04 (keyring and monmap)
and started it with "ceph-mon -i mon05 -c /etc/ceph/ceph.conf --cluster
ceph"...
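
For reference, the two commands above could be wrapped roughly like this. It
is only a sketch: the function name make_mon and the check that both files are
readable are additions for this example, while the ceph-mon invocations are
the ones quoted above.

```shell
# Sketch only: build a monitor store by hand from a saved monmap and keyring,
# then start the daemon.
# Usage: make_mon mon05 /root/monmap-mon04-original /root/keyring
make_mon() {
    mon_id=$1
    monmap=$2
    keyring=$3

    # Stop early if either input file is missing or unreadable.
    if [ ! -r "$monmap" ] || [ ! -r "$keyring" ]; then
        echo "monmap or keyring not readable" >&2
        return 1
    fi

    # Create a fresh monitor data directory from the saved monmap and keyring.
    ceph-mon -i "$mon_id" --mkfs --monmap "$monmap" --keyring "$keyring" || return 1

    # Start the monitor with the same options we used for mon05.
    ceph-mon -i "$mon_id" -c /etc/ceph/ceph.conf --cluster ceph
}
```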

Yes, it worked. We were very happy because we had recovered monitor quorum,
the ceph commands were back and everything worked.... but only for 10
minutes :(

And here the nightmare began.

Slow requests began to increase. We did not know why, so initially we
restarted the affected OSDs. After 3 hours of restarting OSDs we thought:
"This is not normal. What's happening here?"

The OSD logs showed some "key errors" when contacting other OSDs and
monitors. We were really in trouble, because OpenStack Cinder could not
contact the rbd volumes, and rbd commands showed a lot of key errors when
reading pool volumes. The whole system really went down, so no writes or
reads reached the storage.... We tried restarting the monitors, restarting
the OpenStack services, restarting the OSDs (one at a time), checking NTP (no
errors there), checking iptables, checking anything that could be checked...
with no success.

We rebuilt monitors 2 and 3, formatting the ceph-mon data in the same way we
did with mon05, so we had a 5-monitor cluster, but the key errors did not
disappear.

And when there was nothing more we could do...  we used a Spanish saying: "De
perdidos, al río" (literal translation: "from lost, to the river", i.e. when
nothing works and all is lost, you may as well try anything). So... we
thought: "the only monitor we have never touched is mon01 (the active
monitor), so what if we reset it?"

No sooner said than done. We stopped mon01. The monitor quorum moved to
mon02, but the slow requests were still there. We restarted ceph-mon on
mon01... but again, ceph-create-keys could not contact the daemon. We had
lost mon01. So mon02 to mon05 were working in quorum.

And, suddenly, the storage began to recover: slow requests decreased, rbd
commands worked, the OSD logs showed normal info (no key-related errors) and
10 minutes after mon01 went down, the whole cluster was active and clean.

After this story, we have some "things to keep in mind" that we want to share:

- Always have more than 1 "initial-monitors" defined in Ceph. We had only one,
and if it is not active, the other monitors do not start (after the storage
recovered, we stopped mon05 and it now has status "probing", trying to
contact mon01, which is down).
- Keep a copy of the monitors' keyring and monmap. This is the safe way to add
monitors manually to the cluster when no ceph-related commands work.
- Be careful when adding or removing monitors in an unhealthy monitor cluster:
if they lose quorum you will be in trouble.
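
The second point (keeping a copy of the keyring and monmap) could look
something like the sketch below. The function name backup_mon_state and the
destination directory are only examples. Note that "ceph mon getmap" and
"ceph auth get mon." need a working quorum, so this has to be run while the
cluster is still healthy.

```shell
# Sketch only: save the current monmap and the mon. keyring with a timestamp.
# Usage: backup_mon_state /root/mon-backup
backup_mon_state() {
    dest=$1
    mkdir -p "$dest" || return 1
    stamp=$(date +%Y%m%d-%H%M%S)

    # Current monitor map, as seen by the quorum.
    ceph mon getmap -o "$dest/monmap-$stamp" || return 1

    # Keyring of the mon. entity, needed to create a monitor store by hand.
    ceph auth get mon. -o "$dest/mon-keyring-$stamp" || return 1

    # Print the saved map so it can be checked by eye.
    monmaptool --print "$dest/monmap-$stamp"
}
```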

Now, we have some work to do:
- Remove mon01 with "ceph mon destroy mon01": we want to remove it from the
monmap, but it is the "initial monitor", so we do not know if it is safe to
do.
- Clean and "format" the monitor data for mon01 (as we did on mon02 and
mon03), but we have the same question: is it safe to do when it is the
"initial mon"?
- Modify the monmap, deleting mon01, and inject it on mon05, but...  what
happens when we delete the "initial mon" from the monmap? Is it safe?
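
To make the third option concrete, this is roughly what we have in mind. It
is a sketch, not a tested procedure: the function name drop_mon_from_map is
made up, we are assuming the documented --extract-monmap and --inject-monmap
options of ceph-mon and the --rm option of monmaptool behave this way on
0.87.2, and the ceph-mon daemon of the target monitor has to be stopped
first. It does not answer whether removing the "initial mon" is safe.

```shell
# Sketch only: remove one monitor from a stopped monitor's monmap.
# Usage (with ceph-mon on mon05 stopped):
#   drop_mon_from_map mon05 mon01 /root/monmap-edit
drop_mon_from_map() {
    mon_id=$1    # monitor whose store is edited, e.g. mon05
    dead_id=$2   # monitor to drop from the map, e.g. mon01
    map=$3       # scratch file for the extracted monmap

    # Pull the monmap out of the local monitor store.
    ceph-mon -i "$mon_id" --extract-monmap "$map" || return 1

    # Keep an untouched copy before editing.
    cp "$map" "$map.orig" || return 1

    # Remove the dead monitor and show the result for a manual check.
    monmaptool "$map" --rm "$dead_id" || return 1
    monmaptool --print "$map" || return 1

    # Write the edited map back into the local monitor store.
    ceph-mon -i "$mon_id" --inject-monmap "$map"
}
```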

As you can understand, we now have a working storage but in a critical
situation, because any problem with the monitors could make it unstable
again... And there are still 15 TB of data inside.

If someone has any "safe" idea to share....  it will be appreciated.



Lluís Arasanz Nonell * Systems Department
Tel: +34 902 902 685
email: lluis.aras...@adam.es<mailto:lluis.aras...@adam.es>

ceph-users mailing list