Dear Ceph users and developers,

On one of our production clusters, we got into a pretty unpleasant situation.

After rebooting one of the nodes, the whole cluster seems to hang (including
I/O, ceph -s, etc.) as soon as we try to start the monitor on that node. When
this mon is stopped again, everything seems to continue. Trying to spawn a new
monitor leads to the same problem (even on a different node).

I had to give up after minutes of outage, since that is unacceptable. I think
we had this problem once in the past on this cluster, but after some (much
shorter) time the monitor joined, and it has worked fine since then.

All cluster nodes are CentOS 7 machines. I have 3 monitors (so 2 are now
running), and I'm using Ceph 13.2.6.

Network connection seems to be fine.

Has anyone seen a similar problem? I'd be very grateful for tips on how to
debug and solve this.

For those interested, here's the log of one of the running monitors with
debug_mon set to 10/10:

https://storage.lbox.cz/public/d258d0

If I can provide more info, please let me know.

with best regards

nikola ciprich


-- 
-------------------------------------
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28. rijna 168, 709 00 Ostrava

tel.:   +420 591 166 214
fax:    +420 596 621 273
mobil:  +420 777 093 799

www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
-------------------------------------