Hello,

So at some point during the night, our monitor 1 server rebooted for a so
far unknown reason.  When it came back up, the clock was skewed by 6 hours.
There were no writes happening when I got alerted to the issue.  Ceph showed
all OSDs up and in, but no op/s and 600+ blocked requests.  I logged into
mon1, fixed the clock and restarted it.  Ceph status showed all mons up,
no skew, but still no op/s.

I checked the OSD logs and saw cephx auth errors, which according to the
Ceph website can be caused by clock skew.  So I tried restarting one OSD to
check, and got the same thing.  So I stopped mon1, figuring the cluster
would roll over to mon2/3 and get us back up and running.

Well, the OSDs weren't showing as up, so I checked my ceph.conf file to see
why they weren't failing over to mon2/3 and noticed it only had the IP for
mon1.  I updated ceph.conf with the IPs for mon2/3 and restarted, and the
OSDs came back up and started talking again.
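
For anyone wanting to catch the same problem, here is a minimal sketch of
how that ceph.conf check could be scripted.  It assumes the monitors are
listed under mon_host in the [global] section; the function name, the fsid
and the addresses are placeholders for illustration, not our real setup.

```python
import configparser

# Placeholder config text; the fsid and addresses are made up for illustration.
SAMPLE_CONF = """\
[global]
fsid = 00000000-0000-0000-0000-000000000000
mon_host = 192.0.2.11, 192.0.2.12, 192.0.2.13
"""


def monitor_addresses(conf_text):
    """Return the monitor addresses listed under mon_host in [global]."""
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(conf_text)
    if not parser.has_section("global"):
        return []
    for key, value in parser.items("global"):
        # Ceph treats "mon host" and "mon_host" as the same option name.
        if key.replace(" ", "_") == "mon_host":
            # Addresses may be separated by commas and/or whitespace.
            return value.replace(",", " ").split()
    return []


if __name__ == "__main__":
    mons = monitor_addresses(SAMPLE_CONF)
    if len(mons) < 2:
        print("warning: only %d monitor(s) listed, no failover possible" % len(mons))
    else:
        print("monitors listed:", ", ".join(mons))
```

Running something like this against each node's ceph.conf would have flagged
that only mon1 was listed before it became a problem.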

So right now, mon1 is offline, and I only have mon2/3 running.  Without
knowing why mon1 was having issues, I don't want to start it and bring it
back in, just to have the cluster freak out.  At the same time, I'd like to
get back to having all three monitors in the quorum.  I'm still reviewing
the logs on mon1 to try and see if there are any errors that might point me
to the issue.

In the meantime, my questions are:  Do you think it would be worth trying
to start mon1 again to see what happens?  If it still has issues, will my
OSDs fail over to mon2/3 now that the conf is correct?  Are there any other
issues that might arise from bringing it back in?

The other option I could think of would be to deploy a new monitor 4 and
then remove monitor 1, but I think this could lead to other issues if I am
reading the docs correctly.

All our PGs are active+clean, so the cluster is in a healthy state.  The
only warnings are from having set noscrub and nodeep-scrub and from 1 mon
being down.

Any advice would be greatly appreciated.  Sorry for the long-windedness and
the scattered thought process.

Thanks,
Curt