As oddly as it drifted away, it came back. Next time, should there be a next time, I will snag logs as Sascha suggested.
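For my own notes, I believe capturing that debug log on the affected host would look roughly like this next time (the daemon name and log path below are my assumptions for this cluster):

  # raise mon logging on the affected monitor via its admin socket
  ceph daemon mon.cephmon02 config set debug_mon 20/20
  # wait for the probing/auth messages to repeat, then collect the log
  cp /var/log/ceph/ceph-mon.cephmon02.log ~/cephmon02-debug.log
  # drop the level back to the default afterwards
  ceph daemon mon.cephmon02 config set debug_mon 1/5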
The window for all this was, local time: 9:02 am - disassociated; 11:20 pm - associated. No changes were made, though I did reboot the mon02 host at 1 pm. No other network or host issues were observed in the rest of the cluster or at the site. Thank you for your replies, and I'll gather better logging next time.

peter

Peter Eisch
Senior Site Reliability Engineer
T 1.612.659.3228
virginpulse.com | virginpulse.com/global-challenge
Australia | Bosnia and Herzegovina | Brazil | Canada | Singapore | Switzerland | United Kingdom | USA

From: Brad Hubbard <bhubb...@redhat.com>
Date: Wednesday, January 8, 2020 at 6:21 PM
To: Peter Eisch <peter.ei...@virginpulse.com>
Cc: "ceph-users@lists.ceph.com" <ceph-users@lists.ceph.com>
Subject: Re: [ceph-users] monitor ghosted

On Thu, Jan 9, 2020 at 5:48 AM Peter Eisch <peter.ei...@virginpulse.com> wrote:
> Hi,
>
> This morning one of my three monitor hosts got booted from the Nautilus 14.2.4 cluster and it won't rejoin. There haven't been any changes or events at this site at all. The conf file is unchanged and the same as on the other two monitors. The host is also running the MDS and MGR daemons without any issue.
>
> The ceph-mon log shows this repeating:
>
> 2020-01-08 13:33:29.403 7fec1a736700 1 mon.cephmon02@1(probing) e7 handle_auth_request failed to assign global_id
> 2020-01-08 13:33:29.433 7fec1a736700 1 mon.cephmon02@1(probing) e7 handle_auth_request failed to assign global_id
> 2020-01-08 13:33:29.541 7fec1a736700 1 mon.cephmon02@1(probing) e7 handle_auth_request failed to assign global_id
> ...

Try gathering a log with debug_mon 20. That should provide more detail about why AuthMonitor::_assign_global_id() didn't return an ID.

> There is nothing in the logs of the two remaining/healthy monitors. What is my best practice to get this host back in the cluster?
>
> peter

--
Cheers,
Brad
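For reference, the probing monitor's own view of the monmap and quorum can also be checked over its admin socket on that host (the daemon name below is assumed from the log lines above):

  # run on cephmon02 itself, against the local monitor's admin socket
  ceph daemon mon.cephmon02 mon_status
  ceph daemon mon.cephmon02 quorum_status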
_______________________________________________ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com