Hi all,

Each of our production Ceph nodes has two NICs: one is used for heartbeat traffic and the other for the cluster_network.
In a recent incident, the heartbeat NIC on one node failed while its cluster_network NIC stayed healthy. The other OSDs nevertheless reported the node with the broken NIC as unavailable, so the monitor decided to kick the node out.

I'm not sure whether what I describe matches the actual code logic. If it does, wouldn't it be more reasonable for the ceph-osd process to detect that the cluster_network is still healthy, so that the node with the broken heartbeat NIC is not kicked out?

--
Best Regards,
Wheat
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
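
P.S. To make the question concrete, here is a minimal sketch of the two decision rules I am comparing. It is plain Python written only for illustration and is not taken from the Ceph source: the names PeerLinkStatus, report_down_observed and report_down_proposed are hypothetical, and the "observed" rule is just my guess at the behaviour based on what we saw.

```python
from dataclasses import dataclass


@dataclass
class PeerLinkStatus:
    """Reachability of one peer OSD as seen by another OSD (hypothetical model)."""
    heartbeat_ok: bool        # peer answers on the heartbeat NIC
    cluster_network_ok: bool  # peer answers on the cluster_network NIC


def report_down_observed(status: PeerLinkStatus) -> bool:
    """Assumed current behaviour: a heartbeat failure alone is enough to report the peer."""
    return not status.heartbeat_ok


def report_down_proposed(status: PeerLinkStatus) -> bool:
    """Proposed behaviour: report the peer only if it is unreachable on both networks."""
    return not status.heartbeat_ok and not status.cluster_network_ok


# Our incident: heartbeat NIC broken, cluster_network NIC healthy.
incident = PeerLinkStatus(heartbeat_ok=False, cluster_network_ok=True)
print(report_down_observed(incident))
print(report_down_proposed(incident))
```

With the proposed rule, the node in our incident would stay in the cluster because it is still reachable over the cluster_network. Whether that is actually safe is part of what I am asking.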