On Sat, Jul 19, 2014 at 11:08 AM, Wang Haomai wrote:
Oh, it's our fault.
Public_addr and cluster_addr use the same NIC (eth1). But we found that during
recovery the heartbeat may time out because of heavy traffic. I *misunderstood* the
meaning of heartbeat and used another NIC (eth0) address for heartbeat to avoid
timeouts.
From your points, it's easy to unders
The heartbeat code is very careful to use the same physical interfaces as
1) the cluster network
2) the public network
If the first breaks, the OSD can't talk with its peers. If the second
breaks, it can't talk with the monitors or clients. Either way, the
OSD can't do its job so it gets marked do
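
A toy sketch of the rule described above, for illustration only. This is not Ceph's code; the names `OsdLinks`, `heartbeat_healthy` and `broken_paths` are made up here. It only shows that a failure on either network is enough for the OSD to be treated as unusable.

```python
from dataclasses import dataclass


@dataclass
class OsdLinks:
    """Hypothetical view of one OSD's connectivity (not a Ceph type)."""
    cluster_ok: bool  # can talk to peer OSDs over the cluster network
    public_ok: bool   # can talk to monitors and clients over the public network


def broken_paths(links: OsdLinks) -> list[str]:
    """Return the names of the networks on which heartbeats are failing."""
    paths = []
    if not links.cluster_ok:
        paths.append("cluster")
    if not links.public_ok:
        paths.append("public")
    return paths


def heartbeat_healthy(links: OsdLinks) -> bool:
    """The OSD counts as healthy only if both networks work."""
    return not broken_paths(links)


if __name__ == "__main__":
    # Cluster network fine, public network broken: still unhealthy.
    osd = OsdLinks(cluster_ok=True, public_ok=False)
    print(heartbeat_healthy(osd), broken_paths(osd))
```

With this model, pointing the heartbeat at a third interface would test a path that neither peers nor clients actually use, which is why the heartbeat follows the cluster and public networks instead.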
Hi all,
Each of our production Ceph nodes has two NICs: one used for heartbeat,
the other for cluster_network.
By accident, the heartbeat NIC on one node broke while its cluster_network NIC
stayed healthy. But the OSDs reported the node with the broken NIC as unavailable, so
the monitor decided to kick out the node.
I'm not sure what