On Fri, Jun 03, 2011 at 02:37:53AM +0200, Per von Zweigbergk wrote:
> On 3 Jun 2011 at 00:10, Patrick Lamaiziere wrote:
>
> > You may want to implement your own control because if the two hosts
> > cannot communicate, you will have two masters. This can happen if the
> > links on both hosts are up but no packets are forwarded (i.e. the
> > switch connecting the two boxes is broken in some way).
>
> As a general thought that might be interesting when you're building your HA
> solution:
>
> One less-documented feature of VMware ESXi is that it checks whether it's
> isolated from the network by pinging the gateway on the management network.
>
> This is how ESXi tries to avoid a split-brain condition: by making
> sure that it only considers itself to be the master if it can reach the
> gateway but cannot reach any other servers. You might implement gating in a
> similar way to avoid a split-brain condition in your HA solution.

If that's indeed true, VMware ESXi is doing something Extremely Bad.

Pinging the local gateway (read: A ROUTER) to determine whether network
I/O is failing is an unwise decision. Commercial-grade routers (read:
Cisco, Juniper) all implement a form of ICMP prioritisation. The router
can (and will) drop inbound ICMP packets directed at the router itself
(i.e. with the gateway as the destination IP) during high CPU
utilisation. Packets destined to the router itself are handled very,
very differently from packets it merely forwards.

This is why network engineers always recommend that when testing for
network anomalies, the client (source IP) should attempt to speak to a
web server, another box, whatever -- anything as long as it's not a
router -- for its destination IP.

At my workplace, for quite some time our Solaris machines using mpathd
were configured to ping their default gateway (a Juniper M320). After we
expanded and scaled out, we found that mpathd would randomly fail over to
the 2nd NIC for no apparent reason. The above description was the root
cause.

The solution was to have mpathd probe a dedicated host (another Solaris
box) rather than the network gateway. Problem solved.

-- 
| Jeremy Chadwick                                   j...@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, US |
| Making life hard for others since 1977.               PGP 4BD6C0CB |

_______________________________________________
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
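
For anyone building their own isolation check along the lines discussed
above, here is a minimal sketch of the "probe dedicated hosts, not the
gateway" idea. It is not how mpathd or ESXi work internally. Several
things in it are assumptions made for illustration: the host names are
hypothetical, the probe is a TCP connect to port 22 rather than an ICMP
echo (raw ICMP needs privileges), and the retry count is arbitrary.

```python
import socket
from typing import Callable, Iterable


def tcp_probe(host: str, port: int = 22, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds in time.

    Assumption: the probe targets run a service on this port. A TCP
    connect is used here instead of ICMP echo purely for convenience.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def link_looks_healthy(
    targets: Iterable[str],
    probe: Callable[[str], bool] = tcp_probe,
    attempts: int = 3,
    min_ok: int = 1,
) -> bool:
    """Decide whether the network path looks usable.

    Each target is an ordinary host (not a router). A target counts as
    reachable if any of `attempts` probes succeeds, so one dropped
    probe does not trigger a failover on its own. The link is treated
    as healthy once `min_ok` targets are reachable.
    """
    ok = 0
    for host in targets:
        if any(probe(host) for _ in range(attempts)):
            ok += 1
            if ok >= min_ok:
                return True
    return False


if __name__ == "__main__":
    # Hypothetical dedicated probe hosts; replace with real end hosts.
    hosts = ["probe-a.example.net", "probe-b.example.net"]
    print("healthy" if link_looks_healthy(hosts) else "isolated")
```

Using more than one target and retrying before declaring failure means a
single busy or rebooting probe host should not cause a spurious failover.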