Forgot to CC list...

> Try doing ifconfig eth1 down instead.  This will take carrier down on
> the NIC, causing the upstream switch to flush its learning table.
> This is more realistic too; bonds don't typically fail over when there
> isn't a problem, so connectivity loss is expected when using
> set-active-slave as you're doing.

In a production environment, there are cases where a manual failover
has its uses (see the command sketch after this list):
- maintenance, e.g. when you want to bring down a switch for
maintenance and want a controlled failover beforehand
- because you want to spread the load to another switch (when the
active and backup interfaces are connected to different switches)
- because you want to have 2 servers on the same switch, ...
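
For reference, the controlled failover itself is just the following
(bond0 and eth0 stand in for my actual bond port and slave names):

  # make eth0 the active slave without touching carrier on either NIC
  ovs-appctl bond/set-active-slave bond0 eth0
  # check which slave is active now
  ovs-appctl bond/show bond0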

I followed your suggestion and brought down eth1 (which was the active
slave).  Can this flushing behavior be switch-dependent?
The device was disabled and a failover took place; ethtool reports
the link as down (Link detected: no).
There was no interruption in network communication to the host itself...
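
Concretely, the sequence on the host (the bond status that follows is
the ovs-appctl bond/show output):

  # take carrier down on the active slave
  ip link set eth1 down       # equivalent to: ifconfig eth1 down
  # confirm the link state ("Link detected: no")
  ethtool eth1
  # dump the bond status shown below
  ovs-appctl bond/show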

bond_mode: active-backup
bond-hash-basis: 0
updelay: 200 ms
downdelay: 200 ms
lacp_negotiated: false

slave eth1: disabled
       may_enable: false

slave eth0: enabled
       active slave
       may_enable: true


The 4 running KVM guests, however, were unavailable for some time (I
had fping running against the guests during the test, from a host
several switches away):
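
For reference, an fping invocation that produces per-target loss
summaries like the ones below (-l loops the pings, -Q 1 prints a
summary for each target every second):

  fping -l -Q 1 sles111-flcapp-chico sles111-flwapp-aka \
        sles111-repapp-ribet centos6-jmsdb-jajan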

At the moment of the failover, 100% loss:
Fri Jun 22 09:17:10 CEST 2012
sles111-flcapp-chico : xmt/rcv/%loss = 1/0/100%
sles111-flwapp-aka   : xmt/rcv/%loss = 1/0/100%
sles111-repapp-ribet : xmt/rcv/%loss = 1/0/100%
centos6-jmsdb-jajan  : xmt/rcv/%loss = 1/0/100%

1 second later, the first guest responds:
Fri Jun 22 09:17:11 CEST 2012
sles111-flcapp-chico : xmt/rcv/%loss = 1/0/100%
sles111-flwapp-aka   : xmt/rcv/%loss = 1/1/0%, min/avg/max = 0.60/0.60/0.60
sles111-repapp-ribet : xmt/rcv/%loss = 1/0/100%
centos6-jmsdb-jajan  : xmt/rcv/%loss = 1/0/100%

16 seconds after the failover, 2 other guests start responding:
Fri Jun 22 09:17:26 CEST 2012
sles111-flcapp-chico : xmt/rcv/%loss = 1/1/0%, min/avg/max = 0.53/0.53/0.53
sles111-flwapp-aka   : xmt/rcv/%loss = 1/1/0%, min/avg/max = 0.49/0.49/0.49
sles111-repapp-ribet : xmt/rcv/%loss = 1/1/0%, min/avg/max = 0.54/0.54/0.54
centos6-jmsdb-jajan  : xmt/rcv/%loss = 1/0/100%

29 seconds after the failover, the 4th guest starts responding:
Fri Jun 22 09:17:39 CEST 2012
sles111-flcapp-chico : xmt/rcv/%loss = 1/1/0%, min/avg/max = 0.29/0.29/0.29
sles111-flwapp-aka   : xmt/rcv/%loss = 1/1/0%, min/avg/max = 0.45/0.45/0.45
sles111-repapp-ribet : xmt/rcv/%loss = 1/1/0%, min/avg/max = 0.46/0.46/0.46
centos6-jmsdb-jajan  : xmt/rcv/%loss = 1/1/0%, min/avg/max = 0.80/0.80/0.80


> Something like this could be added to OVS as we currently do it for
> SLB bonds.  But the situation you're testing is unrealistic (due to
> the carrier not dropping), so I'd prefer to avoid it until there's a
> real-world use case.

Does it have any drawbacks besides the extra ARP traffic?
Perhaps the test above is more realistic.  Yesterday I also did the
test by shutting down the port on the switch, and if I recall
correctly, I saw the same behavior.
The problem is that I don't have admin rights on the switch, so I
can't retest quickly.
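
In the meantime, a workaround I'm considering is to have each guest
announce itself right after a failover with a gratuitous ARP (a
sketch, assuming iputils arping is installed in the guests; eth0 and
GUEST_IP are placeholders for each guest's interface and address):

  # inside each guest: one unsolicited ARP, so the upstream switches
  # relearn the guest's MAC address on the new path
  arping -c 1 -U -I eth0 GUEST_IP

Watching the newly active slave with "tcpdump -n -e -i <slave> arp"
during a failover should show whether such announcements go out at all.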

Thanks for your response,
Frido Roose