Hi,

looks like I'm currently in an SLB bonding bug-hunting mood :) I think I found another bug / strange behaviour with balance-slb bonding. Two VMs suffered from short-term connectivity issues (a few minutes each) every now and then, so I started digging into it. The first thing I noticed was the MAC of the VM jumping between ports on the uplink switches like mad. That reminded me of my other bug report [1], but this time I saw no broadcast or multicast frames, just normal unicast frames sent out on both member ports of the bond for a few minutes every now and then.

Long story short: XCP 1.6 configures SLB bonds to rebalance their traffic every 30 minutes, and it looks like Open vSwitch sometimes fails to migrate certain flows from one interface to the other. That causes some traffic to be sent via the "old" interface and some via the "new" interface.

To debug the issue further I set up tcpdump on both bond member interfaces, monitored the log file for notifications of traffic shifting, and ran ovs-dpctl dump-flows xapi1 as soon as I noticed the problem.

First, the log showed the following:

--- cut ---
Jan 29 18:17:36 hostname ovs-vswitchd: 30635|bond|INFO|bond bond0: shift 13622kB of load (with hash 101) from eth1 to eth0 (now carrying 25209kB and 45293kB load, respectively)
Jan 29 18:17:36 hostname ovs-vswitchd: 30636|bond|INFO|bond bond0: shift 2036kB of load (with hash 112) from eth0 to eth1 (now carrying 43257kB and 27245kB load, respectively)
Jan 29 18:17:36 hostname ovs-vswitchd: 30637|bond|INFO|bond bond0: shift 29434kB of load (with hash 189) from eth0 to eth1 (now carrying 13823kB and 56680kB load, respectively)
Jan 29 18:17:36 hostname ovs-vswitchd: 30638|bond|INFO|bond bond0: shift 1413kB of load (with hash 3) from eth1 to eth0 (now carrying 55267kB and 15236kB load, respectively)
Jan 29 18:17:36 hostname ovs-vswitchd: 30639|bond|INFO|bond bond0: shift 1030kB of load (with hash 62) from eth1 to eth0 (now carrying 54236kB and 16266kB load, respectively)
Jan 29 18:17:36 hostname ovs-vswitchd: 30640|bond|INFO|bond bond0: shift 1831kB of load (with hash 111) from eth1 to eth0 (now carrying 52405kB and 18098kB load, respectively)
Jan 29 18:17:36 hostname ovs-vswitchd: 30641|bond|INFO|bond bond0: shift 20447kB of load (with hash 203) from eth1 to eth0 (now carrying 31957kB and 38546kB load, respectively)
--- cut ---

The traffic of the VM I was monitoring in this case should fall into hash 189 (calculated with ovs-appctl bond/hash).

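For anyone who wants to pick one hash bucket out of a longer log, here is a minimal Python sketch. It assumes the log lines look exactly like the excerpt above; the regular expression is derived only from those sample lines, and the function names (parse_shifts, shifts_for_hash) are made up for illustration and are not part of Open vSwitch.

```python
import re

# Pattern derived from the sample "shift" messages quoted above (an assumption,
# not an official log format definition).
SHIFT_RE = re.compile(
    r"bond (?P<bond>\S+): shift (?P<kb>\d+)kB of load "
    r"\(with hash (?P<hash>\d+)\) from (?P<src>\S+) to (?P<dst>\S+)"
)


def parse_shifts(lines):
    """Return one dict per rebalance message found in the given log lines."""
    shifts = []
    for line in lines:
        match = SHIFT_RE.search(line)
        if match is None:
            continue  # not a rebalance message
        shifts.append({
            "bond": match.group("bond"),
            "hash": int(match.group("hash")),
            "kb": int(match.group("kb")),
            "src": match.group("src"),
            "dst": match.group("dst"),
        })
    return shifts


def shifts_for_hash(lines, hash_value):
    """Keep only the rebalance messages that moved the given hash bucket."""
    return [s for s in parse_shifts(lines) if s["hash"] == hash_value]


# Three of the log lines from the excerpt above.
SAMPLE_LOG = [
    "Jan 29 18:17:36 hostname ovs-vswitchd: 30635|bond|INFO|bond bond0: shift 13622kB of load (with hash 101) from eth1 to eth0 (now carrying 25209kB and 45293kB load, respectively)",
    "Jan 29 18:17:36 hostname ovs-vswitchd: 30636|bond|INFO|bond bond0: shift 2036kB of load (with hash 112) from eth0 to eth1 (now carrying 43257kB and 27245kB load, respectively)",
    "Jan 29 18:17:36 hostname ovs-vswitchd: 30637|bond|INFO|bond bond0: shift 29434kB of load (with hash 189) from eth0 to eth1 (now carrying 13823kB and 56680kB load, respectively)",
]

if __name__ == "__main__":
    for shift in shifts_for_hash(SAMPLE_LOG, 189):
        print("hash {hash}: {kb} kB moved from {src} to {dst}".format(**shift))
```

Filtering for hash 189 this way gives the interface the VM's traffic was supposed to move to, which can then be compared with what tcpdump sees on each bond member.
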
tcpdump showed the following:

- First packet on eth1 at 18:17:37
- Still traffic on eth0
- Last packet sent via eth0 at 18:19:16

So in this case there's a window of roughly 100 seconds (18:17:37 to 18:19:16) in which both bonded interfaces are used to send out traffic, which causes great trouble, as we all know.

Now to ovs-dpctl dump-flows xapi1: I used grep to filter for the MAC of the VM in question; please find my results attached to this e-mail. As that's taken from production servers I had to replace the IP addresses, but I left the MAC and VLAN information as is.

Background information: Two VMs form an HA cluster, 192.168.0.21 and 192.168.0.22. They use UDP/691 for cluster communication and TCP/7781, TCP/7782 and TCP/7783 for DRBD communication. Cluster communication happens a few times every second.

The issue shows up as flows with actions:push_vlan(vid=200,pcp=0),1 and actions:push_vlan(vid=200,pcp=0),2 being present at the same time. It looks like Open vSwitch was unable to move the cluster communication on UDP/691 over to the other interface.

I hope that's enough information for you.

Regards,
Markus

[1] Message-ID: <kdmmcb$nhc$1...@ger.gmane.org>; Subject: [BUG] broad-/multicast & SLB bonding -> FAIL

in_port(157),eth(src=56:cf:67:4f:46:89,dst=00:00:00:ff:00:02),eth_type(0x0806),arp(sip=192.168.0.22,tip=192.168.0.242,op=2,sha=56:cf:67:4f:46:89,tha=00:00:00:ff:00:02), packets:0, bytes:0, used:never, actions:push_vlan(vid=200,pcp=0),1
in_port(157),eth(src=56:cf:67:4f:46:89,dst=1a:16:08:6c:f7:6c),eth_type(0x0800),ipv4(src=192.168.0.22,dst=192.168.0.21,proto=17,tos=0x10,ttl=64,frag=no),udp(src=42347,dst=691), packets:2259, bytes:518945, used:0.040s, actions:push_vlan(vid=200,pcp=0),2
in_port(157),eth(src=56:cf:67:4f:46:89,dst=1a:16:08:6c:f7:6c),eth_type(0x0800),ipv4(src=192.168.0.22,dst=192.168.0.21,proto=6,tos=0,ttl=64,frag=no),tcp(src=56923,dst=7781), packets:0, bytes:0, used:never, actions:push_vlan(vid=200,pcp=0),1
in_port(157),eth(src=56:cf:67:4f:46:89,dst=1a:16:08:6c:f7:6c),eth_type(0x0800),ipv4(src=192.168.0.22,dst=192.168.0.21,proto=6,tos=0,ttl=64,frag=no),tcp(src=57549,dst=7782), packets:2, bytes:152, used:2.540s, actions:push_vlan(vid=200,pcp=0),1
in_port(157),eth(src=56:cf:67:4f:46:89,dst=1a:16:08:6c:f7:6c),eth_type(0x0800),ipv4(src=192.168.0.22,dst=192.168.0.21,proto=6,tos=0,ttl=64,frag=no),tcp(src=7783,dst=34851), packets:4, bytes:280, used:1.471s, actions:push_vlan(vid=200,pcp=0),1
in_port(1),eth(src=1a:16:08:6c:f7:6c,dst=56:cf:67:4f:46:89),eth_type(0x8100),vlan(vid=200,pcp=0),encap(eth_type(0x0800),ipv4(src=192.168.0.21,dst=192.168.0.22,proto=17,tos=0x10,ttl=64,frag=no),udp(src=34746,dst=691)), packets:1, bytes:226, used:1.620s, actions:pop_vlan,157
in_port(1),eth(src=1a:16:08:6c:f7:6c,dst=56:cf:67:4f:46:89),eth_type(0x8100),vlan(vid=200,pcp=0),encap(eth_type(0x0800),ipv4(src=192.168.0.21,dst=192.168.0.22,proto=6,tos=0,ttl=64,frag=no),tcp(src=34851,dst=7783)), packets:3, bytes:218, used:1.501s, actions:pop_vlan,157
in_port(1),eth(src=1a:16:08:6c:f7:6c,dst=56:cf:67:4f:46:89),eth_type(0x8100),vlan(vid=200,pcp=0),encap(eth_type(0x0800),ipv4(src=192.168.0.21,dst=192.168.0.22,proto=6,tos=0,ttl=64,frag=no),tcp(src=7781,dst=56923)), packets:1, bytes:66, used:1.240s, actions:pop_vlan,157
in_port(1),eth(src=1a:16:08:6c:f7:6c,dst=56:cf:67:4f:46:89),eth_type(0x8100),vlan(vid=200,pcp=0),encap(eth_type(0x0800),ipv4(src=192.168.0.21,dst=192.168.0.22,proto=6,tos=0,ttl=64,frag=no),tcp(src=7782,dst=57549)), packets:0, bytes:0, used:never, actions:pop_vlan,157
in_port(2),eth(src=1a:16:08:6c:f7:6c,dst=56:cf:67:4f:46:89),eth_type(0x8100),vlan(vid=200,pcp=0),encap(eth_type(0x0800),ipv4(src=192.168.0.21,dst=192.168.0.22,proto=17,tos=0x10,ttl=64,frag=no),udp(src=34746,dst=691)), packets:479, bytes:109082, used:0.620s, actions:pop_vlan,157
in_port(2),eth(src=1a:16:08:6c:f7:6c,dst=56:cf:67:4f:46:89),eth_type(0x8100),vlan(vid=200,pcp=0),encap(eth_type(0x0800),ipv4(src=192.168.0.21,dst=192.168.0.22,proto=6,tos=0,ttl=64,frag=no),tcp(src=7782,dst=57549)), packets:1, bytes:66, used:2.503s, actions:pop_vlan,157

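The split visible in the dump above can also be checked mechanically. The following is a rough Python sketch that reads lines in the format shown above and reports source MACs whose active flows leave through more than one uplink port. It makes several assumptions: datapath ports 1 and 2 are taken to be the two bond members (inferred from this dump only), the last comma-separated element of the actions field is taken as the output port, and parse_flow / split_sources are hypothetical helper names, not Open vSwitch tooling.

```python
import re
from collections import defaultdict

# Pattern derived from the dump-flows lines in this mail (an assumption about
# the format, not a complete parser for ovs-dpctl output).
FLOW_RE = re.compile(
    r"in_port\((?P<in_port>\d+)\),"
    r"eth\(src=(?P<src>[0-9a-f:]+),dst=(?P<dst>[0-9a-f:]+)\)"
    r".*?packets:(?P<packets>\d+),.*?actions:(?P<actions>\S+)"
)


def parse_flow(line):
    """Extract ports, MACs and packet count from one flow line, or None."""
    match = FLOW_RE.search(line)
    if match is None:
        return None
    # Assumption: the final element of the action list is the output port.
    last_action = match.group("actions").rsplit(",", 1)[-1]
    return {
        "in_port": int(match.group("in_port")),
        "src": match.group("src"),
        "dst": match.group("dst"),
        "packets": int(match.group("packets")),
        "out_port": int(last_action) if last_action.isdigit() else None,
    }


def split_sources(lines, uplink_ports):
    """Map each source MAC that uses more than one uplink to {port: packets}."""
    per_source = defaultdict(lambda: defaultdict(int))
    for line in lines:
        flow = parse_flow(line)
        if flow is None or flow["packets"] == 0:
            continue  # unparsable line or flow that never carried traffic
        if flow["out_port"] in uplink_ports:
            per_source[flow["src"]][flow["out_port"]] += flow["packets"]
    return {
        src: dict(ports)
        for src, ports in per_source.items()
        if len(ports) > 1
    }


# Four of the VM's outgoing flows, copied from the dump above.
SAMPLE_FLOWS = [
    "in_port(157),eth(src=56:cf:67:4f:46:89,dst=1a:16:08:6c:f7:6c),eth_type(0x0800),ipv4(src=192.168.0.22,dst=192.168.0.21,proto=17,tos=0x10,ttl=64,frag=no),udp(src=42347,dst=691), packets:2259, bytes:518945, used:0.040s, actions:push_vlan(vid=200,pcp=0),2",
    "in_port(157),eth(src=56:cf:67:4f:46:89,dst=1a:16:08:6c:f7:6c),eth_type(0x0800),ipv4(src=192.168.0.22,dst=192.168.0.21,proto=6,tos=0,ttl=64,frag=no),tcp(src=56923,dst=7781), packets:0, bytes:0, used:never, actions:push_vlan(vid=200,pcp=0),1",
    "in_port(157),eth(src=56:cf:67:4f:46:89,dst=1a:16:08:6c:f7:6c),eth_type(0x0800),ipv4(src=192.168.0.22,dst=192.168.0.21,proto=6,tos=0,ttl=64,frag=no),tcp(src=57549,dst=7782), packets:2, bytes:152, used:2.540s, actions:push_vlan(vid=200,pcp=0),1",
    "in_port(157),eth(src=56:cf:67:4f:46:89,dst=1a:16:08:6c:f7:6c),eth_type(0x0800),ipv4(src=192.168.0.22,dst=192.168.0.21,proto=6,tos=0,ttl=64,frag=no),tcp(src=7783,dst=34851), packets:4, bytes:280, used:1.471s, actions:push_vlan(vid=200,pcp=0),1",
]

if __name__ == "__main__":
    for mac, ports in split_sources(SAMPLE_FLOWS, uplink_ports={1, 2}).items():
        print(mac, "sends via uplink ports", sorted(ports))
```

Flows with a packet count of zero are skipped on purpose, so that a stale entry that never carried traffic does not count as evidence of a split.
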
_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev