Hi there,

We currently use Open vSwitch in conjunction with our virtualization
solution OpenNebula (http://www.opennebula.org).

Our setup runs on Ubuntu 14.04 on Dell R620 servers with 4x 10GbE network
adapters connected to two Force Ten 4820T switches, using VLT and a port
channel in LACP 802.3ad mode, i.e. each bond connects two interfaces to two
different switches.

We had this running without Open vSwitch in our former setup, using Linux
bonding/ifenslave, like so:

/etc/network/interfaces

# The primary network interface
auto eth0
iface eth0 inet manual
    bond-master bond1

auto eth1
iface eth1 inet manual
    bond-master bond0

auto eth3
iface eth3 inet manual
    bond-master bond1

auto eth4
iface eth4 inet manual
    bond-master bond0

auto bond0
iface bond0 inet static
    address  192.168.1.10
    netmask 255.255.255.0
    bond-lacp-rate 1
    bond-slaves none
    bond-mode 802.3ad
    bond-miimon 100

auto bond1
iface bond1 inet static
    address 192.168.2.10
    netmask 255.255.255.0
    gateway 192.168.2.1
    bond-lacp-rate 1
    bond-slaves none
    bond-mode 802.3ad
    bond-miimon 100

This worked as expected and we never had issues with it.
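For comparison, in that kernel-bonding setup the negotiated LACP state can
be read directly from the bonding driver (a generic diagnostic, not
specific to our boxes):

```shell
# Kernel bonding driver state: mode, MII status, LACP aggregator info
# and per-slave details for each bond.
cat /proc/net/bonding/bond0
cat /proc/net/bonding/bond1
```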

We now have a similar setup configured with Open vSwitch, and
/etc/network/interfaces now looks like this:

allow-vmbr0 bond0
iface bond0 inet manual
  ovs_bridge vmbr0
  ovs_type OVSBond
  ovs_bonds eth1 eth4
  ovs_options bond_mode=balance-tcp lacp=active other_config:lacp-time=fast other_config:lacp-fallback-ab=true

allow-vmbr1 bond1
iface bond1 inet manual
  ovs_bridge vmbr1
  ovs_type OVSBond
  ovs_bonds eth0 eth3
  ovs_options bond_mode=balance-tcp lacp=active other_config:lacp-time=fast other_config:lacp-fallback-ab=true

auto vmbr0
allow-ovs vmbr0
iface vmbr0 inet manual
  ovs_type OVSBridge
  ovs_ports bond0 vlan-pub

auto vmbr1
allow-ovs vmbr1
iface vmbr1 inet manual
  ovs_type OVSBridge
  ovs_ports bond1 vlan-prv

allow-vmbr0 vlan-pub
iface vlan-pub inet static
  ovs_type OVSIntPort
  ovs_bridge vmbr0
  address 192.168.1.10
  netmask 255.255.255.0
  
allow-vmbr1 vlan-prv
iface vlan-prv inet static
  ovs_type OVSIntPort
  ovs_bridge vmbr1
  address 192.168.2.10
  netmask 255.255.255.0
  gateway 192.168.2.1
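With this setup, the OVS-side bond and LACP state can be inspected via
ovs-appctl; comparing the output while the bond is healthy and again after
a failure may narrow this down (bond0/bond1 are the bond names from the
config above):

```shell
# Per-bond state: active slaves, hashing, rebalance info.
ovs-appctl bond/show bond0
ovs-appctl bond/show bond1

# LACP negotiation state for each member: actor/partner system IDs
# and whether the partner is still responding.
ovs-appctl lacp/show bond0
ovs-appctl lacp/show bond1
```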


This also works as expected until it suddenly stops working. The effect is
that, at random, these nodes lose their network connection completely,
sometimes only on one bond interface, sometimes on both. All the VMs
running on them also lose their connections.

As far as I can see, it is not really related to the amount of traffic:
there are low-traffic machines which stop earlier than high-traffic ones.

After a restart of Open vSwitch, everything starts working again until it
stops the next time. This is very annoying since, in the worst case, we
have to connect via the DRAC console to fix the issue and get everything
working again.

The ovs-vswitchd log files also don't give much information; I can provide
one if needed.
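One way to capture more detail before the next failure would be to raise
the log level for the bonding and LACP modules (vlog/set takes
module:facility:level), e.g.:

```shell
# Log bond and LACP events at debug level to the "file" facility,
# i.e. /var/log/openvswitch/ovs-vswitchd.log on Ubuntu.
ovs-appctl vlog/set bond:file:dbg
ovs-appctl vlog/set lacp:file:dbg
```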

What we have already tried to fix this:

- Upgraded openvswitch from 2.3.1-1 to 2.4.0-1 -> no effect
- Used balance-slb instead of balance-tcp -> no effect
- Used ovs_options bond_mode=active-backup -> this is our current
workaround; all our machines are currently set to active failover and in
this mode we have no problems.
- Upgraded the switch firmware to the latest version -> no effect
- Today I even tried an openvswitch snapshot from git
(cc245ce87d3de9c2a66ee42719ab413e464fb2de) -> the upgrade initially broke
the network connection and I had issues with starting/restarting, so I did
not pursue this any further.
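For reference, the active-backup workaround can also be applied at runtime
without editing /etc/network/interfaces, which may be handy when a machine
is only reachable via DRAC (port names as in the config above):

```shell
# Switch both OVS bonds to active-backup on the fly; takes effect
# immediately, no interface restart required.
ovs-vsctl set port bond0 bond_mode=active-backup
ovs-vsctl set port bond1 bond_mode=active-backup
```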

Output of ovs-vswitchd --version

ovs-vswitchd (Open vSwitch) 2.4.0
Compiled Oct  5 2015 11:12:38

Output of cat /proc/version

Linux version 3.13.0-63-generic (buildd@lgw01-18) (gcc version 4.8.2 (Ubuntu 
4.8.2-19ubuntu1) ) #103-Ubuntu SMP Fri Aug 14 21:42:59 UTC 2015

Output of ovs-dpctl show
system@ovs-system:
        lookups: hit:30279107869 missed:103538298 lost:94
        flows: 430
        masks: hit:106941297278 total:4 hit/pkt:3.52
        port 0: ovs-system (internal)
        port 1: vlan-prv (internal)
        port 2: eth4
        port 3: eth2
        port 4: bond1 (internal)
        port 5: vmbr1 (internal)
        port 6: vlan-pub (internal)
        port 7: vmbr0 (internal)
        port 8: eth5
        port 9: eth3
        port 10: bond0 (internal)
        port 11: vnet0
        port 12: vnet1
        port 13: vnet2
        port 14: vnet3
        port 15: vnet4
        port 16: vnet5
        port 17: vnet6
        port 18: vnet7
        port 19: vnet8
        port 20: vnet9
        port 21: vnet10
        port 22: vnet11
        port 23: vnet12
        port 24: vnet13
        port 25: vnet14
        port 26: vnet15
        port 27: vnet16
        port 28: vnet17
        port 29: vnet18
        port 30: vnet19
        port 31: vnet20
        port 32: vnet21
        port 33: vnet22
        port 34: vnet23
        port 35: vnet24
        port 36: vnet25
        port 37: vnet26
        port 38: vnet27
        port 39: vnet28
        port 40: vnet29

I did not attach the contents of /etc/openvswitch/conf.db yet, as it is
currently ~200k and the machine is also running in active-backup mode right
now anyway. I can provide it if needed.

Thanks a lot for having a look into this.

Gernot Poerner
_______________________________________________
discuss mailing list
discuss@openvswitch.org
http://openvswitch.org/mailman/listinfo/discuss