Hi,

We have found a very strange bug in Open vSwitch: when it is connected to a Cisco switch port, the port randomly gets err-disabled.
We have 76 Debian servers running Open vSwitch (2.4.0), each connected to a port on a Cisco 3110 switch. Across these servers, one of the Cisco switch ports ends up err-disabled roughly once every week or two.

From the Cisco switch's perspective, the port was disabled because it detected a loopback: it received a keepalive message that had originated from that same Cisco switch port. The keepalive message looks like this:

    11:37:01.749102 e8:04:62:c8:6e:81 > e8:04:62:c8:6e:81, ethertype Loopback (0x9000), length 60: Loopback, skipCount 0, Reply, receipt number 0, data (40 octets)
        0x0000:  0000 0100 0000 0000 0000 0000 0000 0000  ................
        0x0010:  0000 0000 0000 0000 0000 0000 0000 0000  ................
        0x0020:  0000 0000 0000 0000 0000 0000 0000       ..............

Our first guess is that Open vSwitch occasionally sends the keepalive message it received back out the same port, which puts the Cisco port into the err-disabled state. Normally Open vSwitch discards this message, but roughly once every week or two across the 76 servers, the message gets back to the Cisco switch port and the port is err-disabled.

The workarounds we are using now are either disabling keepalive messages on the Cisco switch or explicitly adding a flow rule on Open vSwitch that discards the keepalive message.
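To make the trigger condition concrete, here is a minimal Python sketch (an illustration only, not code from Open vSwitch or Cisco IOS) that rebuilds the 60-byte frame from the capture above and tests the two properties that identify it: EtherType 0x9000 and a source MAC equal to the destination MAC. All function names are hypothetical. It assumes an untagged Ethernet II frame and reads the loopback header fields as little-endian, which is consistent with tcpdump decoding the bytes `0000 0100 0000` as skipCount 0, Reply, receipt number 0. A drop rule on Open vSwitch could test similar properties, for example with an OpenFlow match on `dl_type=0x9000`.

```python
import struct

# EtherType shown by tcpdump as "Loopback (0x9000)" (Ethernet Configuration Testing Protocol).
ETHERTYPE_LOOPBACK = 0x9000
ECTP_REPLY = 1    # function code tcpdump prints as "Reply"
ECTP_FORWARD = 2  # function code for "forward data"


def mac_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' into 6 raw bytes."""
    return bytes.fromhex(mac.replace(":", ""))


def parse_frame(frame: bytes) -> dict:
    """Split an untagged Ethernet II frame into its header fields and payload."""
    if len(frame) < 14:
        raise ValueError("frame is shorter than an Ethernet header")
    (ethertype,) = struct.unpack("!H", frame[12:14])  # EtherType is big-endian on the wire
    return {"dst": frame[0:6], "src": frame[6:12],
            "ethertype": ethertype, "payload": frame[14:]}


def is_loopback_keepalive(frame: bytes) -> bool:
    """True for a frame like the captured one: EtherType 0x9000 and src MAC == dst MAC."""
    f = parse_frame(frame)
    return f["ethertype"] == ETHERTYPE_LOOPBACK and f["src"] == f["dst"]


def ectp_first_function(payload: bytes):
    """Return (skipCount, function, receipt_number or None), reading fields little-endian."""
    (skip_count,) = struct.unpack_from("<H", payload, 0)
    # skipCount counts bytes from the end of the skipCount field to the current function.
    offset = 2 + skip_count
    (function,) = struct.unpack_from("<H", payload, offset)
    receipt = None
    if function == ECTP_REPLY:
        (receipt,) = struct.unpack_from("<H", payload, offset + 2)
    return skip_count, function, receipt


def build_captured_frame() -> bytes:
    """Rebuild the 60-byte frame from the capture: 6-byte header fields plus 40 zero data octets."""
    mac = mac_bytes("e8:04:62:c8:6e:81")
    payload = bytes.fromhex("0000" "0100" "0000") + bytes(40)
    return mac + mac + struct.pack("!H", ETHERTYPE_LOOPBACK) + payload


if __name__ == "__main__":
    frame = build_captured_frame()
    print(len(frame), is_loopback_keepalive(frame))
    print(ectp_first_function(parse_frame(frame)["payload"]))
```

Running it prints `60 True` followed by `(0, 1, 0)`, matching the length, "Reply" function and receipt number in the tcpdump line. Checking the src == dst condition as well as the EtherType keeps the test narrow to this kind of self-addressed keepalive.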
The Open vSwitch version is:

    ovs-vswitchd (Open vSwitch) 2.4.0
    Compiled Aug 31 2015 16:53:51

The configuration of the switch is:

    Bridge "acc_10064"
        Port "acc_10064"
            Interface "acc_10064"
                type: internal
        Port "vxnet2"
            Interface "vxnet2"
        Port "10064_88ad7aaa"
            Interface "10064_88ad7aaa-02"
                type: vxlan
                options: {key="10064", local_ip="IP1", remote_ip="IP2"}
            Interface "10064_88ad7aaa-01"
                type: vxlan
                options: {key="10064", local_ip="IP1", remote_ip="IP3"}
    Bridge "acc_10050"
        Port "10050_0977455a"
            Interface "10050_0977455a-01"
                type: vxlan
                options: {key="10050", local_ip="IP1", remote_ip="IP4"}
            Interface "10050_0977455a-02"
                type: vxlan
                options: {key="10050", local_ip="IP1", remote_ip="IP5"}
        Port "vxnet0"
            Interface "vxnet0"
        Port "acc_10050"
            Interface "acc_10050"
                type: internal
        Port "vxnet1"
            Interface "vxnet1"
    Bridge "br0"
        Port "eth0"
            Interface "eth0"
        Port "br0"
            Interface "br0"
                type: internal
    ovs_version: "2.4.0"

The kernel version is:

    Linux version 3.16.0-4-amd64 (debian-ker...@lists.debian.org) (gcc version 4.8.4 (Debian 4.8.4-1) ) #1 SMP Debian 3.16.7-ckt11-1+deb8u3 (2015-08-04)

The ovs-dpctl show output is:

    system@ovs-system:
        lookups: hit:536177037 missed:17196786 lost:0
        flows: 182
        masks: hit:1130706939 total:9 hit/pkt:2.04
        port 0: ovs-system (internal)
        port 1: acc_10050 (internal)
        port 2: vxlan_sys_4789 (vxlan)
        port 3: eth0
        port 4: br0 (internal)
        port 5: vxnet0
        port 6: vxnet1
        port 7: acc_10064 (internal)
        port 8: vxnet2

Open vSwitch has no controller connected and is configured as a normal L2 switch.

We found a similar case via Google, but it was never answered: https://forums.gentoo.org/viewtopic-p-7884924.html?sid=12abe544bda8782c840fa5c70df6e65e

Thanks!

Best Regards,
Liang Dong
_______________________________________________
discuss mailing list
discuss@openvswitch.org
http://openvswitch.org/mailman/listinfo/discuss