Hi

We have found a very strange bug in Open vSwitch: when it is connected to a
Cisco switch port, the port randomly gets err-disabled.

We have 76 Debian servers running Open vSwitch (2.4.0), each connected to a
port on a Cisco Switch 3110. Roughly once every week or two, one of these
ports goes err-disabled. From the Cisco switch's perspective, the port was
disabled because it detected a loopback: it received a keepalive message
that had originated from that same switch port.

The keepalive message looks like this:

11:37:01.749102 e8:04:62:c8:6e:81 > e8:04:62:c8:6e:81, ethertype Loopback
(0x9000), length 60: Loopback, skipCount 0, Reply, receipt number 0, data
(40 octets)
0x0000:  0000 0100 0000 0000 0000 0000 0000 0000  ................
0x0010:  0000 0000 0000 0000 0000 0000 0000 0000  ................
0x0020:  0000 0000 0000 0000 0000 0000 0000       ..............
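
To make the capture above easier to match in a script, here is a minimal
Python sketch that recognises this kind of frame: ethertype 0x9000 with
identical source and destination MAC addresses. The function and variable
names are made up for illustration. Note that the check only identifies the
keepalive itself; whether a given frame is a reflected copy depends on the
direction in which it was seen on the wire, which this sketch does not
determine.

```python
ETHERTYPE_LOOPBACK = 0x9000  # ethertype shown in the capture above


def parse_mac(text: str) -> bytes:
    """Convert 'e8:04:62:c8:6e:81' into 6 raw bytes."""
    return bytes(int(part, 16) for part in text.split(":"))


def is_cisco_keepalive(frame: bytes) -> bool:
    """True if frame has ethertype 0x9000 and source MAC == destination MAC."""
    if len(frame) < 14:  # too short to hold an Ethernet header
        return False
    dst, src = frame[0:6], frame[6:12]
    ethertype = int.from_bytes(frame[12:14], "big")
    return ethertype == ETHERTYPE_LOOPBACK and src == dst


# Rebuild the 60-byte frame from the capture:
# 14-byte Ethernet header + 46-byte payload (0000 0100 followed by zeros).
mac = parse_mac("e8:04:62:c8:6e:81")
payload = bytes.fromhex("00000100") + bytes(42)
sample_frame = mac + mac + ETHERTYPE_LOOPBACK.to_bytes(2, "big") + payload

if __name__ == "__main__":
    print(len(sample_frame), is_cisco_keepalive(sample_frame))
```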

Our first guess is that Open vSwitch occasionally sends a keepalive message
it has received back out of the same port, which puts the port into the
err-disabled state. Normally Open vSwitch discards this message, but about
once every week or two across the 76 servers, one is sent back to the port
on the Cisco switch and that port is err-disabled.

The workarounds we are using now are either to disable keepalive messages on
the Cisco switch or to explicitly add a flow rule on Open vSwitch that
discards the keepalive message.
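
For reference, the Open vSwitch side of the workaround can be sketched in
Python as below. This is an illustration rather than the exact rule we
deployed: the bridge name "br0" is taken from the configuration in this
mail, while the priority of 100 and the helper names are assumptions. The
rule matches ethertype 0x9000 and drops it. By default the script only
prints the command; flows added with ovs-ofctl are not persistent, so the
rule would have to be reinstalled after ovs-vswitchd restarts.

```python
import shlex
import subprocess


def drop_keepalive_cmd(bridge: str = "br0", priority: int = 100) -> list:
    """Build the ovs-ofctl command that drops ethertype 0x9000 on a bridge."""
    flow = f"priority={priority},dl_type=0x9000,actions=drop"
    return ["ovs-ofctl", "add-flow", bridge, flow]


def install(bridge: str = "br0", dry_run: bool = True) -> str:
    """Return the command line; run it only when dry_run is False."""
    cmd = drop_keepalive_cmd(bridge)
    if not dry_run:
        # Needs root and a running ovs-vswitchd on the host.
        subprocess.run(cmd, check=True)
    return shlex.join(cmd)


if __name__ == "__main__":
    print(install("br0", dry_run=True))
```

Running `ovs-ofctl dump-flows br0` afterwards should list the new drop rule
alongside the default flow.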

The Open vSwitch version is:
ovs-vswitchd (Open vSwitch) 2.4.0
Compiled Aug 31 2015 16:53:51

The configuration of the switch is:
    Bridge "acc_10064"
        Port "acc_10064"
            Interface "acc_10064"
                type: internal
        Port "vxnet2"
            Interface "vxnet2"
        Port "10064_88ad7aaa"
            Interface "10064_88ad7aaa-02"
                type: vxlan
                options: {key="10064", local_ip="IP1", remote_ip="IP2"}
            Interface "10064_88ad7aaa-01"
                type: vxlan
                options: {key="10064", local_ip="IP1", remote_ip="IP3"}
    Bridge "acc_10050"
        Port "10050_0977455a"
            Interface "10050_0977455a-01"
                type: vxlan
                options: {key="10050", local_ip="IP1", remote_ip="IP4"}
            Interface "10050_0977455a-02"
                type: vxlan
                options: {key="10050", local_ip="IP1", remote_ip="IP5"}
        Port "vxnet0"
            Interface "vxnet0"
        Port "acc_10050"
            Interface "acc_10050"
                type: internal
        Port "vxnet1"
            Interface "vxnet1"
    Bridge "br0"
        Port "eth0"
            Interface "eth0"
        Port "br0"
            Interface "br0"
                type: internal
    ovs_version: "2.4.0"

The kernel version is:
Linux version 3.16.0-4-amd64 (debian-ker...@lists.debian.org) (gcc version
4.8.4 (Debian 4.8.4-1) ) #1 SMP Debian 3.16.7-ckt11-1+deb8u3 (2015-08-04)

The ovs-dpctl show output is:
system@ovs-system:
lookups: hit:536177037 missed:17196786 lost:0
flows: 182
masks: hit:1130706939 total:9 hit/pkt:2.04
port 0: ovs-system (internal)
port 1: acc_10050 (internal)
port 2: vxlan_sys_4789 (vxlan)
port 3: eth0
port 4: br0 (internal)
port 5: vxnet0
port 6: vxnet1
port 7: acc_10064 (internal)
port 8: vxnet2

Open vSwitch does not have a controller connected and is configured as a
normal L2 switch.

We found a similar case via Google, but it is unanswered:
https://forums.gentoo.org/viewtopic-p-7884924.html?sid=12abe544bda8782c840fa5c70df6e65e

Thanks!

Best Regards,

Liang Dong