So, we've found a document <https://github.com/openvswitch/ovs/blob/master/DESIGN.md#user-content-in-band-control> on in-band control in OVS. The hidden flows we see are exactly the ones that document says in-band control installs, including ARP flows to and from the LOCAL port's MAC address with an output=NORMAL action.
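A minimal Python sketch of why those hidden flows win. It assumes a simplified single flow table in which the highest-priority matching entry decides the action; real OVS classification is more involved. The LOCAL MAC address and the hidden-flow priorities are hypothetical (chosen to fit the "18000+" reported in the thread), and the hidden match fields are modeled on the in-band description of ARP replies to, and ARP requests from, the LOCAL port's MAC. The controller's flows are the priority=0 table-miss flow and the priority=1 in_port=LOCAL ARP flow mentioned in the thread.

```python
from dataclasses import dataclass, field

# Simplified model of one flow table: the highest-priority matching entry wins.
# Real OVS classification (and where hidden flows live) is more involved.

LOCAL_MAC = "aa:bb:cc:00:00:01"  # hypothetical MAC of br0's LOCAL port
ETH_ARP, ETH_IPV4 = 0x0806, 0x0800
ARP_REQUEST, ARP_REPLY = 1, 2


@dataclass
class Flow:
    priority: int
    match: dict = field(default_factory=dict)  # empty dict = fully wildcarded
    actions: str = ""

    def matches(self, pkt: dict) -> bool:
        # Every field the flow specifies must equal the packet's value.
        return all(pkt.get(k) == v for k, v in self.match.items())


def lookup(table: list[Flow], pkt: dict) -> Flow:
    """Return the highest-priority flow matching pkt (the table-miss flow
    guarantees at least one match)."""
    return max((f for f in table if f.matches(pkt)), key=lambda f: f.priority)


table = [
    # Flows the controller installed, as described in the thread.
    Flow(0, {}, "CONTROLLER"),
    Flow(1, {"in_port": "LOCAL", "dl_type": ETH_ARP}, "CONTROLLER"),
    # Hidden in-band flows (illustrative priorities).
    Flow(18001, {"dl_type": ETH_ARP, "dl_dst": LOCAL_MAC, "arp_op": ARP_REPLY}, "NORMAL"),
    Flow(18002, {"dl_type": ETH_ARP, "dl_src": LOCAL_MAC, "arp_op": ARP_REQUEST}, "NORMAL"),
]

# ARP request from the host's own stack, entering the bridge on LOCAL.
arp_request = {"in_port": "LOCAL", "dl_type": ETH_ARP, "dl_src": LOCAL_MAC,
               "dl_dst": "ff:ff:ff:ff:ff:ff", "arp_op": ARP_REQUEST}
# IPv4 (e.g. ICMP echo) from the same host: no hidden flow applies.
ipv4_packet = {"in_port": "LOCAL", "dl_type": ETH_IPV4, "dl_src": LOCAL_MAC}

print(lookup(table, arp_request).actions)  # NORMAL
print(lookup(table, ipv4_packet).actions)  # CONTROLLER
```

In this model the priority=1 flow never decides the ARP request's fate, because a higher-priority entry always matches first, while IP traffic still falls through to the table-miss flow. That is consistent with the behavior reported in the thread.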
One of these ARP flows is matching the ARP requests that enter br0 on LOCAL and forwarding them as a learning switch would (NORMAL). This looks like the issue. Now to figure out how this happened everywhere and how to disable it.

Ryan Izard
PhD Candidate, Research/Teaching Assistant
ECE Department, Clemson University
riz...@g.clemson.edu
---------------------------------------------------
Big Switch Networks
ryan.iz...@bigswitch.com

On Wed, Apr 6, 2016 at 5:59 PM, Ryan Izard <riz...@g.clemson.edu> wrote:
> On Wed, Apr 6, 2016 at 5:33 PM, Nicholas Bastin <nick.bas...@gmail.com>
> wrote:
>
>> On Wed, Apr 6, 2016 at 5:16 PM, Ryan Izard <riz...@g.clemson.edu> wrote:
>>
>>> I have a very simple topology as follows:
>>>
>>> network----[Dell S4810]-24---link---1-[host w/OVS br0]-LOCAL
>>>
>>> The host with OVS has IP 192.168.1.3/24 with a route into the br0 (i.e.
>>> LOCAL) interface.
>>
>> I don't really understand what this means. What ports are on br0 and
>> what interfaces have IP addresses?
>
> br0 has port 1 (eth1) and LOCAL (br0).
>
>>> We try to ping another host on the network from host 192.168.1.3, but
>>> the ping confuses our controller's MAC learning algorithm because OVS
>>> mishandles ARP packets. Here are some observations:
>>
>> Where are you issuing the ping from, the command line of the host with
>> OVS? What do your local routing and ARP tables look like?
>
> On the host running the OVS bridge, we have a route for 192.168.1.0/24
> into br0. We are running ping 192.168.1.4 from that host.
>
>>> -- using OVS 2.3.1, which had been running stably since release until
>>> recently (no known changes)
>>
>> Do ovs-vsctl commands hang? I doubt it in your case, but we've had some
>> lockups on vswitchd that forced us to upgrade all the VTS hardware to 2.5.0.
>
> Nope. Nothing hangs.
>
>>> -- there is only 1 flow installed. It is a single, zero-priority,
>>> fully-wildcarded table-miss flow with output=controller
>>
>> Well, not really. :-) Try:
>>
>> sudo ovs-appctl bridge/dump-flows br0
>
> Good idea :-) I did not realize you could dive that deep into the
> forwarding tables. There are some ARP flows with NORMAL output actions.
> These definitely look suspicious, especially the one matching our host as
> src MAC, ethertype=ARP, and opcode=request...
>
>> There's some special handling for ARP for in-band control that is set in
>> very-high-priority hidden flows in a late pipeline table. Make sure you're
>> not hitting those flows.
>
> All of these hidden ARP flows have very high (18000+) priorities. Why
> would they be here if we are operating in secure mode? More puzzling: we
> have probably 50 OVS bridges across our disjoint network topologies and
> control planes, and this problem appeared on all of them seemingly
> overnight.
>
>>> -- the Dell switch gets all the ARPs and sends them as packet-ins to our
>>> controller, so OVS is forwarding them somehow
>>
>> I still don't quite understand your topology graph, but sourcing packets
>> from a host connected to an OVS bridge that it is itself hosting can get
>> problematic without some namespacing.
>
> Will look into this. Is the ideal setup a veth pair, with one end
> attached to the bridge and the other in a different netns?
>
> I hope this is a little clearer. The topology is:
>
> [LAN with other hosts, one is 192.168.1.4]
> |
> |
> [Dell-S4810--port24]----[eth1(1)--br0(LOCAL)]
>
> IP 192.168.1.3/24 is assigned to br0. ARP packets that the host running
> br0 sends into the bridge arrive on LOCAL. From there, we'd expect a
> packet-in, but the matching hidden ARP flow now intercepts them. Instead,
> OVS forwards the ARPs to port 1, which goes out eth1 to our next-hop
> switch.
>
>>
>>> -- we tried installing an explicit
>>> priority=1,in_port=LOCAL,dl_type=0x806,actions=output:CONTROLLER flow;
>>> the ARP packets do not match it and are still forwarded through OVS
>>> -- there are no other routes on the host that could match the packets
>>> and circumvent OVS
>>>
>>> My inclination is that OVS is forwarding all ARP packets "under the
>>> table" and only sending L3+ and unknown ethertypes (LLDP perhaps?) to the
>>> controller.
>>
>> All I can guess right now is that you're hitting the in-band ARP matches,
>> although I'm not sure why you've never had this problem before. More
>> information about your topology and bridge configuration might reveal
>> something more useful.
>
> Yes, we are hitting the in-band ARP matches, but as I mentioned above,
> we've been running these OVS bridges (on various versions) for a very long
> time, using LOCAL as the way for our hosts running OVS to attach to the
> data plane. Almost every OVS bridge we run (on our own machines, in
> CloudLab, in GENI) entered this state at seemingly the same time, even
> though they belong to different networks with different controllers, at
> different locations around the country.
>
>> --
>> Nick
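The `sudo ovs-appctl bridge/dump-flows br0` check suggested in the thread can be narrowed to the suspicious entries with a small text filter. This is a minimal Python sketch, assuming the dump has been saved as text with one flow per line, each line carrying `priority=N` and `actions=...` fields. The function name `arp_normal_flows` and the sample lines are hypothetical, written to resemble the flows described in the thread rather than captured from these hosts; real dumps may lay fields out differently.

```python
import re

# Finds the "priority=N" field in a flow line.
PRIORITY_RE = re.compile(r"\bpriority=(\d+)")


def arp_normal_flows(dump: str) -> list[tuple[int, str]]:
    """Return (priority, line) for ARP flows whose only action is NORMAL,
    highest priority first."""
    found = []
    for line in dump.splitlines():
        m = PRIORITY_RE.search(line)
        if m is None or "actions=" not in line:
            continue
        match_part, actions = line.split("actions=", 1)
        fields = {tok.strip() for tok in match_part.split(",")}
        # "arp" is shorthand for dl_type=0x0806 in OVS flow syntax.
        is_arp = "arp" in fields or "dl_type=0x0806" in fields
        if is_arp and actions.strip() == "NORMAL":
            found.append((int(m.group(1)), line.strip()))
    return sorted(found, reverse=True)


# Hypothetical dump lines, written to resemble the flows described in the
# thread; they were not captured from the hosts in question.
sample = """\
duration=5s, n_packets=0, n_bytes=0, priority=18001,arp,dl_dst=aa:bb:cc:00:00:01,arp_op=2 actions=NORMAL
duration=5s, n_packets=12, n_bytes=504, priority=18002,arp,dl_src=aa:bb:cc:00:00:01,arp_op=1 actions=NORMAL
duration=5s, n_packets=0, n_bytes=0, priority=1,arp,in_port=LOCAL actions=CONTROLLER:65535
duration=5s, n_packets=3, n_bytes=294, priority=0 actions=CONTROLLER:65535
"""

for priority, line in arp_normal_flows(sample):
    print(priority, line)
```

Run against a saved dump, the filter lists only the ARP flows that bypass the controller, highest priority first; the controller's own priority=1 ARP flow is excluded because its action is not NORMAL.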
_______________________________________________
discuss mailing list
discuss@openvswitch.org
http://openvswitch.org/mailman/listinfo/discuss