While dumping OVS flows as you suggested, we finally found the cause of the problem: our br-ex OVS bridge lacked the secure fail mode configuration.
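In case it helps anyone else, this is roughly what we used to compare the flows and to check and fix the fail mode (a sketch; br-ex is our external bridge, adjust the bridge name to your deployment):

# ovs-ofctl dump-flows br-ex
# ovs-vsctl get-fail-mode br-ex
# ovs-vsctl set-fail-mode br-ex secure

The first command lets you compare the installed flows before and after the outage; get-fail-mode shows the current setting (in our case br-ex was not set to secure), and set-fail-mode switches it. As we understand it, with the default standalone fail mode OVS throws away the controller-installed flows and falls back to acting as a plain learning switch once the connection to neutron-openvswitch-agent has been down long enough, whereas with secure it leaves the existing flows in place until the agent reconnects and reprograms them.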
Maybe the issue is related to this:
https://bugs.launchpad.net/neutron/+bug/1607787

Thank you

On Fri, May 26, 2017 at 6:03 AM, Kevin Benton <ke...@benton.pub> wrote:
> Sorry about the long delay.
>
> Can you dump the OVS flows before and after the outage? This will let us
> know if the flows Neutron set up are getting wiped out.
>
> On Tue, May 2, 2017 at 12:26 PM, Gustavo Randich <gustavo.rand...@gmail.com> wrote:
>
>> Hi Kevin, here is some information about this issue:
>>
>> - if the network outage lasts less than ~1 minute, then connectivity to
>>   the host and instances is restored automatically without problems
>>
>> - otherwise:
>>
>>   - upon the outage, "ovs-vsctl show" reports "is_connected: true" on all
>>     bridges (br-ex / br-int / br-tun)
>>
>>   - after about ~1 minute, "ovs-vsctl show" ceases to show "is_connected:
>>     true" on every bridge
>>
>>   - upon restoring the physical interface (fixing the outage):
>>
>>     - "ovs-vsctl show" again reports "is_connected: true" on all
>>       bridges (br-ex / br-int / br-tun)
>>
>>     - access to the host and VMs is NOT restored, although some pings are
>>       sporadically answered by the host (~1 out of 20)
>>
>> - to restore connectivity, we:
>>
>>   - execute "ifdown br-ex; ifup br-ex" -> access to the host is restored,
>>     but not to the VMs
>>
>>   - restart neutron-openvswitch-agent -> access to the VMs is restored
>>
>> Thank you!
>>
>> On Fri, Apr 28, 2017 at 5:07 PM, Kevin Benton <ke...@benton.pub> wrote:
>>
>>> With the network down, does ovs-vsctl show that it is connected to the
>>> controller?
>>>
>>> On Fri, Apr 28, 2017 at 2:21 PM, Gustavo Randich <gustavo.rand...@gmail.com> wrote:
>>>
>>>> Exactly, we access via a tagged interface, which is part of br-ex
>>>>
>>>> # ip a show vlan171
>>>> 16: vlan171: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UNKNOWN group default qlen 1
>>>>     link/ether 8e:14:8d:c1:1a:5f brd ff:ff:ff:ff:ff:ff
>>>>     inet 10.171.1.240/20 brd 10.171.15.255 scope global vlan171
>>>>        valid_lft forever preferred_lft forever
>>>>     inet6 fe80::8c14:8dff:fec1:1a5f/64 scope link
>>>>        valid_lft forever preferred_lft forever
>>>>
>>>> # ovs-vsctl show
>>>> ...
>>>>     Bridge br-ex
>>>>         Controller "tcp:127.0.0.1:6633"
>>>>             is_connected: true
>>>>         Port "vlan171"
>>>>             tag: 171
>>>>             Interface "vlan171"
>>>>                 type: internal
>>>> ...
>>>>
>>>> On Fri, Apr 28, 2017 at 3:03 PM, Kevin Benton <ke...@benton.pub> wrote:
>>>>
>>>>> Ok, that's likely not the issue then. I assume the way you access each
>>>>> host is via an IP assigned to an OVS bridge or an interface that somehow
>>>>> depends on OVS?
>>>>>
>>>>> On Apr 28, 2017 12:04, "Gustavo Randich" <gustavo.rand...@gmail.com> wrote:
>>>>>
>>>>>> Hi Kevin, we are using the default listen address on the loopback
>>>>>> interface:
>>>>>>
>>>>>> # grep -r of_listen_address /etc/neutron
>>>>>> /etc/neutron/plugins/ml2/openvswitch_agent.ini:#of_listen_address = 127.0.0.1
>>>>>>
>>>>>> tcp/127.0.0.1:6640 -> ovsdb-server /etc/openvswitch/conf.db
>>>>>> -vconsole:emer -vsyslog:err -vfile:info
>>>>>> --remote=punix:/var/run/openvswitch/db.sock
>>>>>> --private-key=db:Open_vSwitch,SSL,private_key
>>>>>> --certificate=db:Open_vSwitch,SSL,certificate
>>>>>> --bootstrap-ca-cert=db:Open_vSwitch,SSL,ca_cert --no-chdir
>>>>>> --log-file=/var/log/openvswitch/ovsdb-server.log
>>>>>> --pidfile=/var/run/openvswitch/ovsdb-server.pid --detach --monitor
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> On Fri, Apr 28, 2017 at 5:00 AM, Kevin Benton <ke...@benton.pub> wrote:
>>>>>>
>>>>>>> Are you using an of_listen_address value of an interface being
>>>>>>> brought down?
>>>>>>>
>>>>>>> On Apr 25, 2017 17:34, "Gustavo Randich" <gustavo.rand...@gmail.com> wrote:
>>>>>>>
>>>>>>>> (using Mitaka / Ubuntu 16 / Neutron DVR / OVS / VXLAN / l2_population)
>>>>>>>>
>>>>>>>> This sounds very strange (to me): recently, after a switch outage,
>>>>>>>> we lost connectivity to all our Mitaka hosts. We had to enter via iLO,
>>>>>>>> host by host, and restart the networking service to regain access to
>>>>>>>> each host, and then restart neutron-openvswitch-agent to regain access
>>>>>>>> to the VMs.
>>>>>>>>
>>>>>>>> At first glance we thought it was a problem with the hosts' Linux NIC
>>>>>>>> driver not detecting the link state correctly.
>>>>>>>>
>>>>>>>> Then we reproduced the issue simply by bringing the physical
>>>>>>>> interfaces down for around 5 minutes, then up again. Same issue.
>>>>>>>>
>>>>>>>> And then... we found that if, instead of the native (ryu) OpenFlow
>>>>>>>> interface in the Neutron Open vSwitch agent, we used ovs-ofctl, the
>>>>>>>> problem disappeared.
>>>>>>>>
>>>>>>>> Any clue?
>>>>>>>>
>>>>>>>> Thanks in advance.
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
>>>>>>>> Post to     : openst...@lists.openstack.org
>>>>>>>> Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
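P.S. For completeness, the of_interface workaround mentioned earlier in the thread (using ovs-ofctl instead of the native ryu driver) was just a toggle in the agent config. A rough sketch, assuming the stock Mitaka openvswitch_agent.ini layout (double-check the section and option names against your release):

/etc/neutron/plugins/ml2/openvswitch_agent.ini:

[ovs]
# Mitaka defaults to the native (ryu) OpenFlow driver listening on 127.0.0.1
#of_interface = native
#of_listen_address = 127.0.0.1
# switching back to the ovs-ofctl driver made the problem disappear for us
of_interface = ovs-ofctl

followed by a restart of neutron-openvswitch-agent on each affected host.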