Hi Kevin, I confirm that after applying the patch the problem is fixed. Sorry
for the inconvenience.
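
P.S. In case anyone finds this thread in the archives: the flow dumps that
pinpointed the problem, and the equivalent manual fail-mode change, look
roughly like this (bridge name br-ex as in our setup; the patch from the bug
linked below appears to make Neutron set this itself):

    # compare flow dumps taken before and after the outage
    ovs-ofctl dump-flows br-ex

    # an empty result means the default "standalone" fail mode, in which
    # OVS falls back to acting as a plain MAC-learning switch once the
    # controller connection times out, clobbering Neutron's flows
    ovs-vsctl get-fail-mode br-ex

    # "secure" keeps the existing OpenFlow rules in place until
    # neutron-openvswitch-agent reconnects and reprograms them
    ovs-vsctl set-fail-mode br-ex secure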
On Tue, May 30, 2017 at 9:36 PM, Kevin Benton <ke...@benton.pub> wrote:

> Do you have that patch already in your environment? If not, can you
> confirm it fixes the issue?
>
> On Tue, May 30, 2017 at 9:49 AM, Gustavo Randich
> <gustavo.rand...@gmail.com> wrote:
>
>> While dumping OVS flows as you suggested, we finally found the cause of
>> the problem: our br-ex OVS bridge lacked the secure fail mode
>> configuration.
>>
>> Maybe the issue is related to this:
>> https://bugs.launchpad.net/neutron/+bug/1607787
>>
>> Thank you
>>
>> On Fri, May 26, 2017 at 6:03 AM, Kevin Benton <ke...@benton.pub> wrote:
>>
>>> Sorry about the long delay.
>>>
>>> Can you dump the OVS flows before and after the outage? This will let
>>> us know if the flows Neutron set up are getting wiped out.
>>>
>>> On Tue, May 2, 2017 at 12:26 PM, Gustavo Randich
>>> <gustavo.rand...@gmail.com> wrote:
>>>
>>>> Hi Kevin, here is some information about this issue:
>>>>
>>>> - if the network outage lasts less than ~1 minute, connectivity to
>>>>   the host and instances is automatically restored without problems
>>>>
>>>> - otherwise:
>>>>
>>>>   - upon outage, "ovs-vsctl show" reports "is_connected: true" on all
>>>>     bridges (br-ex / br-int / br-tun)
>>>>
>>>>   - after about ~1 minute, "ovs-vsctl show" ceases to show
>>>>     "is_connected: true" on every bridge
>>>>
>>>>   - upon restoring the physical interface (fixing the outage):
>>>>
>>>>     - "ovs-vsctl show" again reports "is_connected: true" on all
>>>>       bridges (br-ex / br-int / br-tun)
>>>>
>>>>     - access to the host and VMs is NOT restored, although the host
>>>>       sporadically answers some pings (~1 out of 20)
>>>>
>>>> - to restore connectivity, we:
>>>>
>>>>   - execute "ifdown br-ex; ifup br-ex" -> access to the host is
>>>>     restored, but not to the VMs
>>>>
>>>>   - restart neutron-openvswitch-agent -> access to the VMs is restored
>>>>
>>>> Thank you!
>>>>
>>>> On Fri, Apr 28, 2017 at 5:07 PM, Kevin Benton <ke...@benton.pub> wrote:
>>>>
>>>>> With the network down, does ovs-vsctl show that it is connected to
>>>>> the controller?
>>>>>
>>>>> On Fri, Apr 28, 2017 at 2:21 PM, Gustavo Randich
>>>>> <gustavo.rand...@gmail.com> wrote:
>>>>>
>>>>>> Exactly, we access via a tagged interface, which is part of br-ex:
>>>>>>
>>>>>> # ip a show vlan171
>>>>>> 16: vlan171: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc
>>>>>> noqueue state UNKNOWN group default qlen 1
>>>>>>     link/ether 8e:14:8d:c1:1a:5f brd ff:ff:ff:ff:ff:ff
>>>>>>     inet 10.171.1.240/20 brd 10.171.15.255 scope global vlan171
>>>>>>        valid_lft forever preferred_lft forever
>>>>>>     inet6 fe80::8c14:8dff:fec1:1a5f/64 scope link
>>>>>>        valid_lft forever preferred_lft forever
>>>>>>
>>>>>> # ovs-vsctl show
>>>>>> ...
>>>>>>     Bridge br-ex
>>>>>>         Controller "tcp:127.0.0.1:6633"
>>>>>>             is_connected: true
>>>>>>         Port "vlan171"
>>>>>>             tag: 171
>>>>>>             Interface "vlan171"
>>>>>>                 type: internal
>>>>>> ...
>>>>>>
>>>>>> On Fri, Apr 28, 2017 at 3:03 PM, Kevin Benton <ke...@benton.pub>
>>>>>> wrote:
>>>>>>
>>>>>>> Ok, that's likely not the issue then. I assume the way you access
>>>>>>> each host is via an IP assigned to an OVS bridge or an interface
>>>>>>> that somehow depends on OVS?
>>>>>>>
>>>>>>> On Apr 28, 2017 12:04, "Gustavo Randich" <gustavo.rand...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Kevin, we are using the default listen address (the loopback
>>>>>>>> interface):
>>>>>>>>
>>>>>>>> # grep -r of_listen_address /etc/neutron
>>>>>>>> /etc/neutron/plugins/ml2/openvswitch_agent.ini:#of_listen_address
>>>>>>>> = 127.0.0.1
>>>>>>>>
>>>>>>>> tcp/127.0.0.1:6640 -> ovsdb-server /etc/openvswitch/conf.db
>>>>>>>> -vconsole:emer -vsyslog:err -vfile:info
>>>>>>>> --remote=punix:/var/run/openvswitch/db.sock
>>>>>>>> --private-key=db:Open_vSwitch,SSL,private_key
>>>>>>>> --certificate=db:Open_vSwitch,SSL,certificate
>>>>>>>> --bootstrap-ca-cert=db:Open_vSwitch,SSL,ca_cert --no-chdir
>>>>>>>> --log-file=/var/log/openvswitch/ovsdb-server.log
>>>>>>>> --pidfile=/var/run/openvswitch/ovsdb-server.pid --detach --monitor
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>>
>>>>>>>> On Fri, Apr 28, 2017 at 5:00 AM, Kevin Benton <ke...@benton.pub>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Are you using an of_listen_address value of an interface being
>>>>>>>>> brought down?
>>>>>>>>>
>>>>>>>>> On Apr 25, 2017 17:34, "Gustavo Randich"
>>>>>>>>> <gustavo.rand...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> (using Mitaka / Ubuntu 16 / Neutron DVR / OVS / VXLAN /
>>>>>>>>>> l2_population)
>>>>>>>>>>
>>>>>>>>>> This sounds very strange (to me): recently, after a switch
>>>>>>>>>> outage, we lost connectivity to all our Mitaka hosts. We had to
>>>>>>>>>> enter via iLO host by host and restart the networking service to
>>>>>>>>>> regain access, then restart neutron-openvswitch-agent to regain
>>>>>>>>>> access to the VMs.
>>>>>>>>>>
>>>>>>>>>> At first glance we thought it was a problem with the hosts' NIC
>>>>>>>>>> Linux driver not detecting link state correctly.
>>>>>>>>>>
>>>>>>>>>> Then we reproduced the issue by simply bringing the physical
>>>>>>>>>> interfaces down for around 5 minutes, then up again. Same issue.
>>>>>>>>>>
>>>>>>>>>> And then... we found that if, instead of using the native (Ryu)
>>>>>>>>>> OpenFlow interface in the Neutron Open vSwitch agent, we use
>>>>>>>>>> ovs-ofctl, the problem disappears.
>>>>>>>>>>
>>>>>>>>>> Any clue?
>>>>>>>>>>
>>>>>>>>>> Thanks in advance.
>>>>>>>>>>
>>>>>>>>>> _______________________________________________
>>>>>>>>>> Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
>>>>>>>>>> Post to     : openst...@lists.openstack.org
>>>>>>>>>> Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
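
P.P.S. For completeness, the interim workaround mentioned at the bottom of
the thread (using ovs-ofctl instead of the native Ryu OpenFlow interface) is
a one-line agent config change, roughly as below; option name and section
are from the Mitaka openvswitch agent config, so verify against your release:

    # /etc/neutron/plugins/ml2/openvswitch_agent.ini
    [ovs]
    # "native" (the Ryu-based driver) is the default in Mitaka;
    # "ovs-ofctl" shells out to the ovs-ofctl CLI instead
    of_interface = ovs-ofctl

followed by a restart of neutron-openvswitch-agent.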
_______________________________________________
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators