On 19 December 2017 at 22:23, Brian Haley <haleyb....@gmail.com> wrote:
> On 12/19/2017 04:00 PM, Ben Nemec wrote:
>>
>> On 12/19/2017 02:43 PM, Brian Haley wrote:
>>
>>> On 12/19/2017 11:53 AM, Ben Nemec wrote:
>>>
>>>> The reboot is done (mostly...see below).
>>>>
>>>> On 12/18/2017 05:11 PM, Joe Talerico wrote:
>>>>
>>>>> Ben - Can you provide some links to the ovs port exhaustion issue
>>>>> for some background?
>>>>
>>>> I don't know if we ever had a bug opened, but there's some discussion
>>>> of it in
>>>> http://lists.openstack.org/pipermail/openstack-dev/2016-December/109182.html
>>>> I've also copied Derek since I believe he was the one who found it
>>>> originally.
>>>>
>>>> The gist is that after about 3 months of tripleo-ci running in this
>>>> cloud we start to hit errors creating instances because of problems
>>>> creating OVS ports on the compute nodes.  Sometimes we see a huge
>>>> number of ports in general, other times we see a lot of ports that
>>>> look like this:
>>>>
>>>> Port "qvod2cade14-7c"
>>>>     tag: 4095
>>>>     Interface "qvod2cade14-7c"
>>>>
>>>> Notably they all have a tag of 4095, which seems suspicious to me.
>>>> I don't know whether it's actually an issue though.
>>>
>>> Tag 4095 is for "dead" OVS ports, it's an unused VLAN tag in the agent.
>>>
>>> The 'qvo' here shows it's part of the VETH pair that os-vif created
>>> when it plugged in the VM (the other half is 'qvb'), and they're
>>> created so that iptables rules can be applied by neutron.  It's part
>>> of the "old" way to do security groups with the
>>> OVSHybridIptablesFirewallDriver, and can eventually go away once the
>>> OVSFirewallDriver can be used everywhere (requires newer OVS and
>>> agent).
>>>
>>> I wonder if you can run the ovs_cleanup utility to clean some of
>>> these up?
>>
>> As in neutron-ovs-cleanup?  Doesn't that wipe out everything,
>> including any ports that are still in use?  Or is there a different
>> tool I'm not aware of that can do more targeted cleanup?
>
> Crap, I thought there was an option to just cleanup these dead devices,
> I should have read the code, it's either neutron ports (default) or all
> ports.  Maybe that should be an option.

IIRC, neutron-ovs-cleanup was being run following the reboot as part of
an ExecStartPre= on one of the neutron services; this is what essentially
removed the ports for us.  (Rough sketches of how to spot the dead ports
and of that drop-in are at the bottom of this mail.)

> -Brian
>
>> Oh, also worth noting that I don't think we have os-vif in this cloud
>> because it's so old.  There's no os-vif package installed anyway.
>>
>>> -Brian
>>>
>>>> I've had some offline discussions about getting someone on this
>>>> cloud to debug the problem.  Originally we decided not to pursue it
>>>> since it's not hard to work around and we didn't want to disrupt
>>>> the environment by trying to move to later OpenStack code (we're
>>>> still back on Mitaka), but it was pointed out to me this time around
>>>> that from a downstream perspective we have users on older code as
>>>> well and it may be worth debugging to make sure they don't hit
>>>> similar problems.
>>>>
>>>> To that end, I've left one compute node un-rebooted for debugging
>>>> purposes.  The downstream discussion is ongoing, but I'll update
>>>> here if we find anything.
>>>>
>>>>> Thanks,
>>>>> Joe
>>>>>
>>>>> On Mon, Dec 18, 2017 at 10:43 AM, Ben Nemec <openst...@nemebean.com>
>>>>> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> It's that magical time again.  You know the one, when we reboot
>>>>>> rh1 to avoid OVS port exhaustion.
:-)

>>>>>>
>>>>>> If all goes well you won't even notice that this is happening, but
>>>>>> there is the possibility that a few jobs will fail while the
>>>>>> te-broker host is rebooted so I wanted to let everyone know.  If
>>>>>> you notice anything else hosted in rh1 is down (tripleo.org,
>>>>>> zuul-status, etc.) let me know.  I have been known to forget to
>>>>>> restart services after the reboot.
>>>>>>
>>>>>> I'll send a followup when I'm done.
>>>>>>
>>>>>> -Ben
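
For anyone curious how bad a given compute node has gotten, the dead
ports Ben pasted above can be listed straight from OVS (tag 4095 being
the "dead" tag Brian mentioned), with something like:

  ovs-vsctl --columns=name,tag find Port tag=4095

As Brian says, neutron-ovs-cleanup only touches Neutron-created ports by
default, with an option for all ports; I believe the flag is
--ovs_all_ports (untested here, so double-check before pointing it at a
live node), and either way it's really only safe when nothing is meant
to be attached, e.g. right after a reboot:

  neutron-ovs-cleanup --config-file /etc/neutron/neutron.conf
  neutron-ovs-cleanup --config-file /etc/neutron/neutron.conf --ovs_all_ports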
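
And the ExecStartPre= I mentioned was along these lines - the service
name and paths here are from memory rather than copied off the rh1
nodes, so treat it as a sketch:

  # /etc/systemd/system/neutron-openvswitch-agent.service.d/ovs-cleanup.conf
  # (service name and path are a guess; adjust to whichever neutron
  #  agent service actually carries the ExecStartPre= on these nodes)
  [Service]
  # Clear stale OVS ports before the agent starts plugging things back in.
  ExecStartPre=/usr/bin/neutron-ovs-cleanup --config-file /etc/neutron/neutron.conf

followed by a "systemctl daemon-reload" so the drop-in is picked up on
the next (re)start of the agent.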
__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev