On Tue, Dec 19, 2017 at 9:53 AM, Ben Nemec <openst...@nemebean.com> wrote:
> The reboot is done (mostly...see below).
>
> On 12/18/2017 05:11 PM, Joe Talerico wrote:
>>
>> Ben - Can you provide some links to the ovs port exhaustion issue for
>> some background?
>
>
> I don't know if we ever had a bug opened, but there's some discussion of it in
> http://lists.openstack.org/pipermail/openstack-dev/2016-December/109182.html
> I've also copied Derek since I believe he was the one who found it
> originally.
>
> The gist is that after about 3 months of tripleo-ci running in this cloud we
> start to hit errors creating instances because of problems creating OVS
> ports on the compute nodes. Sometimes we see a huge number of ports in
> general, other times we see a lot of ports that look like this:
>
> Port "qvod2cade14-7c"
> tag: 4095
> Interface "qvod2cade14-7c"
>
> Notably they all have a tag of 4095, which seems suspicious to me. I don't
> know whether it's actually an issue though.
>
> I've had some offline discussions about getting someone on this cloud to
> debug the problem. Originally we decided not to pursue it since it's not
> hard to work around and we didn't want to disrupt the environment by trying
> to move to later OpenStack code (we're still back on Mitaka), but it was
> pointed out to me this time around that from a downstream perspective we
> have users on older code as well and it may be worth debugging to make sure
> they don't hit similar problems.
>
> To that end, I've left one compute node un-rebooted for debugging purposes.
> The downstream discussion is ongoing, but I'll update here if we find
> anything.
>
I just happened to come across the bug from last time:
https://bugs.launchpad.net/tripleo/+bug/1719334

>
>>
>> Thanks,
>> Joe
>>
>> On Mon, Dec 18, 2017 at 10:43 AM, Ben Nemec <openst...@nemebean.com>
>> wrote:
>>>
>>> Hi,
>>>
>>> It's that magical time again. You know the one, when we reboot rh1 to
>>> avoid OVS port exhaustion. :-)
>>>
>>> If all goes well you won't even notice that this is happening, but there
>>> is the possibility that a few jobs will fail while the te-broker host is
>>> rebooted so I wanted to let everyone know. If you notice anything else
>>> hosted in rh1 is down (tripleo.org, zuul-status, etc.) let me know. I
>>> have been known to forget to restart services after the reboot.
>>>
>>> I'll send a followup when I'm done.
>>>
>>> -Ben
>>>
>>>
>>> __________________________________________________________________________
>>> OpenStack Development Mailing List (not for usage questions)
>>> Unsubscribe:
>>> openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev