On 12/19/2017 04:00 PM, Ben Nemec wrote:
On 12/19/2017 02:43 PM, Brian Haley wrote:
On 12/19/2017 11:53 AM, Ben Nemec wrote:
The reboot is done (mostly...see below).
On 12/18/2017 05:11 PM, Joe Talerico wrote:
Ben - Can you provide some links to the ovs port exhaustion issue for
some background?
I don't know if we ever had a bug opened, but there's some discussion
of it in
http://lists.openstack.org/pipermail/openstack-dev/2016-December/109182.html
I've also copied Derek since I believe he was the one who found it
originally.
The gist is that after about 3 months of tripleo-ci running in this
cloud we start to hit errors creating instances because of problems
creating OVS ports on the compute nodes. Sometimes we see a huge
number of ports in general, other times we see a lot of ports that
look like this:
Port "qvod2cade14-7c"
tag: 4095
Interface "qvod2cade14-7c"
Notably they all have a tag of 4095, which seems suspicious to me. I
don't know whether it's actually an issue though.
Tag 4095 is used for "dead" OVS ports; it's an unused VLAN tag in the agent.
The 'qvo' here shows it's part of the VETH pair that os-vif created
when it plugged in the VM (the other half is 'qvb'), and they're
created so that iptables rules can be applied by neutron. It's part
of the "old" way to do security groups with the
OVSHybridIptablesFirewallDriver, and can eventually go away once the
OVSFirewallDriver can be used everywhere (requires newer OVS and agent).
I wonder if you can run the ovs_cleanup utility to clean some of these
up?
As in neutron-ovs-cleanup? Doesn't that wipe out everything, including
any ports that are still in use? Or is there a different tool I'm not
aware of that can do more targeted cleanup?
Crap, I thought there was an option to just clean up these dead devices.
I should have read the code: it's either neutron ports (default) or all
ports. Maybe that should be an option.
-Brian
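A targeted cleanup of the kind discussed above could look roughly like the
following minimal sketch. It is not an existing neutron or OVS tool:
find_dead_ports and cleanup_commands are hypothetical helpers. The parser
assumes the layout quoted earlier in the thread (a Port "name" line followed
by a "tag: N" line, as printed by ovs-vsctl show), and br-int is assumed to be
the integration bridge. It only builds ovs-vsctl --if-exists del-port commands
for review rather than running them.

```python
import re
from typing import List

# Per this thread, the OVS agent marks "dead" ports with VLAN tag 4095.
DEAD_VLAN_TAG = 4095

# Hypothetical sample in the ovs-vsctl show layout quoted in the thread;
# the second port is made up for illustration.
SAMPLE = """
    Bridge br-int
        Port "qvod2cade14-7c"
            tag: 4095
            Interface "qvod2cade14-7c"
        Port "qvo1234abcd-ef"
            tag: 1
            Interface "qvo1234abcd-ef"
"""

PORT_RE = re.compile(r'Port "?([^"\s]+)"?$')
TAG_RE = re.compile(r"tag: (\d+)$")


def find_dead_ports(show_output: str, prefix: str = "qvo") -> List[str]:
    """Return names of ports tagged DEAD_VLAN_TAG whose names start with prefix.

    Hypothetical helper: assumes each Port line is followed by its tag line.
    """
    dead = []
    current = None
    for line in show_output.splitlines():
        stripped = line.strip()
        port = PORT_RE.match(stripped)
        if port:
            # Remember which port the following tag/interface lines belong to.
            current = port.group(1)
            continue
        tag = TAG_RE.match(stripped)
        if tag and current is not None and int(tag.group(1)) == DEAD_VLAN_TAG:
            if current.startswith(prefix):
                dead.append(current)
    return dead


def cleanup_commands(ports: List[str], bridge: str = "br-int") -> List[str]:
    """Build (but do not run) one del-port command per dead port.

    The bridge name br-int is an assumption about this deployment.
    """
    return [f"ovs-vsctl --if-exists del-port {bridge} {p}" for p in ports]


if __name__ == "__main__":
    # Dry run: print the commands so they can be reviewed by hand.
    for cmd in cleanup_commands(find_dead_ports(SAMPLE)):
        print(cmd)
```

Run against the sample, this prints a single del-port command for
qvod2cade14-7c. Note that del-port only removes the port from the OVS
database; the underlying qvo/qvb veth devices would still need separate
cleanup, and the sketch does not check whether a port is still in use.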
Oh, also worth noting that I don't think we have os-vif in this cloud
because it's so old. There's no os-vif package installed anyway.
-Brian
I've had some offline discussions about getting someone on this cloud
to debug the problem. Originally we decided not to pursue it since
it's not hard to work around and we didn't want to disrupt the
environment by trying to move to later OpenStack code (we're still
back on Mitaka), but it was pointed out to me this time around that
from a downstream perspective we have users on older code as well and
it may be worth debugging to make sure they don't hit similar problems.
To that end, I've left one compute node un-rebooted for debugging
purposes. The downstream discussion is ongoing, but I'll update here
if we find anything.
Thanks,
Joe
On Mon, Dec 18, 2017 at 10:43 AM, Ben Nemec <openst...@nemebean.com> wrote:
Hi,
It's that magical time again. You know the one, when we reboot rh1
to avoid OVS port exhaustion. :-)
If all goes well you won't even notice that this is happening, but
there is the possibility that a few jobs will fail while the te-broker
host is rebooted, so I wanted to let everyone know. If you notice that
anything else hosted in rh1 is down (tripleo.org, zuul-status, etc.),
let me know. I have been known to forget to restart services after
the reboot.
I'll send a followup when I'm done.
-Ben
__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe:
openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev