So, after some testing, we finally fixed our issue of lost connections to instances. The actual cause was that the ARP (neighbour) table on the network node was constantly hitting its limit, so the kernel was discarding legitimate entries. This caused our connections to flap and the HA routers to fail over to another node without warning. Increasing the net.ipv4.neigh.default.gc_thresh1, net.ipv4.neigh.default.gc_thresh2 and net.ipv4.neigh.default.gc_thresh3 kernel parameters ended up fixing the issue.
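For anyone who wants to check whether they are in the same situation, here is a rough Python sketch (not what we actually ran) that compares the number of ARP entries with the three thresholds. It assumes the standard Linux procfs paths /proc/net/arp and /proc/sys/net/ipv4/neigh/default/. The function names and the "ok"/"soft"/"hard" labels are made up for illustration. Note that /proc/net/arp only shows the network namespace the script runs in, so on a network node with router namespaces the count may understate the real table usage.

```python
from pathlib import Path

# Standard Linux procfs locations (assumed; adjust if your system differs).
ARP_TABLE = Path("/proc/net/arp")
NEIGH_DIR = Path("/proc/sys/net/ipv4/neigh/default")
THRESHOLDS = ("gc_thresh1", "gc_thresh2", "gc_thresh3")


def count_arp_entries(arp_text):
    """Count entries in /proc/net/arp content; the first line is a header."""
    lines = [line for line in arp_text.splitlines() if line.strip()]
    return max(len(lines) - 1, 0)


def read_thresholds(neigh_dir=NEIGH_DIR):
    """Read the three garbage-collection thresholds as integers."""
    return {name: int((neigh_dir / name).read_text()) for name in THRESHOLDS}


def pressure_level(entries, thresholds):
    """Classify table usage (hypothetical labels, for illustration only)."""
    if entries >= thresholds["gc_thresh3"]:
        return "hard"  # at the hard maximum: new entries may be refused
    if entries > thresholds["gc_thresh2"]:
        return "soft"  # above the soft maximum: aggressive garbage collection
    return "ok"


if __name__ == "__main__":
    try:
        entries = count_arp_entries(ARP_TABLE.read_text())
        thresholds = read_thresholds()
    except (OSError, ValueError) as exc:
        print(f"could not read neighbour table state: {exc}")
    else:
        print(f"{entries} ARP entries, thresholds {thresholds}, "
              f"status: {pressure_level(entries, thresholds)}")
```

If the status is regularly "soft" or "hard", raising the three values is worth trying. To keep the new values across reboots, they can be set in /etc/sysctl.conf or in a file under /etc/sysctl.d/. The right numbers depend on how many neighbours the node really has to track, so I am not suggesting specific ones here.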
Jean-Philippe Méthot
Openstack system administrator
PlanetHoster inc.

> On 28 Sept 2018, at 10:53, Jean-Philippe Méthot <jp.met...@planethoster.info> wrote:
>
> Thank you, I will try it next week (since today is Friday) and update this
> thread if it has fixed my issues. We are indeed using the latest RDO Pike, so
> ovsdbapp 0.4.3.1.
>
> Jean-Philippe Méthot
> Openstack system administrator
> PlanetHoster inc.
>
>> On 28 Sept 2018, at 03:03, Slawomir Kaplonski <skapl...@redhat.com> wrote:
>>
>> Hi,
>>
>> What version of Neutron and ovsdbapp You are using? IIRC there was such
>> issue somewhere around Pike version, we saw it in functional tests quite
>> often. But later with new ovsdbapp version I think that this problem was
>> somehow solved.
>> Maybe try newer version of ovsdbapp and check if it will be better.
>
> _______________________________________________
> OpenStack-operators mailing list
> OpenStack-operators@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators