Thanks Hrushi. After further troubleshooting, we found that openvswitch-agent was somehow reading a file named "sensu" from the /etc/sudoers.d directory and failing to read it, even though we haven't configured anything like that in the neutron configs. Removing that file brought everything back to normal, but it's still not clear why the agent was reading it in the first place. We are trying to figure that out.
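In case it helps anyone hitting the same thing: neutron agents execute their privileged commands through sudo/rootwrap, and sudo parses every drop-in under /etc/sudoers.d on each invocation, so a single unreadable or malformed file there can break the agent even though neutron never references it. A quick way to syntax-check a sudoers drop-in without installing it is `visudo -cf` (a minimal sketch; the "sensu" rule below is a made-up placeholder, not the real file's contents):

```shell
# Write a trivially-formed sudoers drop-in to a temp file, then ask visudo
# to syntax-check it in place (-c check only, -f on this file) without
# touching /etc/sudoers.d.  A stray file like the "sensu" one above could
# be checked the same way: visudo -cf /etc/sudoers.d/sensu
tmp=$(mktemp)
printf 'sensu ALL=(ALL) NOPASSWD: /usr/bin/true\n' > "$tmp"
if visudo -cf "$tmp" >/dev/null 2>&1; then
    result="parse OK"
else
    # Also taken if visudo is not installed on this host.
    result="parse FAILED"
fi
rm -f "$tmp"
echo "$result"
```

Running `visudo -c` with no file argument checks /etc/sudoers plus all of its includes at once, which is the fastest way to find which drop-in sudo is choking on.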
*Rahul Sharma*
*MS in Computer Science, 2016*
College of Computer and Information Science, Northeastern University
Mobile: 801-706-7860
Email: rahulsharma...@gmail.com

On Mon, Mar 14, 2016 at 9:10 PM, Gangur, Hrushikesh <hrushikesh.gan...@hpe.com> wrote:

> Rahul – it seems your issue is similar to the one reported here, probably
> due to a hostname resolution issue:
>
> https://bugs.launchpad.net/charms/+source/quantum-gateway/+bug/1405588
>
> Regards~hrushi
>
> *From:* Rahul Sharma [mailto:rahulsharma...@gmail.com]
> *Sent:* Monday, March 14, 2016 3:32 PM
> *To:* openstack <openstack@lists.openstack.org>; OpenStack Development
> Mailing List <openstack-...@lists.openstack.org>;
> openstack-operat...@lists.openstack.org
> *Subject:* [Openstack-operators] [neutron] openvswitch-agent spins up too
> many /bin/ovsdb-client processes
>
> Hi All,
>
> We are trying to debug an issue with our production environment. We are
> seeing neutron-openvswitch-agent start failing after some time (1-2 days).
> After debugging, we found that there are a large number of entries for
> ovsdb-client. On some nodes, the count crosses 330 processes and then the
> ovsdb process starts failing.
>
> 1. root 30689 1 0 00:37 ? 00:00:00 /bin/ovsdb-client monitor Interface name,ofport --format=json
> 2. root 30804 1 0 00:38 ? 00:00:00 /bin/ovsdb-client monitor Interface name,ofport --format=json
> 3. root 30909 1 0 00:38 ? 00:00:00 /bin/ovsdb-client monitor Interface name,ofport --format=json
>
> Pastebin link for the processes: http://pastebin.com/QGQC0Jrt
> Pastebin link with openvswitch starting all of them: http://pastebin.com/repHMkHu
>
> In the logs, we start getting errors like:
>
> Mar 14 05:41:29 node2 ovs-vsctl: ovs|00001|fatal_signal|WARN|terminating with signal 14 (Alarm clock)
> Mar 14 05:41:39 node2 ovs-vsctl: ovs|00001|fatal_signal|WARN|terminating with signal 14 (Alarm clock)
> Mar 14 05:41:49 node2 ovs-vsctl: ovs|00001|fatal_signal|WARN|terminating with signal 14 (Alarm clock)
> Mar 14 05:49:30 node2 ovs-vsctl: ovs|00001|vsctl|ERR|unix:/var/run/openvswitch/db.sock: database connection failed (Protocol error)
> Mar 14 05:49:32 node2 ovs-vsctl: ovs|00001|vsctl|ERR|unix:/var/run/openvswitch/db.sock: database connection failed (Protocol error)
> Mar 14 05:49:34 node2 ovs-vsctl: ovs|00001|vsctl|ERR|unix:/var/run/openvswitch/db.sock: database connection failed (Protocol error)
>
> Open vSwitch version:
>
> [root@node2 ~(openstack_admin)]# ovs-vsctl --version
> ovs-vsctl (Open vSwitch) 2.4.0
> Compiled Sep 4 2015 09:49:34
> DB Schema 7.12.1
>
> We have to restart the openvswitch service every time, and that clears up
> all the processes. We are trying to figure out why so many processes are
> being started by the neutron agent. We also found that if we restart the
> host's networking, one new /bin/ovsdb-client process starts. We checked
> and found that we don't have any network fluctuations or NIC flapping.
> Are there any pointers on where we should be looking? It occurs on both
> controller and compute nodes.
>
> *Rahul Sharma*
> *MS in Computer Science, 2016*
> College of Computer and Information Science, Northeastern University
> Mobile: 801-706-7860
> Email: rahulsharma...@gmail.com
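For anyone watching a node for the same leak: the "terminating with signal 14 (Alarm clock)" lines are ovs-vsctl being killed by its own --timeout alarm (SIGALRM is signal 14), which fits the ovsdb database becoming unresponsive as the monitor processes pile up. The leaked monitors can be counted straight from ps output; a minimal sketch, with the process-name pattern taken from the listing in the quoted message:

```shell
# Count stale "/bin/ovsdb-client monitor" processes like the ones in the
# quoted ps listing.  The [o] bracket trick stops grep from matching its
# own command line; "|| true" keeps a zero-match grep (exit 1) from
# aborting a script run under "set -e".
count=$(ps -eo args | grep -c '[o]vsdb-client monitor' || true)
echo "leaked ovsdb-client monitors: $count"
```

Sampling this from cron and alerting when the count climbs would at least flag an affected node before the ~330-process point where ovsdb starts refusing connections.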
_______________________________________________
Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
Post to     : openstack@lists.openstack.org
Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack