Hi All,

We are trying to debug an issue in our production environment. The neutron-openvswitch-agent starts failing after some time (1-2 days). While debugging, we found a large number of ovsdb-client processes. On some nodes the count exceeds 330, and then the ovsdb process starts failing.
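To track how the process count grows between restarts, a minimal sketch along these lines could pull the ovsdb-client monitor PIDs out of `ps -ef` output. The function name is hypothetical, and it assumes the usual eight-column `ps -ef` layout (UID PID PPID C STIME TTY TIME CMD); nothing here is provided by neutron or Open vSwitch.

```python
def ovsdb_monitor_pids(ps_output):
    """Return the PIDs of `ovsdb-client monitor` processes in `ps -ef` output.

    Assumes the eight-column `ps -ef` layout; lines that do not fit
    (for example, truncated lines) are skipped.
    """
    pids = []
    for line in ps_output.splitlines():
        # Split into at most 8 fields so the full command line stays intact.
        fields = line.split(None, 7)
        if len(fields) < 8:
            continue
        if "ovsdb-client monitor" in fields[7]:
            pids.append(int(fields[1]))
    return pids


# Sample input in the `ps -ef` layout described above.
sample = """\
UID        PID  PPID  C STIME TTY          TIME CMD
root     30689     1  0 00:37 ?        00:00:00 /bin/ovsdb-client monitor Interface name,ofport --format=json
root     30804     1  0 00:38 ?        00:00:00 /bin/ovsdb-client monitor Interface name,ofport --format=json
root     30909     1  0 00:38 ?        00:00:00 /bin/ovsdb-client monitor Interface name,ofport --format=json
"""

pids = ovsdb_monitor_pids(sample)
print(len(pids), pids)
```

Feeding it the real `ps -ef` output at regular intervals (from cron, for instance) and logging the count with a timestamp would show whether the processes accumulate steadily or in bursts.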
Some of the processes:

root 30689 1 0 00:37 ? 00:00:00 /bin/ovsdb-client monitor Interface name,ofport --format=json
root 30804 1 0 00:38 ? 00:00:00 /bin/ovsdb-client monitor Interface name,ofport --format=json
root 30909 1 0 00:38 ? 00:00:00 /bin/ovsdb-client monitor Interface name,ofport --format=json

Pastebin link for the processes: http://pastebin.com/QGQC0Jrt
Pastebin link with openvswitch starting all of them: http://pastebin.com/repHMkHu

In the logs, we start getting errors such as:

Mar 14 05:41:29 node2 ovs-vsctl: ovs|00001|fatal_signal|WARN|terminating with signal 14 (Alarm clock)
Mar 14 05:41:39 node2 ovs-vsctl: ovs|00001|fatal_signal|WARN|terminating with signal 14 (Alarm clock)
Mar 14 05:41:49 node2 ovs-vsctl: ovs|00001|fatal_signal|WARN|terminating with signal 14 (Alarm clock)
Mar 14 05:49:30 node2 ovs-vsctl: ovs|00001|vsctl|ERR|unix:/var/run/openvswitch/db.sock: database connection failed (Protocol error)
Mar 14 05:49:32 node2 ovs-vsctl: ovs|00001|vsctl|ERR|unix:/var/run/openvswitch/db.sock: database connection failed (Protocol error)
Mar 14 05:49:34 node2 ovs-vsctl: ovs|00001|vsctl|ERR|unix:/var/run/openvswitch/db.sock: database connection failed (Protocol error)

Openvswitch version:

[root@node2 ~(openstack_admin)]# ovs-vsctl --version
ovs-vsctl (Open vSwitch) 2.4.0
Compiled Sep 4 2015 09:49:34
DB Schema 7.12.1

We have to restart the openvswitch service every time, which clears up all the processes. We are trying to figure out why neutron-agent starts so many processes. We also found that restarting the host's networking starts one new /bin/ovsdb-client process. We checked and found no network fluctuations or NIC flapping.

Are there any pointers on where we should be looking? The issue occurs on both controller and compute nodes.

*Rahul Sharma*
*MS in Computer Science, 2016*
College of Computer and Information Science, Northeastern University
Mobile: 801-706-7860
Email: rahulsharma...@gmail.com