This looks more like a configuration or scaling problem in your environment than a bug in ovn-controller. While ovn-controller is disconnected, it cannot process new flows.
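As a rough illustration of the scaling symptom (a sketch only, not something ovn-controller ships): the ovn-controller log quoted in the bug description reports poll intervals of about 9 seconds, while the databases report dropping connections after a 5 second inactivity probe. The Python below assumes the log format quoted in this bug; the function name `long_polls` and its `probe_interval_ms` parameter are hypothetical, chosen for the example.

```python
import re

# Matches the timeval warning quoted in the ovn-controller log of this bug.
POLL_RE = re.compile(r"Unreasonably long (\d+)ms poll interval")


def long_polls(log_lines, probe_interval_ms=5000):
    """Return the poll intervals (in ms) that exceed the probe interval.

    probe_interval_ms defaults to 5000 because the logs in this bug
    report "no response to inactivity probe after 5 seconds".
    """
    hits = []
    for line in log_lines:
        match = POLL_RE.search(line)
        if match and int(match.group(1)) > probe_interval_ms:
            hits.append(int(match.group(1)))
    return hits


# Lines copied from the ovn-controller log in the bug description.
sample = [
    "2022-04-15T09:53:22.156Z|12616545|timeval|WARN|Unreasonably long 9077ms poll interval (8944ms user, 131ms system)",
    "2022-04-15T09:53:23.165Z|12616559|main|INFO|OVNSB commit failed, force recompute next time.",
    "2022-04-15T09:53:31.991Z|12616569|timeval|WARN|Unreasonably long 8805ms poll interval (8708ms user, 94ms system)",
]

print(long_polls(sample))                           # intervals above 5 s
print(long_polls(sample, probe_interval_ms=60000))  # intervals above 60 s
```

With the 5 second value both quoted poll intervals are flagged; with a 60 second value neither is. That only shows the probe would stop timing out, it does not make the recompute itself any faster.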
Question:

1. ovsdb disconnecting after 5s, how can I fix it?

   Yes, that is the default. You can increase it by running
   'ovs-vsctl set open . external_ids:ovn-remote-probe-interval="60000"'
   (the value is in milliseconds, so this sets 60 seconds).

2. ovn-controller reports "OVNSB commit failed, force recompute next time." What does it mean?

   It means ovn-controller attempted a write to the OVN southbound DB but failed. It will recalculate all OpenFlow flows on the next iteration.

3. ovn-controller logs 100% CPU usage. It looks like it only uses one CPU for processing. Maybe this CPU on the compute node is also being used by a VM. How can I fix it? Is it possible for it to use more CPUs?

   No, ovn-controller is a single process and single-threaded.

4. Why does live migration bring ovn-controller down? Is it a flow conflict? Is there any way to deal with this situation?

   The whole environment looks unstable; you should check whether the southbound database is overloaded.

** Changed in: neutron
       Status: New => Invalid

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1969354

Title:
  ovn-controller don't update new flows

Status in neutron:
  Invalid

Bug description:
  Problem: ovn-controller is down and can't update new flows

  Description:
  - Running OpenStack Victoria, built by kolla-ansible
  - E.g. we have 2 compute nodes (A & B), ~70 VMs, > 80k flows per compute node
  - I live migrate some VMs from compute A to compute B
  - `openstack network agent list` reports ovn-controller on compute B as down
  - On ovn-nb-db, I check `ovn-nbctl list NB_Global`. It shows hv_cfg < nb_cfg
  - On ovn-sb-db, I check `ovn-sbctl list Chassis`. It shows nb_cfg of compute A = nb_cfg of ovn_nb, but nb_cfg of compute B < nb_cfg of ovn_nb
  ==> Neutron reports ovn-controller on compute B as down

  Workaround:
  - Flush all flows
  - Restart ovn-controller

  Reproduce:
  - Can't reproduce with live migrate

  Logs:
  1. ovn-controller on compute B:
  2022-04-15T09:53:22.156Z|12616545|timeval|WARN|Unreasonably long 9077ms poll interval (8944ms user, 131ms system)
  2022-04-15T09:53:22.156Z|12616546|timeval|WARN|faults: 69006 minor, 0 major
  2022-04-15T09:53:22.156Z|12616547|timeval|WARN|disk: 0 reads, 8 writes
  2022-04-15T09:53:22.156Z|12616548|timeval|WARN|context switches: 0 voluntary, 19 involuntary
  2022-04-15T09:53:22.156Z|12616549|poll_loop|INFO|Dropped 280 log messages in last 10 seconds (most recently, 9 seconds ago) due to excessive rate
  2022-04-15T09:53:22.156Z|12616550|poll_loop|INFO|wakeup due to [POLLIN] on fd 16 (127.0.0.1:30242<->127.0.0.1:6640) at lib/stream-fd.c:157 (100% CPU usage)
  2022-04-15T09:53:22.156Z|12616551|jsonrpc|WARN|tcp:127.0.0.1:6640: send error: Broken pipe
  2022-04-15T09:53:22.156Z|12616552|jsonrpc|WARN|tcp:172.19.20.11:6642: send error: Broken pipe
  2022-04-15T09:53:22.160Z|12616553|reconnect|WARN|tcp:127.0.0.1:6640: connection dropped (Broken pipe)
  2022-04-15T09:53:22.160Z|12616554|reconnect|WARN|tcp:172.19.20.11:6642: connection dropped (Broken pipe)
  2022-04-15T09:53:23.160Z|12616555|reconnect|INFO|tcp:127.0.0.1:6640: connecting...
  2022-04-15T09:53:23.160Z|12616556|reconnect|INFO|tcp:172.19.20.10:6642: connecting...
  2022-04-15T09:53:23.163Z|12616557|reconnect|INFO|tcp:127.0.0.1:6640: connected
  2022-04-15T09:53:23.163Z|12616558|reconnect|INFO|tcp:172.19.20.10:6642: connected
  2022-04-15T09:53:23.165Z|12616559|main|INFO|OVNSB commit failed, force recompute next time.
  2022-04-15T09:53:31.991Z|12616569|timeval|WARN|Unreasonably long 8805ms poll interval (8708ms user, 94ms system)
  2022-04-15T09:53:31.991Z|12616570|timeval|WARN|faults: 69251 minor, 0 major
  2022-04-15T09:53:31.991Z|12616571|timeval|WARN|context switches: 0 voluntary, 19 involuntary
  2022-04-15T09:53:31.991Z|12616572|poll_loop|INFO|Dropped 406 log messages in last 10 seconds (most recently, 9 seconds ago) due to excessive rate
  2022-04-15T09:53:31.991Z|12616573|poll_loop|INFO|wakeup due to [POLLIN] on fd 16 (127.0.0.1:30540<->127.0.0.1:6640) at lib/stream-fd.c:157 (100% CPU usage)
  2022-04-15T09:53:31.991Z|12616574|poll_loop|INFO|wakeup due to [POLLIN] on fd 19 (172.19.20.100:8744<->172.19.20.10:6642) at lib/stream-fd.c:157 (100% CPU usage)

  2. ovsdb-server on compute B:
  2022-04-15T09:15:50.489Z|79819|jsonrpc|WARN|tcp:127.0.0.1:31970: receive error: Connection reset by peer
  2022-04-15T09:15:50.489Z|79820|reconnect|WARN|tcp:127.0.0.1:31970: connection dropped (Connection reset by peer)
  2022-04-15T09:17:50.763Z|79821|reconnect|ERR|tcp:127.0.0.1:32952: no response to inactivity probe after 5 seconds, disconnecting
  2022-04-15T09:19:19.271Z|79822|reconnect|ERR|tcp:127.0.0.1:34554: no response to inactivity probe after 5 seconds, disconnecting
  2022-04-15T09:19:50.806Z|79823|reconnect|ERR|tcp:127.0.0.1:35858: no response to inactivity probe after 5 seconds, disconnecting
  2022-04-15T09:20:49.662Z|79824|reconnect|ERR|tcp:127.0.0.1:36300: no response to inactivity probe after 5 seconds, disconnecting
  2022-04-15T09:21:21.746Z|79825|reconnect|ERR|tcp:127.0.0.1:36722: no response to inactivity probe after 5 seconds, disconnecting
  2022-04-15T09:22:01.375Z|79826|reconnect|ERR|tcp:127.0.0.1:36914: no response to inactivity probe after 5.01 seconds, disconnecting
  2022-04-15T09:22:57.564Z|79827|reconnect|ERR|tcp:127.0.0.1:37582: no response to inactivity probe after 5 seconds, disconnecting
  2022-04-15T09:23:44.051Z|79828|reconnect|ERR|tcp:127.0.0.1:38234: no response to inactivity probe after 5 seconds, disconnecting
  2022-04-15T09:24:27.935Z|79829|reconnect|ERR|tcp:127.0.0.1:39244: no response to inactivity probe after 5 seconds, disconnecting
  2022-04-15T09:25:03.414Z|79830|reconnect|ERR|tcp:127.0.0.1:39484: no response to inactivity probe after 5 seconds, disconnecting
  2022-04-15T09:25:58.860Z|79831|reconnect|ERR|tcp:127.0.0.1:39700: no response to inactivity probe after 5 seconds, disconnecting
  2022-04-15T09:27:34.469Z|79832|reconnect|ERR|tcp:127.0.0.1:40224: no response to inactivity probe after 5 seconds, disconnecting
  2022-04-15T09:29:07.546Z|79833|reconnect|ERR|tcp:127.0.0.1:41346: no response to inactivity probe after 5 seconds, disconnecting
  2022-04-15T09:29:52.822Z|79834|reconnect|ERR|tcp:127.0.0.1:41930: no response to inactivity probe after 5 seconds, disconnecting
  2022-04-15T09:30:40.555Z|79835|reconnect|ERR|tcp:127.0.0.1:42470: no response to inactivity probe after 5 seconds, disconnecting
  2022-04-15T09:31:19.660Z|79836|reconnect|ERR|tcp:127.0.0.1:42872: no response to inactivity probe after 5 seconds, disconnecting
  2022-04-15T09:32:07.061Z|79837|reconnect|ERR|tcp:127.0.0.1:43214: no response to inactivity probe after 5 seconds, disconnecting

  3. ovn-sb-db on controller openstack:
  2022-04-15T09:55:37.516Z|322313|jsonrpc|WARN|tcp:172.19.20.12:63278: receive error: Connection reset by peer
  2022-04-15T09:55:37.516Z|322314|reconnect|WARN|tcp:172.19.20.12:63278: connection dropped (Connection reset by peer)
  2022-04-15T09:55:46.857Z|322315|reconnect|ERR|tcp:172.19.20.11:35978: no response to inactivity probe after 5 seconds, disconnecting
  2022-04-15T09:55:46.974Z|322316|reconnect|ERR|tcp:172.19.20.12:22952: no response to inactivity probe after 5 seconds, disconnecting
  2022-04-15T09:55:49.367Z|322317|reconnect|ERR|tcp:172.19.20.101:12810: no response to inactivity probe after 5 seconds, disconnecting
  2022-04-15T09:55:50.302Z|322318|reconnect|ERR|tcp:172.19.20.31:59604: no response to inactivity probe after 5 seconds, disconnecting
  2022-04-15T09:55:55.287Z|322319|reconnect|ERR|tcp:172.19.20.102:61964: no response to inactivity probe after 5 seconds, disconnecting
  2022-04-15T09:56:30.760Z|322320|reconnect|ERR|tcp:172.19.20.102:62190: no response to inactivity probe after 5 seconds, disconnecting
  2022-04-15T09:56:48.494Z|322321|reconnect|ERR|tcp:172.19.20.105:53684: no response to inactivity probe after 5 seconds, disconnecting
  2022-04-15T09:56:48.765Z|322322|reconnect|ERR|tcp:172.19.20.10:22338: no response to inactivity probe after 5 seconds, disconnecting
  2022-04-15T09:57:05.311Z|322323|reconnect|ERR|tcp:172.19.20.102:62422: no response to inactivity probe after 5 seconds, disconnecting
  2022-04-15T09:57:13.860Z|322324|reconnect|ERR|tcp:172.19.20.30:38406: no response to inactivity probe after 5 seconds, disconnecting
  2022-04-15T09:57:39.726Z|322325|reconnect|ERR|tcp:172.19.20.102:62642: no response to inactivity probe after 5 seconds, disconnecting
  2022-04-15T09:58:12.186Z|322326|reconnect|ERR|tcp:172.19.20.12:32468: no response to inactivity probe after 5 seconds, disconnecting
  2022-04-15T09:58:14.320Z|322327|reconnect|ERR|tcp:172.19.20.102:62820: no response to inactivity probe after 5 seconds, disconnecting

  4. ovn-sb-db on controller:
  2022-04-15T01:28:50.733Z|22383|jsonrpc|WARN|tcp:172.19.20.12:25398: receive error: Connection reset by peer
  2022-04-15T01:28:50.733Z|22384|reconnect|WARN|tcp:172.19.20.12:25398: connection dropped (Connection reset by peer)
  2022-04-15T03:47:16.056Z|22385|jsonrpc|WARN|tcp:172.19.20.11:14554: receive error: Connection reset by peer
  2022-04-15T03:47:16.056Z|22386|reconnect|WARN|tcp:172.19.20.11:14554: connection dropped (Connection reset by peer)
  2022-04-15T03:47:31.910Z|22387|jsonrpc|WARN|tcp:172.19.20.11:15666: receive error: Connection reset by peer
  2022-04-15T03:47:31.910Z|22388|reconnect|WARN|tcp:172.19.20.11:15666: connection dropped (Connection reset by peer)
  2022-04-15T04:06:03.251Z|22389|jsonrpc|WARN|tcp:172.19.20.10:60386: receive error: Connection reset by peer
  2022-04-15T04:06:03.251Z|22390|reconnect|WARN|tcp:172.19.20.10:60386: connection dropped (Connection reset by peer)
  2022-04-15T05:54:59.704Z|22391|jsonrpc|WARN|tcp:172.19.20.12:59450: receive error: Connection reset by peer
  2022-04-15T05:54:59.704Z|22392|reconnect|WARN|tcp:172.19.20.12:59450: connection dropped (Connection reset by peer)
  2022-04-15T07:16:21.438Z|22393|jsonrpc|WARN|tcp:172.19.20.12:59508: receive error: Connection reset by peer
  2022-04-15T07:16:21.438Z|22394|reconnect|WARN|tcp:172.19.20.12:59508: connection dropped (Connection reset by peer)
  2022-04-15T07:16:31.218Z|22395|jsonrpc|WARN|tcp:172.19.20.10:43826: receive error: Connection reset by peer
  2022-04-15T07:16:31.218Z|22396|reconnect|WARN|tcp:172.19.20.10:43826: connection dropped (Connection reset by peer)
  2022-04-15T07:16:31.356Z|22397|jsonrpc|WARN|tcp:172.19.20.11:50460: receive error: Connection reset by peer
  2022-04-15T07:16:31.356Z|22398|reconnect|WARN|tcp:172.19.20.11:50460: connection dropped (Connection reset by peer)
  2022-04-15T08:20:10.313Z|22399|jsonrpc|WARN|tcp:172.19.20.10:25824: receive error: Connection reset by peer
  2022-04-15T08:20:10.313Z|22400|reconnect|WARN|tcp:172.19.20.10:25824: connection dropped (Connection reset by peer)
  2022-04-15T08:20:19.600Z|22401|jsonrpc|WARN|tcp:172.19.20.12:28378: receive error: Connection reset by peer
  2022-04-15T08:20:19.600Z|22402|reconnect|WARN|tcp:172.19.20.12:28378: connection dropped (Connection reset by peer)
  2022-04-15T08:52:14.280Z|22403|jsonrpc|WARN|tcp:172.19.20.11:19300: receive error: Connection reset by peer
  2022-04-15T08:52:14.280Z|22404|reconnect|WARN|tcp:172.19.20.11:19300: connection dropped (Connection reset by peer)

  Question:
  1. ovsdb disconnecting after 5s, how can I fix it?
  2. ovn-controller reports "OVNSB commit failed, force recompute next time." What does it mean?
  3. ovn-controller logs 100% CPU usage. It looks like it only uses one CPU for processing. Maybe this CPU on the compute node is also being used by a VM. How can I fix it? Is it possible for it to use more CPUs?
  4. Why does live migration bring ovn-controller down? Is it a flow conflict? Is there any way to deal with this situation?