Hi Gavin,

we saw similar issues once we reached a certain number of hypervisors. The cause was that our ovsdb-server processes ran at 100% CPU utilization (and they are not multithreaded).
Our solutions were:

1. If you use SSL on your northbound/southbound DB, disable it and put a TLS-terminating reverse proxy (like Traefik) in front of it.
2. Increase the inactivity probe significantly (you might need to change it on both the ovn-controller side and the ovsdb side; I'm not sure anymore).
3. Introduce ovsdb relays and connect the ovn-controllers to them.

--
Felix Huettner

From: discuss <ovs-discuss-boun...@openvswitch.org> On Behalf Of Gavin McKee via discuss
Sent: Monday, May 1, 2023 9:20 PM
To: ovs-discuss <ovs-discuss@openvswitch.org>
Subject: [ovs-discuss] CPU pinned at 100% , ovn-controller to ovnsb_db unstable

Hi,

I'm having a pretty bad issue: ovn-controller on the hypervisors is unable to stay connected to the OVN SB DB:

2023-05-01T19:13:33.969Z|00541|reconnect|ERR|tcp:10.193.1.2:6642: no response to inactivity probe after 5 seconds, disconnecting
2023-05-01T19:13:33.969Z|00542|reconnect|INFO|tcp:10.193.1.2:6642: connection dropped
2023-05-01T19:13:43.043Z|00543|reconnect|INFO|tcp:10.193.1.2:6642: connected
2023-05-01T19:13:56.115Z|00544|reconnect|ERR|tcp:10.193.1.2:6642: no response to inactivity probe after 5 seconds, disconnecting
2023-05-01T19:13:56.115Z|00545|reconnect|INFO|tcp:10.193.1.2:6642: connection dropped
2023-05-01T19:14:36.177Z|00546|reconnect|INFO|tcp:10.193.1.2:6642: connected
2023-05-01T19:14:44.996Z|00547|jsonrpc|WARN|tcp:10.193.1.2:6642: receive error: Connection reset by peer
2023-05-01T19:14:44.996Z|00548|reconnect|WARN|tcp:10.193.1.2:6642: connection dropped (Connection reset by peer)
2023-05-01T19:15:44.131Z|00549|reconnect|INFO|tcp:10.193.1.2:6642: connected
2023-05-01T19:15:54.137Z|00550|reconnect|ERR|tcp:10.193.1.2:6642: no response to inactivity probe after 5 seconds, disconnecting
2023-05-01T19:15:54.137Z|00551|reconnect|INFO|tcp:10.193.1.2:6642: connection dropped
2023-05-01T19:16:02.184Z|00552|reconnect|INFO|tcp:10.193.1.2:6642: connected
2023-05-01T19:16:14.488Z|00553|reconnect|ERR|tcp:10.193.1.2:6642: no response to inactivity probe after 5 seconds, disconnecting
2023-05-01T19:16:14.488Z|00554|reconnect|INFO|tcp:10.193.1.2:6642: connection dropped

This started after I pushed a configuration with around 250 logical switch ports to the northbound DB. Once I turn on the VMs, everything goes bad very quickly:

2023-05-01T04:27:09.294Z|01947|poll_loop|INFO|wakeup due to [POLLOUT] on fd 66 (10.193.200.6:6642<->10.193.0.102:48794) at ../lib/stream-fd.c:153 (100% CPU usage)

Can anyone provide guidance on how to run down an issue like this?

This email may contain confidential content and is intended solely for use by the intended recipient. If you are not the intended recipient, please notify the sender immediately and delete this email. Information on data protection can be found here: https://www.datenschutz.schwarz/
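A minimal Python sketch (not something posted in this thread) for triaging reconnect logs like the ones above: it counts inactivity-probe timeouts, connection resets, and how long each connection survived in an ovn-controller log, then prints the two settings that the "increase the inactivity probe" advice most likely refers to. Assumptions, all mine: `summarize` and `suggest_probe_commands` are hypothetical helper names; the regex only covers the IPv4 `proto:host:port` remote format seen in these logs; the settings shown (`external_ids:ovn-remote-probe-interval` in the local Open_vSwitch table for ovn-controller, `inactivity_probe` in the SB `Connection` table for the SB ovsdb-server) should be checked against the ovn-controller(8) and ovn-sb(5) man pages for your version; the `.` record shorthand assumes a single Connection row; and 60000 ms is an illustrative value, not a tested recommendation.

```python
import re
import shlex
import sys
from dataclasses import dataclass, field
from datetime import datetime
from typing import Iterable, List, Optional

# Matches vlog lines of the form seen in the report, e.g.
# 2023-05-01T19:13:33.969Z|00541|reconnect|ERR|tcp:10.193.1.2:6642: no response ...
# Assumption: IPv4 "proto:host:port" remotes; IPv6 remotes need a looser pattern.
LOG_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3})Z\|\d+\|"
    r"(?P<module>\w+)\|(?P<level>\w+)\|"
    r"(?P<remote>[^:|]+:[^:|]+:\d+): (?P<msg>.*)$"
)


@dataclass
class ReconnectSummary:
    connects: int = 0
    probe_timeouts: int = 0
    resets: int = 0
    # Seconds between each "connected" and the next "connection dropped".
    session_seconds: List[float] = field(default_factory=list)


def summarize(lines: Iterable[str]) -> ReconnectSummary:
    """Hypothetical helper: tally reconnect events from ovn-controller log lines."""
    summary = ReconnectSummary()
    connected_at: Optional[datetime] = None
    for line in lines:
        m = LOG_RE.match(line.strip())
        if m is None:
            continue  # not a "<remote>: <message>" vlog line
        ts = datetime.strptime(m["ts"], "%Y-%m-%dT%H:%M:%S.%f")
        msg = m["msg"]
        if msg == "connected":
            summary.connects += 1
            connected_at = ts
        elif msg.startswith("no response to inactivity probe"):
            summary.probe_timeouts += 1
        elif msg.startswith("connection dropped"):
            if "Connection reset by peer" in msg:
                summary.resets += 1
            if connected_at is not None:
                summary.session_seconds.append((ts - connected_at).total_seconds())
                connected_at = None
    return summary


def suggest_probe_commands(interval_ms: int) -> List[str]:
    """Hypothetical helper: commands to raise the probe on both ends (ms)."""
    return [
        # Client side: ovn-controller's probe towards the SB DB. The
        # "after 5 seconds" in the log is consistent with this side's default.
        shlex.join(["ovs-vsctl", "set", "Open_vSwitch", ".",
                    f"external_ids:ovn-remote-probe-interval={interval_ms}"]),
        # Server side: the SB ovsdb-server's probe towards its clients,
        # stored in the SB Connection table (assumes exactly one row).
        shlex.join(["ovn-sbctl", "set", "Connection", ".",
                    f"inactivity_probe={interval_ms}"]),
    ]


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        s = summarize(log)
    print(f"connects={s.connects} probe_timeouts={s.probe_timeouts} resets={s.resets}")
    if s.session_seconds:
        mean = sum(s.session_seconds) / len(s.session_seconds)
        print(f"mean session length: {mean:.1f}s")
    if s.probe_timeouts:
        # 60000 ms is an illustrative value only.
        for cmd in suggest_probe_commands(60000):
            print("suggested:", cmd)
```

Run it with the path to an ovn-controller log as the only argument (Python 3.8+ for `shlex.join`). On the excerpt in this thread it would report 4 probe timeouts and 1 reset, with connections lasting only about 8 to 13 seconds, which is a quick way to see whether a probe or relay change actually helps.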
_______________________________________________
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss