Hi Gavin,

We saw similar issues after reaching a certain number of hypervisors. This
happened because our ovsdb-server processes ran at 100% CPU utilization (and
they are not multithreaded).
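To check whether the same pattern applies to your setup, one option is to count probe timeouts, dropped connections and 100% CPU wakeups in the logs. Below is a minimal Python sketch. It assumes the `<timestamp>|<seq>|<module>|<level>|<message>` line format seen in the logs quoted below, with the mail-client link residue removed. `summarize` and the counter names are hypothetical, not part of any OVS tool.

```python
import re
from collections import Counter

# Assumed OVS log line format, as in the quoted logs:
# <timestamp>|<sequence>|<module>|<level>|<message>
LOG_RE = re.compile(
    r"^(?P<ts>[^|]+)\|(?P<seq>\d+)\|(?P<module>[^|]+)\|(?P<level>[A-Z]+)\|(?P<msg>.*)$"
)


def summarize(lines):
    """Count reconnect-related events and 100% CPU wakeups (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        m = LOG_RE.match(line.strip())
        if not m:
            continue  # skip lines that do not look like OVS log lines
        msg = m.group("msg")
        if "no response to inactivity probe" in msg:
            counts["probe_timeout"] += 1
        elif "connection dropped" in msg:
            counts["dropped"] += 1
        elif msg.endswith(": connected"):
            counts["connected"] += 1
        # poll_loop reports this suffix when the process is CPU-bound
        if "(100% CPU usage)" in msg:
            counts["cpu_100"] += 1
    return counts
```

Running it over the ovn-controller log on a hypervisor shows how often the probe fires. Running it over the ovsdb-server log on the DB node shows whether the server itself is pegged at 100% CPU.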

Our solutions were:

  1.  If you use SSL on your northbound/southbound DB, disable it and put a
      TLS-terminating reverse proxy (such as Traefik) in front of it.
  2.  Increase the inactivity probe significantly (you might need to change it
      on both the ovn-controller and the ovsdb side; I'm not sure anymore).
  3.  Introduce ovsdb relays and connect the ovn-controllers to them.
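Items 2 and 3 can be sketched as command lines. This is a minimal Python sketch that only builds the commands and does not run them. It makes two assumptions: the southbound `Connection` table has a single row, so `.` selects it, and 60000 ms is used as an example value. The helper names `probe_commands` and `relay_command` are hypothetical. The knobs themselves are:

- `external_ids:ovn-remote-probe-interval` on each hypervisor (ovn-controller side).
- `inactivity_probe` in the SB `Connection` table (ovsdb-server side).
- ovsdb-server's `relay:` database syntax.

Please check them against the man pages of your OVS/OVN version.

```python
import shlex


def probe_commands(interval_ms: int) -> list[list[str]]:
    """Build (not run) commands raising both inactivity probes (hypothetical helper).

    Assumption: exactly one row in the SB Connection table, so "." selects it.
    A value of 0 disables the probe; negative values are rejected.
    """
    if interval_ms < 0:
        raise ValueError("interval_ms must be >= 0")
    value = str(interval_ms)
    return [
        # Run on each hypervisor: probe sent by ovn-controller to the SB DB.
        ["ovs-vsctl", "set", "open", ".",
         f"external_ids:ovn-remote-probe-interval={value}"],
        # Run against the SB DB: probe sent by ovsdb-server to its clients.
        ["ovn-sbctl", "set", "connection", ".", f"inactivity_probe={value}"],
    ]


def relay_command(upstream: str, listen: str = "ptcp:6642") -> list[str]:
    """Build a minimal ovsdb-server relay command line (hypothetical helper).

    The relay serves OVN_Southbound on `listen` and syncs from `upstream`.
    A real deployment would also need pidfile/logging/TLS options.
    """
    return ["ovsdb-server", f"--remote={listen}", f"relay:OVN_Southbound:{upstream}"]


if __name__ == "__main__":
    for cmd in probe_commands(60000):
        print(shlex.join(cmd))
    print(shlex.join(relay_command("tcp:10.193.1.2:6642")))
```

With relays in place, each hypervisor's `external_ids:ovn-remote` would point at a relay instead of the central SB DB. This spreads the per-client load away from the single-threaded ovsdb-server.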

--
Felix Huettner

From: discuss <ovs-discuss-boun...@openvswitch.org> On Behalf Of Gavin McKee via discuss
Sent: Monday, May 1, 2023 9:20 PM
To: ovs-discuss <ovs-discuss@openvswitch.org>
Subject: [ovs-discuss] CPU pinned at 100% , ovn-controller to ovnsb_db unstable

Hi,

I'm having a pretty bad issue with ovn-controller on the hypervisors being
unable to connect to the OVN SB DB:

2023-05-01T19:13:33.969Z|00541|reconnect|ERR|tcp:10.193.1.2:6642: no response to inactivity probe after 5 seconds, disconnecting
2023-05-01T19:13:33.969Z|00542|reconnect|INFO|tcp:10.193.1.2:6642: connection dropped
2023-05-01T19:13:43.043Z|00543|reconnect|INFO|tcp:10.193.1.2:6642: connected
2023-05-01T19:13:56.115Z|00544|reconnect|ERR|tcp:10.193.1.2:6642: no response to inactivity probe after 5 seconds, disconnecting
2023-05-01T19:13:56.115Z|00545|reconnect|INFO|tcp:10.193.1.2:6642: connection dropped
2023-05-01T19:14:36.177Z|00546|reconnect|INFO|tcp:10.193.1.2:6642: connected
2023-05-01T19:14:44.996Z|00547|jsonrpc|WARN|tcp:10.193.1.2:6642: receive error: Connection reset by peer
2023-05-01T19:14:44.996Z|00548|reconnect|WARN|tcp:10.193.1.2:6642: connection dropped (Connection reset by peer)
2023-05-01T19:15:44.131Z|00549|reconnect|INFO|tcp:10.193.1.2:6642: connected
2023-05-01T19:15:54.137Z|00550|reconnect|ERR|tcp:10.193.1.2:6642: no response to inactivity probe after 5 seconds, disconnecting
2023-05-01T19:15:54.137Z|00551|reconnect|INFO|tcp:10.193.1.2:6642: connection dropped
2023-05-01T19:16:02.184Z|00552|reconnect|INFO|tcp:10.193.1.2:6642: connected
2023-05-01T19:16:14.488Z|00553|reconnect|ERR|tcp:10.193.1.2:6642: no response to inactivity probe after 5 seconds, disconnecting
2023-05-01T19:16:14.488Z|00554|reconnect|INFO|tcp:10.193.1.2:6642: connection dropped

This started after pushing a configuration for around 250 logical switch ports
to the northbound DB. Once I turn on the VMs, everything goes bad very quickly:


2023-05-01T04:27:09.294Z|01947|poll_loop|INFO|wakeup due to [POLLOUT] on fd 66 (10.193.200.6:6642<->10.193.0.102:48794) at ../lib/stream-fd.c:153 (100% CPU usage)

Can anyone provide guidance on how to track down an issue like this?

This email may contain confidential content and is intended solely for use by
the intended recipient. If you are not the intended recipient, please notify
the sender immediately and delete this email. Information on data protection
can be found here: https://www.datenschutz.schwarz/
_______________________________________________
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
