Hi Team,

We are facing a recurring issue on our hypervisors: across multiple nodes, connections are frequently dropped with "no response to inactivity probe" errors, which is impacting the stability of our environment. The relevant log excerpts are below; after the logs we have included a sketch of the probe-related settings we believe are involved.
*OVS/OVN logs on the hypervisor*

2024-09-23T16:57:17.108Z|00549|rconn|INFO|unix:/run/openvswitch/br-int.mgmt: connected
2024-09-23T16:57:20.384Z|00550|jsonrpc|DBG|tcp:10.193.1.9:16642: received request, method="echo", params=[], id="echo"
2024-09-23T16:57:20.384Z|00551|jsonrpc|DBG|tcp:10.193.1.9:16642: send reply, result=[], id="echo"
2024-09-23T16:57:25.420Z|00552|jsonrpc|DBG|tcp:10.193.1.9:16642: received request, method="echo", params=[], id="echo"
2024-09-23T16:57:25.420Z|00553|jsonrpc|DBG|tcp:10.193.1.9:16642: send reply, result=[], id="echo"
2024-09-23T16:57:27.133Z|00554|rconn|ERR|unix:/run/openvswitch/br-int.mgmt: no response to inactivity probe after 5 seconds, disconnecting
2024-09-23T16:57:30.456Z|00555|jsonrpc|DBG|tcp:10.193.1.9:16642: received request, method="echo", params=[], id="echo"
2024-09-23T16:57:30.456Z|00556|jsonrpc|DBG|tcp:10.193.1.9:16642: send reply, result=[], id="echo"
2024-09-23T16:57:30.456Z|00557|rconn|INFO|unix:/run/openvswitch/br-int.mgmt: connecting...
2024-09-23T16:57:35.492Z|00558|jsonrpc|DBG|tcp:10.193.1.9:16642: received request, method="echo", params=[], id="echo"
2024-09-23T16:57:35.492Z|00559|jsonrpc|DBG|tcp:10.193.1.9:16642: send reply, result=[], id="echo"
2024-09-23T16:57:35.493Z|00560|rconn|INFO|unix:/run/openvswitch/br-int.mgmt: connection timed out
2024-09-23T16:57:35.493Z|00561|rconn|INFO|unix:/run/openvswitch/br-int.mgmt: waiting 2 seconds before reconnect
2024-09-23T16:57:36.223Z|00562|jsonrpc|DBG|tcp:10.193.1.9:16642: received notification, method="update3", params=[["monid","OVN_Southbound"],"e398ff98-1f9c-499e-8410-314c9738fd87",{"MAC_Binding":{"a8175ad1-f93c-45bd-a352-ff54cb75e0ca":{"modify":{"timestamp":1727110656179}},"dc4c13d3-878b-4c12-ad5c-f3fad5c0c9f3":{"modify":{"timestamp":1727110656179}}}}]
2024-09-23T16:57:40.529Z|00563|jsonrpc|DBG|tcp:10.193.1.9:16642: received request, method="echo", params=[], id="echo"
2024-09-23T16:57:40.529Z|00564|jsonrpc|DBG|tcp:10.193.1.9:16642: send reply, result=[], id="echo"
2024-09-23T16:57:40.530Z|00565|rconn|INFO|unix:/run/openvswitch/br-int.mgmt: connecting...
2024-09-23T16:57:42.996Z|00566|jsonrpc|DBG|tcp:10.193.1.9:16642: received notification, method="update3", params=[["monid","OVN_Southbound"],"f1a47c07-1ee2-4565-8565-0a948da77f57",{"MAC_Binding":{"42f109da-3c9b-4c54-8e6b-2c47cf1ea8c1":{"modify":{"mac":"2a:e9:ed:2d:4c:57","timestamp":1727110662940}}}}]
2024-09-23T16:57:42.997Z|00567|rconn|INFO|unix:/run/openvswitch/br-int.mgmt: connection timed out
2024-09-23T16:57:42.997Z|00568|rconn|INFO|unix:/run/openvswitch/br-int.mgmt: waiting 4 seconds before reconnect
2024-09-23T16:57:43.036Z|00569|jsonrpc|DBG|tcp:10.193.1.9:16642: received notification, method="update3", params=[["monid","OVN_Southbound"],"85c32eed-cf9c-474f-b665-7a8924340bb9",{"MAC_Binding":{"42f109da-3c9b-4c54-8e6b-2c47cf1ea8c1":{"modify":{"mac":"2a:03:d6:55:12:de","timestamp":1727110662986}}}}]
2024-09-23T16:57:45.565Z|00570|jsonrpc|DBG|tcp:10.193.1.9:16642: received request, method="echo", params=[], id="echo"
2024-09-23T16:57:45.565Z|00571|jsonrpc|DBG|tcp:10.193.1.9:16642: send reply, result=[], id="echo"
2024-09-23T16:57:50.602Z|00572|jsonrpc|DBG|tcp:10.193.1.9:16642: received request, method="echo", params=[], id="echo"
2024-09-23T16:57:50.602Z|00573|jsonrpc|DBG|tcp:10.193.1.9:16642: send reply, result=[], id="echo"
2024-09-23T16:57:50.602Z|00574|rconn|INFO|unix:/run/openvswitch/br-int.mgmt: connecting...
2024-09-23T16:57:54.638Z|00575|jsonrpc|DBG|tcp:10.193.1.9:16642: received notification, method="update3", params=[["monid","OVN_Southbound"],"557d1392-7456-428b-9c84-bbd14c29a007",{"MAC_Binding":{"597e8e46-fdc6-4144-8052-600cc7eaa5e7":{"modify":{"timestamp":1727110674593}},"54d847d2-57d1-480e-9efb-f02a79f60ebe":{"modify":{"timestamp":1727110674593}},"6229f448-8155-4fb6-bb51-701b0799a82f":{"modify":{"timestamp":1727110674593}},"cf03fc3d-1187-4f33-bb8f-ed974e579754":{"modify":{"timestamp":1727110674593}},"b52e4f2d-0a57-404e-8b82-d941cd005e3e":{"modify":{"timestamp":1727110674593}},"1e7afc4d-57dc-4041-a544-ab780ddaeb17":{"modify":{"timestamp":1727110674593}}}}]
2024-09-23T16:57:54.640Z|00576|rconn|INFO|unix:/run/openvswitch/br-int.mgmt: connection timed out

*OVN central ovsdb-server logs*

[root@ovnkube-db-ssd-0 ~]# cat /var/log/openvswitch/ovsdb-server-sb.log | grep 16:57
2024-09-04T20:16:57.850Z|02103|poll_loop|INFO|Dropped 13643 log messages in last 6 seconds (most recently, 0 seconds ago) due to excessive rate
2024-09-04T20:16:57.850Z|02104|poll_loop|INFO|wakeup due to 0-ms timeout at ../ovsdb/trigger.c:199 (83% CPU usage)
2024-09-10T12:16:57.968Z|03881|poll_loop|INFO|Dropped 6939 log messages in last 35132 seconds (most recently, 35127 seconds ago) due to excessive rate
2024-09-10T12:16:57.968Z|03882|poll_loop|INFO|wakeup due to [POLLIN] on fd 45 (10.193.215.12:6644<->10.193.221.11:33656) at ../lib/stream-fd.c:157 (62% CPU usage)
2024-09-10T12:16:57.968Z|03883|poll_loop|INFO|wakeup due to 0-ms timeout at ../ovsdb/jsonrpc-server.c:617 (62% CPU usage)
2024-09-10T12:16:57.969Z|03884|poll_loop|INFO|wakeup due to [POLLIN] on fd 45 (10.193.215.12:6644<->10.193.221.11:33656) at ../lib/stream-fd.c:157 (62% CPU usage)
2024-09-10T12:16:57.970Z|03885|poll_loop|INFO|wakeup due to [POLLIN] on fd 45 (10.193.215.12:6644<->10.193.221.11:33656) at ../lib/stream-fd.c:157 (62% CPU usage)
2024-09-10T12:16:57.970Z|03886|poll_loop|INFO|wakeup due to 0-ms timeout at ../ovsdb/trigger.c:199 (62% CPU usage)
2024-09-10T12:16:57.970Z|03887|poll_loop|INFO|wakeup due to [POLLIN] on fd 45 (10.193.215.12:6644<->10.193.221.11:33656) at ../lib/stream-fd.c:157 (62% CPU usage)
2024-09-10T12:16:57.971Z|03888|poll_loop|INFO|wakeup due to 0-ms timeout at ../ovsdb/jsonrpc-server.c:617 (62% CPU usage)
2024-09-10T12:16:57.971Z|03889|poll_loop|INFO|wakeup due to [POLLIN] on fd 27 (FIFO pipe:[136995519]) at ../ovsdb/log.c:1021 (62% CPU usage)
2024-09-10T12:16:57.972Z|03890|poll_loop|INFO|wakeup due to [POLLIN] on fd 45 (10.193.215.12:6644<->10.193.221.11:33656) at ../lib/stream-fd.c:157 (62% CPU usage)
2024-09-10T12:16:57.972Z|03891|poll_loop|INFO|wakeup due to 0-ms timeout at ../ovsdb/jsonrpc-server.c:617 (62% CPU usage)
[root@ovnkube-db-ssd-0 ~]#

*Relay DB logs*

[root@ovsdb-relay-558444ff7d-56wlt ~]# grep 16:5 /var/log/ovn/ovsdb-server-sb-relay.log
2024-09-20T16:52:21.973Z|00891|reconnect|ERR|tcp:10.193.0.79:44880: no response to inactivity probe after 5 seconds, disconnecting
2024-09-20T16:54:57.750Z|00892|reconnect|ERR|tcp:10.193.0.91:36096: no response to inactivity probe after 5 seconds, disconnecting
2024-09-20T16:56:46.048Z|00893|reconnect|ERR|tcp:10.193.0.53:47436: no response to inactivity probe after 5 seconds, disconnecting
2024-09-20T17:02:20.477Z|00895|reconnect|ERR|tcp:10.193.0.16:58148: no response to inactivity probe after 5 seconds, disconnecting
2024-09-23T16:52:50.330Z|01402|reconnect|ERR|tcp:10.193.0.123:50522: no response to inactivity probe after 5 seconds, disconnecting
2024-09-23T16:53:05.538Z|01403|reconnect|ERR|tcp:10.193.0.92:42766: no response to inactivity probe after 5 seconds, disconnecting
[root@ovsdb-relay-558444ff7d-56wlt ~]#
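For completeness, here is a minimal sketch of the probe-related settings we believe govern these timeouts. The interval values are placeholders for illustration only, and we would appreciate corrections if these are not the relevant knobs:

    # Hypervisor side (per chassis), via the local Open_vSwitch table:
    # OpenFlow inactivity probe between ovn-controller and br-int, in seconds (placeholder value)
    ovs-vsctl set Open_vSwitch . external_ids:ovn-openflow-probe-interval=30

    # Probe on ovn-controller's JSON-RPC connection to the SB database/relay, in milliseconds (placeholder value)
    ovs-vsctl set Open_vSwitch . external_ids:ovn-remote-probe-interval=60000

    # Central SB ovsdb-server side: probe towards connected clients, in milliseconds (placeholder value)
    # (assumes a single row in the southbound Connection table)
    ovn-sbctl set connection . inactivity_probe=60000

We assume the relay ovsdb-server would need an equivalent adjustment on whatever remote/Connection configuration it serves its clients from, but we are not certain how that is wired up in this deployment.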