On Fri, 2021-11-26 at 10:39 +0900, 劼磊周 wrote:
> Hi,
>
> I have a troublesome problem with OVN. Our environment runs OpenStack
> (Victoria), and we see intermittent packet loss. It does not happen
> every time, but it may be related to VM CPU usage. Some logs from when
> the problem occurs are below.
>
> A bigger problem appears when a server is restarted: the OVS process
> consumes an enormous amount of CPU (about 5000% as shown by the top
> command on our server) and sends a flood of strange packets to the
> gateway L3 switch, which drives that switch to 100% CPU as well. The
> only way we have found to resolve this is to delete the OVSDB and
> reinitialize it.
>
> Our OVS version:
> ovs-vsctl (Open vSwitch) 2.13.3
> DB Schema 8.2.0
>
> OVN version:
> ovn-controller 20.03.2
> Open vSwitch Library 2.13.3
> OpenFlow versions 0x4:0x4
ovn-controller taking a lot of CPU consistently has quite a few separate
causes, many of which (or perhaps all?) have been addressed by the OVN
20.09 release. I think you'd have to enable ovn-controller debug logging
(vlog/set dbg) to see what's going on. But some causes are:

- tons of flows to install to OVS; 20.06 and 20.09 significantly reduce
  the number of OpenFlow flows that get installed, especially in Load
  Balancer situations and with distributed gateway/router ports

- check for stale OVS interfaces in the OVS DB; if something forgot to
  clean up interfaces, ovs-vswitchd and the kernel take a lot of time
  to process them

- is ovn-controller processing lots of upcalls for DHCP/ICMP/BFD/ARP/ND
  or other kinds of packets? OVN 20.09 implemented "Control Plane
  Protection" that rate-limits incoming packets to avoid DoS-ing
  ovn-controller

- there have been a ton of OVS IDL optimizations, which OVN itself uses
  when talking to the OVN NB and SB databases, that reduce CPU usage;
  these landed in 20.09 and later

Those are just some ideas. I also put together a list of all the
performance and scale improvements here... perhaps some of these
address your issue.
https://docs.google.com/document/d/1c5eQM4rVTLjns6smkvD1-hpbbNPx5-jJTo8rD3Zz7G8

Hope this helps,
Dan

> less /var/log/openvswitch/ovs-vswitchd.log
> 2021-11-22T03:16:08.912Z|03437|rconn|ERR|br-int<->unix#24: no response to inactivity probe after 60 seconds, disconnecting
>
> less /var/log/ovn/ovn-controller.log
> 2021-11-22T03:16:10.617Z|16648|rconn|WARN|unix:/var/run/openvswitch/br-int.mgmt: connection dropped (Broken pipe)
> 2021-11-22T03:16:10.624Z|16649|timeval|WARN|Unreasonably long 62149ms poll interval (61561ms user, 584ms system)
>
> less /var/log/ovn/ovn-northd.log
> 2021-11-22T00:02:24.010Z|607695|poll_loop|INFO|wakeup due to [POLLIN] on fd 13 (<->/var/run/ovn/ovnsb_db.sock) at lib/stream-fd.c:157 (101% CPU usage)
> 2021-11-22T00:02:24.010Z|607696|poll_loop|INFO|wakeup due to [POLLIN] on fd 12 (<->/var/run/ovn/ovnnb_db.sock) at lib/stream-fd.c:157 (101% CPU usage)
> 2021-11-22T00:02:24.023Z|607697|poll_loop|INFO|wakeup due to 0-ms timeout at unix:/var/run/ovn/ovnsb_db.sock (101% CPU usage)
> 2021-11-22T00:02:24.025Z|607698|poll_loop|INFO|wakeup due to 0-ms timeout at unix:/var/run/ovn/ovnsb_db.sock (101% CPU usage)
> 2021-11-22T00:02:24.027Z|607699|poll_loop|INFO|wakeup due to 0-ms timeout at unix:/var/run/ovn/ovnsb_db.sock (101% CPU usage)
> 2021-11-22T00:02:24.052Z|607700|poll_loop|INFO|wakeup due to [POLLIN] on fd 13 (<->/var/run/ovn/ovnsb_db.sock) at lib/stream-fd.c:157 (101% CPU usage)
> 2021-11-22T00:02:24.053Z|607701|poll_loop|INFO|wakeup due to [POLLIN] on fd 13 (<->/var/run/ovn/ovnsb_db.sock) at lib/stream-fd.c:157 (101% CPU usage)
>
> less /var/log/ovn/ovn-controller.log
> 2021-11-22T03:13:08.907Z|16638|poll_loop|INFO|wakeup due to [POLLIN] on fd 22 (172.29.221.2:32802<->172.29.220.117:6642) at lib/stream-fd.c:157 (100% CPU usage)
> 2021-11-22T03:13:08.907Z|16639|poll_loop|INFO|wakeup due to [POLLIN] on fd 3 (<->/var/run/openvswitch/br-int.mgmt) at lib/stream-fd.c:157 (100% CPU usage)
> 2021-11-22T03:13:08.917Z|16640|poll_loop|INFO|wakeup due to [POLLIN] on fd 22 (172.29.221.2:32802<->172.29.220.117:6642) at lib/stream-fd.c:157 (100% CPU usage)
> 2021-11-22T03:13:08.950Z|16641|poll_loop|INFO|wakeup due to 0-ms timeout at lib/ovsdb-idl.c:5552 (100% CPU usage)
> 2021-11-22T03:13:08.958Z|16642|poll_loop|INFO|wakeup due to [POLLIN] on fd 22 (172.29.221.2:32802<->172.29.220.117:6642) at lib/stream-fd.c:157 (100% CPU usage)
> 2021-11-22T03:13:09.247Z|16643|poll_loop|INFO|wakeup due to [POLLIN] on fd 22 (172.29.221.2:32802<->172.29.220.117:6642) at lib/stream-fd.c:157 (100% CPU usage)
> 2021-11-22T03:13:09.650Z|16644|poll_loop|INFO|wakeup due to [POLLIN] on fd 22 (172.29.221.2:32802<->172.29.220.117:6642) at lib/stream-fd.c:157 (100% CPU usage)
> 2021-11-22T03:13:10.389Z|02604|pinctrl(ovn_pinctrl0)|INFO|DHCPACK 02:16:3e:22:d9:8a 203.189.97.246
> 2021-11-22T03:13:10.462Z|16645|poll_loop|INFO|wakeup due to [POLLIN] on fd 22 (172.29.221.2:32802<->172.29.220.117:6642) at lib/stream-fd.c:157 (100% CPU usage)
> 2021-11-22T03:13:10.805Z|16646|poll_loop|INFO|wakeup due to [POLLIN] on fd 22 (172.29.221.2:32802<->172.29.220.117:6642) at lib/stream-fd.c:157 (100% CPU usage)
> 2021-11-22T03:13:11.798Z|16647|poll_loop|INFO|wakeup due to [POLLIN] on fd 22 (172.29.221.2:32802<->172.29.220.117:6642) at lib/stream-fd.c:157 (100% CPU usage)
> 2021-11-22T03:13:55.101Z|02605|pinctrl(ovn_pinctrl0)|INFO|DHCPACK 02:16:3e:11:df:c9 157.7.140.85
> 2021-11-22T03:15:37.115Z|02606|pinctrl(ovn_pinctrl0)|INFO|DHCPACK 02:16:3e:57:1e:d3 157.7.143.202
> 2021-11-22T03:16:10.617Z|16648|rconn|WARN|unix:/var/run/openvswitch/br-int.mgmt: connection dropped (Broken pipe)
> 2021-11-22T03:16:10.624Z|16649|timeval|WARN|Unreasonably long 62149ms poll interval (61561ms user, 584ms system)
>
> _______________________________________________
> discuss mailing list
> [email protected]
> https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
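P.S. As a sketch of the debug-logging and stale-interface checks
suggested above, assuming a host where ovn-controller and ovs-vswitchd
are running locally (the commands are standard OVS/OVN appctl/vsctl
invocations; treat the chosen columns and log targets as illustrative):

```shell
# Enable debug logging in ovn-controller (the vlog/set dbg suggestion above).
ovn-appctl -t ovn-controller vlog/set dbg

# Debug output is very verbose; limiting it to the log file keeps the
# console usable (syntax: destination:level).
ovn-appctl -t ovn-controller vlog/set file:dbg

# Look for stale interfaces in the OVS DB: interfaces with ofport -1 or
# a non-empty error column are often leftovers that were never cleaned up.
ovs-vsctl --columns=name,ofport,error list Interface

# Check internal counters; fast-growing packet-in/pinctrl counters here
# would point at heavy DHCP/ARP/ND upcall traffic hitting ovn-controller.
ovn-appctl -t ovn-controller coverage/show

# Revert to the default logging level when done.
ovn-appctl -t ovn-controller vlog/set info
```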
