Hi! I work on the FRRouting project (https://frrouting.org) and have noticed that when I have a full BGP feed on a system that is also running ovs-vswitchd, ovs-vswitchd sits at 100% CPU:
top - 09:43:12 up 4 days, 22:53,  3 users,  load average: 1.06, 1.08, 1.08
Tasks: 188 total,   3 running, 185 sleeping,   0 stopped,   0 zombie
%Cpu(s): 12.3 us, 14.7 sy,  0.0 ni, 72.8 id,  0.0 wa,  0.0 hi,  0.2 si,  0.0 st
MiB Mem :   7859.3 total,   2756.5 free,   2467.2 used,   2635.6 buff/cache
MiB Swap:   2048.0 total,   2048.0 free,      0.0 used.   5101.9 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
    730 root      10 -10  146204 146048  11636 R  98.3   1.8   6998:13 ovs-vswitchd
 169620 root      20   0       0      0      0 I   3.3   0.0   1:34.83 kworker/0:3-events
     21 root      20   0       0      0      0 S   1.3   0.0  14:09.59 ksoftirqd/1
 131734 frr       15  -5 2384292 609556   6612 S   1.0   7.6  21:57.51 zebra
 131739 frr       15  -5 1301168   1.0g   7420 S   1.0  13.3  18:16.17 bgpd

When I turn off FRR (or turn off the BGP feed), ovs-vswitchd stops running at 100%:

top - 09:48:12 up 4 days, 22:58,  3 users,  load average: 0.08, 0.60, 0.89
Tasks: 169 total,   1 running, 168 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.2 us,  0.4 sy,  0.0 ni, 99.3 id,  0.0 wa,  0.0 hi,  0.1 si,  0.0 st
MiB Mem :   7859.3 total,   4560.6 free,    663.1 used,   2635.6 buff/cache
MiB Swap:   2048.0 total,   2048.0 free,      0.0 used.   6906.1 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 179064 sharpd    20   0   11852   3816   3172 R   1.0   0.0   0:00.09 top
   1037 zerotie+  20   0  291852 113180   7408 S   0.7   1.4  19:09.17 zerotier-one
   1043 Debian-+  20   0   34356  21988   7588 S   0.3   0.3  22:04.42 snmpd
 178480 root      20   0       0      0      0 I   0.3   0.0   0:01.21 kworker/1:2-events
 178622 sharpd    20   0   14020   6364   4872 S   0.3   0.1   0:00.10 sshd
      1 root      20   0  169872  13140   8272 S   0.0   0.2   2:33.26 systemd
      2 root      20   0       0      0      0 S   0.0   0.0   0:00.60 kthreadd

I do not have any particular OVS configuration on this box:

sharpd@janelle:~$ sudo ovs-vsctl show
c72d327c-61eb-4877-b4e7-dcf7e07e24fc
    ovs_version: "2.13.8"
sharpd@janelle:~$ sudo ovs-vsctl list o .
_uuid               : c72d327c-61eb-4877-b4e7-dcf7e07e24fc
bridges             : []
cur_cfg             : 0
datapath_types      : [netdev, system]
datapaths           : {}
db_version          : "8.2.0"
dpdk_initialized    : false
dpdk_version        : none
external_ids        : {hostname=janelle, rundir="/var/run/openvswitch", system-id="a1031fcf-8acc-40a9-9fd6-521716b0faaa"}
iface_types         : [erspan, geneve, gre, internal, ip6erspan, ip6gre, lisp, patch, stt, system, tap, vxlan]
manager_options     : []
next_cfg            : 0
other_config        : {}
ovs_version         : "2.13.8"
ssl                 : []
statistics          : {}
system_type         : ubuntu
system_version      : "20.04"

sharpd@janelle:~$ sudo ovs-appctl dpctl/dump-flows -m
ovs-vswitchd: no datapaths exist
ovs-vswitchd: datapath not found (Invalid argument)
ovs-appctl: ovs-vswitchd: server returned an error

Eli Britstein suggested I update Open vSwitch to the latest version; I did, and saw the same behavior.

When I pulled up the running code in a debugger, I see that ovs-vswitchd is running in this loop below pretty much 100% of the time:

(gdb) f 4
#4  0x0000559498b4e476 in route_table_run () at lib/route-table.c:133
133             nln_run(nln);
(gdb) l
128         OVS_EXCLUDED(route_table_mutex)
129     {
130         ovs_mutex_lock(&route_table_mutex);
131         if (nln) {
132             rtnetlink_run();
133             nln_run(nln);
134
135             if (!route_table_valid) {
136                 route_table_reset();
137             }
(gdb) l
138         }
139         ovs_mutex_unlock(&route_table_mutex);
140     }

I pulled up where route_table_valid is set:

298 static void
299 route_table_change(const struct route_table_msg *change OVS_UNUSED,
300                    void *aux OVS_UNUSED)
301 {
302     route_table_valid = false;
303 }

If I am reading the code correctly, every RTM_NEWROUTE netlink message that ovs-vswitchd receives sets the route_table_valid global variable to false and causes route_table_reset() to run. This makes sense in the context of what FRR is doing: a full BGP feed *always* has churn. So ovs-vswitchd receives an RTM_NEWROUTE message, parses it, decides in route_table_change() that the route table is no longer valid, and calls route_table_reset(), which redumps the entire routing table to ovs-vswitchd. In this case there are ~115k IPv6 routes in the Linux FIB.

I hesitate to make any changes here since I don't really understand what the end goal is: ovs-vswitchd receives a single route change from the kernel but in response redumps the entire routing table. What should the correct behavior be from ovs-vswitchd's perspective here?

As a note, I recompiled with line 302 above set to true, and CPU usage of ovs-vswitchd stays at essentially 0% once the initial table read has been done.

thanks!

donald
_______________________________________________
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss