On 10/31/22 17:25, Donald Sharp via discuss wrote:
> Hi!
>
> I work on the FRRouting project (https://frrouting.org) and, while doing
> work with FRR, have noticed that when I have a full BGP feed on a system
> that is also running ovs-vswitchd, ovs-vswitchd sits at 100% CPU:
>
> top - 09:43:12 up 4 days, 22:53,  3 users,  load average: 1.06, 1.08, 1.08
> Tasks: 188 total,   3 running, 185 sleeping,   0 stopped,   0 zombie
> %Cpu(s): 12.3 us, 14.7 sy,  0.0 ni, 72.8 id,  0.0 wa,  0.0 hi,  0.2 si,  0.0 st
> MiB Mem :   7859.3 total,   2756.5 free,   2467.2 used,   2635.6 buff/cache
> MiB Swap:   2048.0 total,   2048.0 free,      0.0 used.   5101.9 avail Mem
>
>     PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
>     730 root      10 -10  146204 146048  11636 R  98.3   1.8   6998:13 ovs-vswitchd
>  169620 root      20   0       0      0      0 I   3.3   0.0   1:34.83 kworker/0:3-events
>      21 root      20   0       0      0      0 S   1.3   0.0  14:09.59 ksoftirqd/1
>  131734 frr       15  -5 2384292 609556   6612 S   1.0   7.6  21:57.51 zebra
>  131739 frr       15  -5 1301168   1.0g   7420 S   1.0  13.3  18:16.17 bgpd
>
> When I turn off FRR (or turn off the BGP feed), ovs-vswitchd stops running
> at 100%:
>
> top - 09:48:12 up 4 days, 22:58,  3 users,  load average: 0.08, 0.60, 0.89
> Tasks: 169 total,   1 running, 168 sleeping,   0 stopped,   0 zombie
> %Cpu(s):  0.2 us,  0.4 sy,  0.0 ni, 99.3 id,  0.0 wa,  0.0 hi,  0.1 si,  0.0 st
> MiB Mem :   7859.3 total,   4560.6 free,    663.1 used,   2635.6 buff/cache
> MiB Swap:   2048.0 total,   2048.0 free,      0.0 used.   6906.1 avail Mem
>
>     PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
>  179064 sharpd    20   0   11852   3816   3172 R   1.0   0.0   0:00.09 top
>    1037 zerotie+  20   0  291852 113180   7408 S   0.7   1.4  19:09.17 zerotier-one
>    1043 Debian-+  20   0   34356  21988   7588 S   0.3   0.3  22:04.42 snmpd
>  178480 root      20   0       0      0      0 I   0.3   0.0   0:01.21 kworker/1:2-events
>  178622 sharpd    20   0   14020   6364   4872 S   0.3   0.1   0:00.10 sshd
>       1 root      20   0  169872  13140   8272 S   0.0   0.2   2:33.26 systemd
>       2 root      20   0       0      0      0 S   0.0   0.0   0:00.60 kthreadd
>
> I do not have any particular OVS configuration on this box:
>
> sharpd@janelle:~$ sudo ovs-vsctl show
> c72d327c-61eb-4877-b4e7-dcf7e07e24fc
>     ovs_version: "2.13.8"
>
> sharpd@janelle:~$ sudo ovs-vsctl list o .
> _uuid               : c72d327c-61eb-4877-b4e7-dcf7e07e24fc
> bridges             : []
> cur_cfg             : 0
> datapath_types      : [netdev, system]
> datapaths           : {}
> db_version          : "8.2.0"
> dpdk_initialized    : false
> dpdk_version        : none
> external_ids        : {hostname=janelle, rundir="/var/run/openvswitch", system-id="a1031fcf-8acc-40a9-9fd6-521716b0faaa"}
> iface_types         : [erspan, geneve, gre, internal, ip6erspan, ip6gre, lisp, patch, stt, system, tap, vxlan]
> manager_options     : []
> next_cfg            : 0
> other_config        : {}
> ovs_version         : "2.13.8"
> ssl                 : []
> statistics          : {}
> system_type         : ubuntu
> system_version      : "20.04"
>
> sharpd@janelle:~$ sudo ovs-appctl dpctl/dump-flows -m
> ovs-vswitchd: no datapaths exist
> ovs-vswitchd: datapath not found (Invalid argument)
> ovs-appctl: ovs-vswitchd: server returned an error
>
> Eli Britstein suggested I update openvswitch to the latest version; I did,
> and I saw the same behavior.
> When I pulled up the running code in a debugger, I see that ovs-vswitchd is
> spending essentially 100% of its time in the loop below:
>
> (gdb) f 4
> #4  0x0000559498b4e476 in route_table_run () at lib/route-table.c:133
> 133             nln_run(nln);
> (gdb) l
> 128         OVS_EXCLUDED(route_table_mutex)
> 129     {
> 130         ovs_mutex_lock(&route_table_mutex);
> 131         if (nln) {
> 132             rtnetlink_run();
> 133             nln_run(nln);
> 134
> 135             if (!route_table_valid) {
> 136                 route_table_reset();
> 137             }
> (gdb) l
> 138         }
> 139         ovs_mutex_unlock(&route_table_mutex);
> 140     }
>
> I pulled up where route_table_valid is set:
>
> 298     static void
> 299     route_table_change(const struct route_table_msg *change OVS_UNUSED,
> 300                        void *aux OVS_UNUSED)
> 301     {
> 302         route_table_valid = false;
> 303     }
>
> If I am reading the code correctly, every RTM_NEWROUTE netlink message that
> ovs-vswitchd receives sets the route_table_valid global variable to false
> and causes route_table_reset() to be run.  This makes sense in the context
> of what FRR is doing: a full BGP feed *always* has churn.  So ovs-vswitchd
> receives an RTM_NEWROUTE message, parses it, decides in route_table_change()
> that the route table is no longer valid, and calls route_table_reset(),
> which re-dumps the entire routing table to ovs-vswitchd.  In this case there
> are ~115k IPv6 routes in the Linux FIB.
>
> I hesitate to make any changes here, since I really don't understand what
> the end goal is.  ovs-vswitchd receives a route change from the kernel but
> in turn re-dumps the entire routing table.  What should the correct behavior
> be from ovs-vswitchd's perspective here?
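The notification stream that drives route_table_change() can be observed
independently of OVS with a small standalone rtnetlink listener; a rough
sketch follows.  It is purely illustrative and not OVS code: it subscribes to
the IPv4/IPv6 route multicast groups and counts RTM_NEWROUTE/RTM_DELROUTE
messages per second, which approximates how often the cache above would be
invalidated.  The group selection and the one-second window are choices made
only for this example.

    /* churn.c: count kernel route-change notifications per second.
     * Illustrative sketch only; build with: cc -o churn churn.c */
    #include <linux/netlink.h>
    #include <linux/rtnetlink.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <time.h>

    int main(void)
    {
        int sock = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
        if (sock < 0) {
            perror("socket");
            return 1;
        }

        /* Subscribe to IPv4 and IPv6 route change multicast groups. */
        struct sockaddr_nl addr;
        memset(&addr, 0, sizeof addr);
        addr.nl_family = AF_NETLINK;
        addr.nl_groups = RTMGRP_IPV4_ROUTE | RTMGRP_IPV6_ROUTE;
        if (bind(sock, (struct sockaddr *) &addr, sizeof addr) < 0) {
            perror("bind");
            return 1;
        }

        char buf[8192];
        unsigned long count = 0;
        time_t window = time(NULL);

        for (;;) {
            ssize_t n = recv(sock, buf, sizeof buf, 0);
            if (n < 0) {
                perror("recv");
                return 1;
            }

            /* Walk every netlink message in this datagram. */
            int len = (int) n;
            for (struct nlmsghdr *nlh = (struct nlmsghdr *) buf;
                 NLMSG_OK(nlh, len);
                 nlh = NLMSG_NEXT(nlh, len)) {
                if (nlh->nlmsg_type == RTM_NEWROUTE
                    || nlh->nlmsg_type == RTM_DELROUTE) {
                    count++;
                }
            }

            /* Report once per second (only prints when traffic arrives). */
            time_t now = time(NULL);
            if (now != window) {
                printf("%lu route notifications in the last second\n", count);
                count = 0;
                window = now;
            }
        }
    }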
Hi, Donald.  Your analysis is correct.  OVS will invalidate the cached routing
table and re-dump it in full on the next access, on every netlink notification
about route changes.

Looking back into the commit history, OVS did maintain the cache and only
added/removed what was in the netlink message incrementally, but that changed
in 2011 with the following commit:

    commit f0e167f0dbadbe2a8d684f63ad9faf68d8cb9884
    Author: Ethan J. Jackson <e...@eecs.berkeley.edu>
    Date:   Thu Jan 13 16:29:31 2011 -0800

        route-table: Handle route updates more robustly.

        The kernel does not broadcast rtnetlink route messages in all cases
        one would expect.  This can cause stale entires to end up in the
        route table which may cause incorrect results for
        route_table_get_ifindex() queries.  This commit causes rtnetlink
        route messages to dump the entire route table on the next
        route_table_get_ifindex() query.

And indeed, looking at the history of attempts by different projects to use
route notifications, they all face issues, and it seems that none of them is
actually able to handle all the notifications fully correctly, simply because
these notifications are notoriously bad.  In certain cases it seems to be
impossible to tell what exactly changed and how.  There can be duplicate or
missing notifications.  And the code of projects that try to maintain a route
cache in userspace is insanely complex and still doesn't handle 100% of the
cases.  There were attempts to convince kernel developers to add unique
identifiers to routes, so that userspace can tell them apart, but all of them
seem to have died, leaving the problem unresolved.

These are some discussions/bugs that I found:

    https://bugzilla.redhat.com/show_bug.cgi?id=1337855
    https://bugzilla.redhat.com/show_bug.cgi?id=1722728
    https://github.com/thom311/libnl/issues/226
    https://github.com/thom311/libnl/issues/224

None of the bugs seems to be resolved; most are closed for non-technical
reasons.  I suppose Ethan just decided not to deal with that horribly
unreliable kernel interface and simply re-dump the route table on changes.

For your actual problem here, I'm not sure we can fix it that easily.  Is it
necessary for OVS to know about these routes?  If not, it might be possible to
isolate them in a separate network namespace, so that OVS does not receive all
the route updates.

Do you know how long it takes to dump the route table once?  Maybe it is worth
limiting that process to dump only once a second, or once every few seconds.
That should alleviate the load if the actual dump is relatively fast.

Best regards, Ilya Maximets.
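To sketch that rate-limiting idea against the route_table_run() loop quoted
above: route_table_valid, route_table_reset(), rtnetlink_run(), nln_run(), and
the locking are from the quoted OVS code; the last_reset_ms variable, the
RESET_INTERVAL_MS constant, and the use of time_msec() (OVS's millisecond
clock from lib/timeval.h) are assumptions added only for illustration.  This
is a sketch of the approach, not a tested patch.

    /* Sketch only: allow the full re-dump to run at most once per
     * RESET_INTERVAL_MS, no matter how many notifications arrived in
     * between.  last_reset_ms and RESET_INTERVAL_MS are hypothetical
     * additions to lib/route-table.c. */
    #define RESET_INTERVAL_MS 1000

    static long long int last_reset_ms;   /* hypothetical new state */

    void
    route_table_run(void)
        OVS_EXCLUDED(route_table_mutex)
    {
        ovs_mutex_lock(&route_table_mutex);
        if (nln) {
            rtnetlink_run();
            nln_run(nln);

            if (!route_table_valid
                && time_msec() - last_reset_ms >= RESET_INTERVAL_MS) {
                route_table_reset();
                last_reset_ms = time_msec();
            }
        }
        ovs_mutex_unlock(&route_table_mutex);
    }

The obvious tradeoff is that lookups made between an invalidation and the next
allowed re-dump would be served from a stale cache for up to
RESET_INTERVAL_MS, so whether this is acceptable depends on how the cached
routes are consumed.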