>-----Original Message-----
>From: Ilya Maximets <i.maxim...@ovn.org>
>Sent: Monday, 31 October 2022 23:54
>To: Donald Sharp <donaldshar...@gmail.com>; ovs-
>disc...@openvswitch.org; e...@eecs.berkeley.edu; Eli Britstein
><el...@nvidia.com>
>Cc: i.maxim...@ovn.org
>Subject: Re: [ovs-discuss] ovs-vswitchd running at 100% cpu
>
>On 10/31/22 17:25, Donald Sharp via discuss wrote:
>> Hi!
>>
>> I work on the FRRouting project (https://frrouting.org) and have noticed
>> that when I have a full BGP feed on a system that is also running
>> ovs-vswitchd, ovs-vswitchd sits at 100% CPU:
>>
>> top - 09:43:12 up 4 days, 22:53, 3 users, load average: 1.06, 1.08, 1.08
>> Tasks: 188 total, 3 running, 185 sleeping, 0 stopped, 0 zombie
>> %Cpu(s): 12.3 us, 14.7 sy, 0.0 ni, 72.8 id, 0.0 wa, 0.0 hi, 0.2 si, 0.0 st
>> MiB Mem : 7859.3 total, 2756.5 free, 2467.2 used, 2635.6 buff/cache
>> MiB Swap: 2048.0 total, 2048.0 free, 0.0 used. 5101.9 avail Mem
>>
>> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
>> 730 root 10 -10 146204 146048 11636 R 98.3 1.8 6998:13 ovs-vswitchd
>> 169620 root 20 0 0 0 0 I 3.3 0.0 1:34.83 kworker/0:3-events
>> 21 root 20 0 0 0 0 S 1.3 0.0 14:09.59 ksoftirqd/1
>> 131734 frr 15 -5 2384292 609556 6612 S 1.0 7.6 21:57.51 zebra
>> 131739 frr 15 -5 1301168 1.0g 7420 S 1.0 13.3 18:16.17 bgpd
>>
>> When I turn off FRR (or turn off the BGP feed), ovs-vswitchd stops
>> running at 100%:
>>
>> top - 09:48:12 up 4 days, 22:58, 3 users, load average: 0.08, 0.60, 0.89
>> Tasks: 169 total, 1 running, 168 sleeping, 0 stopped, 0 zombie
>> %Cpu(s): 0.2 us, 0.4 sy, 0.0 ni, 99.3 id, 0.0 wa, 0.0 hi, 0.1 si, 0.0 st
>> MiB Mem : 7859.3 total, 4560.6 free, 663.1 used, 2635.6 buff/cache
>> MiB Swap: 2048.0 total, 2048.0 free, 0.0 used. 6906.1 avail Mem
>>
>> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
>> 179064 sharpd 20 0 11852 3816 3172 R 1.0 0.0 0:00.09 top
>> 1037 zerotie+ 20 0 291852 113180 7408 S 0.7 1.4 19:09.17 zerotier-one
>> 1043 Debian-+ 20 0 34356 21988 7588 S 0.3 0.3 22:04.42 snmpd
>> 178480 root 20 0 0 0 0 I 0.3 0.0 0:01.21 kworker/1:2-events
>> 178622 sharpd 20 0 14020 6364 4872 S 0.3 0.1 0:00.10 sshd
>> 1 root 20 0 169872 13140 8272 S 0.0 0.2 2:33.26 systemd
>> 2 root 20 0 0 0 0 S 0.0 0.0 0:00.60 kthreadd
>>
>> I do not have any particular ovs configuration on this box:
>> sharpd@janelle:~$ sudo ovs-vsctl show
>> c72d327c-61eb-4877-b4e7-dcf7e07e24fc
>> ovs_version: "2.13.8"
>>
>>
>> sharpd@janelle:~$ sudo ovs-vsctl list o .
>> _uuid : c72d327c-61eb-4877-b4e7-dcf7e07e24fc
>> bridges : []
>> cur_cfg : 0
>> datapath_types : [netdev, system]
>> datapaths : {}
>> db_version : "8.2.0"
>> dpdk_initialized : false
>> dpdk_version : none
>> external_ids : {hostname=janelle, rundir="/var/run/openvswitch", system-id="a1031fcf-8acc-40a9-9fd6-521716b0faaa"}
>> iface_types : [erspan, geneve, gre, internal, ip6erspan, ip6gre, lisp, patch, stt, system, tap, vxlan]
>> manager_options : []
>> next_cfg : 0
>> other_config : {}
>> ovs_version : "2.13.8"
>> ssl : []
>> statistics : {}
>> system_type : ubuntu
>> system_version : "20.04"
>>
>> sharpd@janelle:~$ sudo ovs-appctl dpctl/dump-flows -m
>> ovs-vswitchd: no datapaths exist
>> ovs-vswitchd: datapath not found (Invalid argument)
>> ovs-appctl: ovs-vswitchd: server returned an error
>>
>> Eli Britstein suggested I update Open vSwitch to the latest version; I
>> did and saw the same behavior. When I pulled up the running code in a
>> debugger, I saw that ovs-vswitchd spends pretty much 100% of its time in
>> the loop below:
>>
>> (gdb) f 4
>> #4 0x0000559498b4e476 in route_table_run () at lib/route-table.c:133
>> 133 nln_run(nln);
>> (gdb) l
>> 128 OVS_EXCLUDED(route_table_mutex)
>> 129 {
>> 130 ovs_mutex_lock(&route_table_mutex);
>> 131 if (nln) {
>> 132 rtnetlink_run();
>> 133 nln_run(nln);
>> 134
>> 135 if (!route_table_valid) {
>> 136 route_table_reset();
>> 137 }
>> (gdb) l
>> 138 }
>> 139 ovs_mutex_unlock(&route_table_mutex);
>> 140 }
>>
>> I pulled up where route_table_valid is set:
>>
>> 298 static void
>> 299 route_table_change(const struct route_table_msg *change OVS_UNUSED,
>> 300 void *aux OVS_UNUSED)
>> 301 {
>> 302 route_table_valid = false;
>> 303 }
>>
>>
>> If I am reading the code correctly, every RTM_NEWROUTE netlink message
>> that ovs-vswitchd receives sets the route_table_valid global variable
>> to false and causes route_table_reset() to be run.
>> This makes sense in the context of what FRR is doing. A full BGP feed
>> *always* has churn. So ovs-vswitchd receives an RTM_NEWROUTE message,
>> parses it, decides in route_table_change() that the route table is no
>> longer valid, and then calls route_table_reset(), which re-dumps the
>> entire routing table to ovs-vswitchd. In this case there are ~115k IPv6
>> routes in the Linux FIB.
>>
>> I hesitate to make any changes here since I really don't understand what
>> the end goal is.
>> ovs-vswitchd receives a route change from the kernel but in turn re-dumps
>> the entire routing table. What should the correct behavior be from
>> ovs-vswitchd's perspective here?
>
>Hi, Donald.
>
>Your analysis is correct. On each netlink notification about route changes,
>OVS invalidates the cached routing table and re-dumps it in full on the
>next access.
>
>Looking back through the commit history, OVS used to maintain the cache
>and only incrementally add/remove what each netlink message described.
>But that changed in 2011 with the following commit:
>
>commit f0e167f0dbadbe2a8d684f63ad9faf68d8cb9884
>Author: Ethan J. Jackson <e...@eecs.berkeley.edu>
>Date: Thu Jan 13 16:29:31 2011 -0800
>
> route-table: Handle route updates more robustly.
>
> The kernel does not broadcast rtnetlink route messages in all cases
> one would expect. This can cause stale entries to end up in the
> route table which may cause incorrect results for
> route_table_get_ifindex() queries. This commit causes rtnetlink
> route messages to dump the entire route table on the next
> route_table_get_ifindex() query.
>
>And indeed, looking at the history of different projects' attempts to use
>route notifications, they all face issues, and it seems none of them is
>actually able to handle all the notifications fully correctly, simply
>because these notifications are notoriously unreliable.
>In certain cases it seems impossible to tell what exactly changed and how;
>there can be duplicate or missing notifications.
>And the code of projects that try to maintain a route cache in userspace
>is insanely complex and still doesn't handle 100% of cases.
>
>There were attempts to convince kernel developers to add unique identifiers
>to routes so that userspace can tell them apart, but all of them seem to
>have died, leaving the problem unresolved.
>
>These are some discussions/bugs that I found:
>
>https://bugzilla.redhat.com/show_bug.cgi?id=1337855
>
>https://bugzilla.redhat.com/show_bug.cgi?id=1722728
>
>https://github.com/thom311/libnl/issues/226
>
>https://github.com/thom311/libnl/issues/224
>
>None of the bugs seems to be resolved. Most are closed for non-technical
>reasons.
>
>I suppose Ethan just decided not to deal with that horribly unreliable
>kernel interface and simply re-dump the route table on changes.
>
>
>For your actual problem here, I'm not sure if we can fix it that easily.
>
>Is it necessary for OVS to know about these routes?
>If not, it might be possible to isolate them in a separate network
>namespace, so that OVS does not receive all the route updates.
>
>Do you know how long it takes to dump the route table once?
>Maybe it's worth limiting that process to dumping only once a second or
>once every few seconds. That should alleviate the load if the actual dump
>is relatively fast.
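>
>A very rough sketch of that idea (not a tested patch; last_reset_ms and
>the one-second interval are only illustrative, time_msec() is OVS's
>monotonic-clock helper):
>
>    /* lib/route-table.c: rate-limit the full re-dump of the kernel FIB. */
>    static long long int last_reset_ms;     /* Time of the last re-dump. */
>
>    /* Inside route_table_run(), instead of resetting unconditionally
>     * whenever the cached table has been invalidated: */
>    if (!route_table_valid
>        && time_msec() - last_reset_ms >= 1000) {
>        route_table_reset();                 /* Re-dump the kernel FIB. */
>        last_reset_ms = time_msec();
>    }
>
>The table would simply stay marked invalid until the next allowed reset,
>so a burst of RTM_NEWROUTE messages triggers at most one dump per second.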
In this setup OVS just runs without being used. There is no datapath (no
bridges/ports) configured, so it is useless to run this mechanism at all.
We could tie this mechanism to having at least one datapath configured (or
even only to having at least one tunnel configured), roughly as in the
sketch below.
What do you think?
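
A sketch only (route_table_users_exist() is a hypothetical helper that
would return true once at least one datapath, or tunnel port, exists):

    /* Skip route-table bookkeeping entirely while nothing in OVS can
     * use the routes, so an idle ovs-vswitchd does not react to kernel
     * route churn at all. */
    void
    route_table_run(void)
        OVS_EXCLUDED(route_table_mutex)
    {
        if (!route_table_users_exist()) {   /* Hypothetical check. */
            return;
        }
        /* ... existing body: rtnetlink_run(), nln_run(nln), ... */
    }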
>
>Best regards, Ilya Maximets.