>-----Original Message-----
>From: Ilya Maximets <i.maxim...@ovn.org>
>Sent: Monday, 31 October 2022 23:54
>To: Donald Sharp <donaldshar...@gmail.com>; ovs-
>disc...@openvswitch.org; e...@eecs.berkeley.edu; Eli Britstein
><el...@nvidia.com>
>Cc: i.maxim...@ovn.org
>Subject: Re: [ovs-discuss] ovs-vswitchd running at 100% cpu
>
>On 10/31/22 17:25, Donald Sharp via discuss wrote:
>> Hi!
>>
>> I work on the FRRouting project (https://frrouting.org) and have noticed
>> that when I have a full BGP feed on a system that is also running
>> ovs-vswitchd, ovs-vswitchd sits at 100% CPU:
>>
>> top - 09:43:12 up 4 days, 22:53, 3 users, load average: 1.06, 1.08, 1.08
>> Tasks: 188 total, 3 running, 185 sleeping, 0 stopped, 0 zombie
>> %Cpu(s): 12.3 us, 14.7 sy, 0.0 ni, 72.8 id, 0.0 wa, 0.0 hi, 0.2 si, 0.0 st
>> MiB Mem : 7859.3 total, 2756.5 free, 2467.2 used, 2635.6 buff/cache
>> MiB Swap: 2048.0 total, 2048.0 free, 0.0 used. 5101.9 avail Mem
>>
>> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
>> 730 root 10 -10 146204 146048 11636 R 98.3 1.8 6998:13 ovs-vswitchd
>> 169620 root 20 0 0 0 0 I 3.3 0.0 1:34.83 kworker/0:3-events
>> 21 root 20 0 0 0 0 S 1.3 0.0 14:09.59 ksoftirqd/1
>> 131734 frr 15 -5 2384292 609556 6612 S 1.0 7.6 21:57.51 zebra
>> 131739 frr 15 -5 1301168 1.0g 7420 S 1.0 13.3 18:16.17 bgpd
>>
>> When I turn off FRR (or turn off the BGP feed), ovs-vswitchd stops
>> running at 100%:
>>
>> top - 09:48:12 up 4 days, 22:58, 3 users, load average: 0.08, 0.60, 0.89
>> Tasks: 169 total, 1 running, 168 sleeping, 0 stopped, 0 zombie
>> %Cpu(s): 0.2 us, 0.4 sy, 0.0 ni, 99.3 id, 0.0 wa, 0.0 hi, 0.1 si, 0.0 st
>> MiB Mem : 7859.3 total, 4560.6 free, 663.1 used, 2635.6 buff/cache
>> MiB Swap: 2048.0 total, 2048.0 free, 0.0 used. 6906.1 avail Mem
>>
>> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
>> 179064 sharpd 20 0 11852 3816 3172 R 1.0 0.0 0:00.09 top
>> 1037 zerotie+ 20 0 291852 113180 7408 S 0.7 1.4 19:09.17 zerotier-one
>> 1043 Debian-+ 20 0 34356 21988 7588 S 0.3 0.3 22:04.42 snmpd
>> 178480 root 20 0 0 0 0 I 0.3 0.0 0:01.21 kworker/1:2-events
>> 178622 sharpd 20 0 14020 6364 4872 S 0.3 0.1 0:00.10 sshd
>> 1 root 20 0 169872 13140 8272 S 0.0 0.2 2:33.26 systemd
>> 2 root 20 0 0 0 0 S 0.0 0.0 0:00.60 kthreadd
>>
>> I do not have any particular ovs configuration on this box:
>> sharpd@janelle:~$ sudo ovs-vsctl show
>> c72d327c-61eb-4877-b4e7-dcf7e07e24fc
>> ovs_version: "2.13.8"
>>
>>
>> sharpd@janelle:~$ sudo ovs-vsctl list o .
>> _uuid : c72d327c-61eb-4877-b4e7-dcf7e07e24fc
>> bridges : []
>> cur_cfg : 0
>> datapath_types : [netdev, system]
>> datapaths : {}
>> db_version : "8.2.0"
>> dpdk_initialized : false
>> dpdk_version : none
>> external_ids : {hostname=janelle, rundir="/var/run/openvswitch", system-id="a1031fcf-8acc-40a9-9fd6-521716b0faaa"}
>> iface_types : [erspan, geneve, gre, internal, ip6erspan, ip6gre, lisp, patch, stt, system, tap, vxlan]
>> manager_options : []
>> next_cfg : 0
>> other_config : {}
>> ovs_version : "2.13.8"
>> ssl : []
>> statistics : {}
>> system_type : ubuntu
>> system_version : "20.04"
>>
>> sharpd@janelle:~$ sudo ovs-appctl dpctl/dump-flows -m
>> ovs-vswitchd: no datapaths exist
>> ovs-vswitchd: datapath not found (Invalid argument)
>> ovs-appctl: ovs-vswitchd: server returned an error
>>
>> Eli Britstein suggested I update Open vSwitch to the latest version; I
>> did and saw the same behavior. When I pulled up the running code in a
>> debugger, I saw that ovs-vswitchd spends pretty much 100% of its time in
>> the loop below:
>>
>> (gdb) f 4
>> #4 0x0000559498b4e476 in route_table_run () at lib/route-table.c:133
>> 133 nln_run(nln);
>> (gdb) l
>> 128 OVS_EXCLUDED(route_table_mutex)
>> 129 {
>> 130 ovs_mutex_lock(&route_table_mutex);
>> 131 if (nln) {
>> 132 rtnetlink_run();
>> 133 nln_run(nln);
>> 134
>> 135 if (!route_table_valid) {
>> 136 route_table_reset();
>> 137 }
>> (gdb) l
>> 138 }
>> 139 ovs_mutex_unlock(&route_table_mutex);
>> 140 }
>>
>> I pulled up where route_table_valid is set:
>>
>> 298 static void
>> 299 route_table_change(const struct route_table_msg *change OVS_UNUSED,
>> 300 void *aux OVS_UNUSED)
>> 301 {
>> 302 route_table_valid = false;
>> 303 }
>>
>>
>> If I am reading the code correctly, every RTM_NEWROUTE netlink message
>> that ovs-vswitchd receives sets the route_table_valid global variable
>> to false and causes route_table_reset() to be run.
>> This makes sense in the context of what FRR is doing. A full BGP feed
>> *always* has churn. So ovs-vswitchd receives an RTM_NEWROUTE message,
>> parses it, decides in route_table_change() that the route table is no
>> longer valid, and then calls route_table_reset(), which re-dumps the
>> entire routing table to ovs-vswitchd. In this case there are ~115k IPv6
>> routes in the Linux FIB.
>>
>> I hesitate to make any changes here since I really don't understand what
>> the end goal is.
>> ovs-vswitchd receives a route change from the kernel but in turn re-dumps
>> the entire routing table. What should the correct behavior be from
>> ovs-vswitchd's perspective here?
>
>Hi, Donald.
>
>Your analysis is correct. On each netlink notification about route changes,
>OVS invalidates the cached routing table and re-dumps it in full on the
>next access.
>
>Looking back through the commit history, OVS used to maintain the cache
>and only incrementally add/remove what each netlink message described.
>But that changed in 2011 with the following commit:
>
>commit f0e167f0dbadbe2a8d684f63ad9faf68d8cb9884
>Author: Ethan J. Jackson <e...@eecs.berkeley.edu>
>Date: Thu Jan 13 16:29:31 2011 -0800
>
> route-table: Handle route updates more robustly.
>
> The kernel does not broadcast rtnetlink route messages in all cases
> one would expect. This can cause stale entries to end up in the
> route table which may cause incorrect results for
> route_table_get_ifindex() queries. This commit causes rtnetlink
> route messages to dump the entire route table on the next
> route_table_get_ifindex() query.
>
>And indeed, looking at the history of different projects' attempts to use
>route notifications, they all face issues, and it seems none of them is
>actually able to handle all the notifications fully correctly, simply
>because these notifications are notoriously unreliable.
>In certain cases it seems impossible to tell what exactly changed and how;
>there can be duplicate or missing notifications.
>And the code of projects that try to maintain a route cache in userspace
>is insanely complex and still doesn't handle 100% of cases.
>
>There were attempts to convince kernel developers to add unique identifiers
>to routes so that userspace can tell them apart, but all of them seem to
>have died, leaving the problem unresolved.
>
>These are some discussions/bugs that I found:
>
>https://bugzilla.redhat.com/show_bug.cgi?id=1337855
>
>https://bugzilla.redhat.com/show_bug.cgi?id=1722728
>
>https://github.com/thom311/libnl/issues/226
>
>https://github.com/thom311/libnl/issues/224
>
>None of the bugs seems to be resolved. Most are closed for non-technical
>reasons.
>
>I suppose Ethan just decided not to deal with that horribly unreliable
>kernel interface and simply re-dump the route table on changes.
>
>
>For your actual problem here, I'm not sure if we can fix it that easily.
>
>Is it necessary for OVS to know about these routes?
>If not, it might be possible to isolate them in a separate network
>namespace, so that OVS does not receive all the route updates.
>
>Do you know how long it takes to dump the route table once?
>Maybe it's worth limiting that process to dumping only once a second or
>once every few seconds. That should alleviate the load if the actual dump
>is relatively fast.
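>
>A very rough sketch of that idea (not a tested patch; last_reset_ms and
>the one-second interval are only illustrative, time_msec() is OVS's
>monotonic-clock helper):
>
>    /* lib/route-table.c: rate-limit the full re-dump of the kernel FIB. */
>    static long long int last_reset_ms;     /* Time of the last re-dump. */
>
>    /* Inside route_table_run(), instead of resetting unconditionally
>     * whenever the cached table has been invalidated: */
>    if (!route_table_valid
>        && time_msec() - last_reset_ms >= 1000) {
>        route_table_reset();                 /* Re-dump the kernel FIB. */
>        last_reset_ms = time_msec();
>    }
>
>The table would simply stay marked invalid until the next allowed reset,
>so a burst of RTM_NEWROUTE messages triggers at most one dump per second.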
In this setup OVS just runs without being used. There is no datapath (no
bridges/ports) configured, so it is useless to run this mechanism at all.
We could tie this mechanism to having at least one datapath configured (or
even only to having at least one tunnel configured), roughly as in the
sketch below.
What do you think?
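
A sketch only (route_table_users_exist() is a hypothetical helper that
would return true once at least one datapath, or tunnel port, exists):

    /* Skip route-table bookkeeping entirely while nothing in OVS can
     * use the routes, so an idle ovs-vswitchd does not react to kernel
     * route churn at all. */
    void
    route_table_run(void)
        OVS_EXCLUDED(route_table_mutex)
    {
        if (!route_table_users_exist()) {   /* Hypothetical check. */
            return;
        }
        /* ... existing body: rtnetlink_run(), nln_run(nln), ... */
    }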
>
>Best regards, Ilya Maximets.