On 10/31/22 17:25, Donald Sharp via discuss wrote:
> Hi!
> 
> I work on the FRRouting project (https://frrouting.org) and have noticed
> that when I have a full BGP feed on a system that is also running
> ovs-vswitchd, ovs-vswitchd sits at 100% CPU:
> 
> top - 09:43:12 up 4 days, 22:53,  3 users,  load average: 1.06, 1.08, 1.08
> Tasks: 188 total,   3 running, 185 sleeping,   0 stopped,   0 zombie
> %Cpu(s): 12.3 us, 14.7 sy,  0.0 ni, 72.8 id,  0.0 wa,  0.0 hi,  0.2 si,  0.0 st
> MiB Mem :   7859.3 total,   2756.5 free,   2467.2 used,   2635.6 buff/cache
> MiB Swap:   2048.0 total,   2048.0 free,      0.0 used.   5101.9 avail Mem 
>  
>     PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
>     730 root      10 -10  146204 146048  11636 R  98.3   1.8   6998:13 ovs-vswitchd
>  169620 root      20   0       0      0      0 I   3.3   0.0   1:34.83 kworker/0:3-events
>      21 root      20   0       0      0      0 S   1.3   0.0  14:09.59 ksoftirqd/1
>  131734 frr       15  -5 2384292 609556   6612 S   1.0   7.6  21:57.51 zebra
>  131739 frr       15  -5 1301168   1.0g   7420 S   1.0  13.3  18:16.17 bgpd
> 
> When I turn off FRR (or turn off the BGP feed), ovs-vswitchd stops running
> at 100%:
> 
> top - 09:48:12 up 4 days, 22:58,  3 users,  load average: 0.08, 0.60, 0.89
> Tasks: 169 total,   1 running, 168 sleeping,   0 stopped,   0 zombie
> %Cpu(s):  0.2 us,  0.4 sy,  0.0 ni, 99.3 id,  0.0 wa,  0.0 hi,  0.1 si,  0.0 st
> MiB Mem :   7859.3 total,   4560.6 free,    663.1 used,   2635.6 buff/cache
> MiB Swap:   2048.0 total,   2048.0 free,      0.0 used.   6906.1 avail Mem 
>  
>     PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
>  179064 sharpd    20   0   11852   3816   3172 R   1.0   0.0   0:00.09 top
>    1037 zerotie+  20   0  291852 113180   7408 S   0.7   1.4  19:09.17 zerotier-one
>    1043 Debian-+  20   0   34356  21988   7588 S   0.3   0.3  22:04.42 snmpd
>  178480 root      20   0       0      0      0 I   0.3   0.0   0:01.21 kworker/1:2-events
>  178622 sharpd    20   0   14020   6364   4872 S   0.3   0.1   0:00.10 sshd
>       1 root      20   0  169872  13140   8272 S   0.0   0.2   2:33.26 systemd
>       2 root      20   0       0      0      0 S   0.0   0.0   0:00.60 kthreadd
> 
> I do not have any particular ovs configuration on this box:
> sharpd@janelle:~$ sudo ovs-vsctl show
> c72d327c-61eb-4877-b4e7-dcf7e07e24fc
>     ovs_version: "2.13.8"
> 
> 
> sharpd@janelle:~$ sudo ovs-vsctl list o .
> _uuid               : c72d327c-61eb-4877-b4e7-dcf7e07e24fc
> bridges             : []
> cur_cfg             : 0
> datapath_types      : [netdev, system]
> datapaths           : {}
> db_version          : "8.2.0"
> dpdk_initialized    : false
> dpdk_version        : none
> external_ids        : {hostname=janelle, rundir="/var/run/openvswitch", 
> system-id="a1031fcf-8acc-40a9-9fd6-521716b0faaa"}
> iface_types         : [erspan, geneve, gre, internal, ip6erspan, ip6gre, 
> lisp, patch, stt, system, tap, vxlan]
> manager_options     : []
> next_cfg            : 0
> other_config        : {}
> ovs_version         : "2.13.8"
> ssl                 : []
> statistics          : {}
> system_type         : ubuntu
> system_version      : "20.04"
> 
> sharpd@janelle:~$ sudo ovs-appctl dpctl/dump-flows -m
> ovs-vswitchd: no datapaths exist
> ovs-vswitchd: datapath not found (Invalid argument)
> ovs-appctl: ovs-vswitchd: server returned an error
> 
> Eli Britstein suggested I update Open vSwitch to the latest version; I did,
> and saw the same behavior.  When I pulled up the running code in a debugger,
> I saw that ovs-vswitchd spends pretty much 100% of its time in the loop below:
> 
> (gdb) f 4
> #4  0x0000559498b4e476 in route_table_run () at lib/route-table.c:133
> 133                 nln_run(nln);
> (gdb) l
> 128             OVS_EXCLUDED(route_table_mutex)
> 129         {
> 130             ovs_mutex_lock(&route_table_mutex);
> 131             if (nln) {
> 132                 rtnetlink_run();
> 133                 nln_run(nln);
> 134         
> 135                 if (!route_table_valid) {
> 136                     route_table_reset();
> 137                 }
> (gdb) l
> 138             }
> 139             ovs_mutex_unlock(&route_table_mutex);
> 140         }
> 
> I pulled up where route_table_valid is set:
> 
> 298         static void
> 299         route_table_change(const struct route_table_msg *change OVS_UNUSED,
> 300                            void *aux OVS_UNUSED)
> 301         {
> 302             route_table_valid = false;
> 303         }
> 
> 
> If I am reading the code correctly, every RTM_NEWROUTE netlink message that
> ovs-vswitchd receives sets the route_table_valid global variable to false and
> causes route_table_reset() to be run.  This makes sense in the context of
> what FRR is doing: a full BGP feed *always* has churn.  So ovs-vswitchd
> receives an RTM_NEWROUTE message, parses it, decides in route_table_change()
> that the route table is no longer valid, and calls route_table_reset(), which
> re-dumps the entire routing table to ovs-vswitchd.  In this case there are
> ~115k IPv6 routes in the Linux FIB.
> 
> I hesitate to make any changes here since I really don't understand what the
> end goal is.  ovs-vswitchd receives a route change from the kernel, but that
> in turn causes it to re-dump the entire routing table again.  What should the
> correct behavior be from ovs-vswitchd's perspective here?

Hi, Donald.

Your analysis is correct.  On each netlink notification about a route
change, OVS invalidates the cached routing table and re-dumps it in
full on the next access.

Looking back into the commit history, OVS used to maintain the cache
incrementally, adding/removing only what each netlink message carried.
That changed in 2011 with the following commit:

commit f0e167f0dbadbe2a8d684f63ad9faf68d8cb9884
Author: Ethan J. Jackson <e...@eecs.berkeley.edu>
Date:   Thu Jan 13 16:29:31 2011 -0800

    route-table: Handle route updates more robustly.
    
    The kernel does not broadcast rtnetlink route messages in all cases
    one would expect.  This can cause stale entires to end up in the
    route table which may cause incorrect results for
    route_table_get_ifindex() queries.  This commit causes rtnetlink
    route messages to dump the entire route table on the next
    route_table_get_ifindex() query.

And indeed, looking at the history of different projects' attempts to
use route notifications, they all run into problems, and it seems that
none of them manages to handle all the notifications fully correctly,
simply because these notifications are notoriously unreliable.  In
certain cases it is impossible to tell what exactly changed and how;
notifications can be duplicated or missing.  And the code in projects
that do try to maintain a route cache in userspace is insanely complex
and still doesn't cover 100% of the cases.

There were attempts to convince kernel developers to add unique
identifiers to routes, so that userspace can tell them apart, but all
of them seem to have died, leaving the problem unresolved.

These are some discussions/bugs that I found:
  https://bugzilla.redhat.com/show_bug.cgi?id=1337855
  https://bugzilla.redhat.com/show_bug.cgi?id=1722728
  https://github.com/thom311/libnl/issues/226
  https://github.com/thom311/libnl/issues/224

None of the bugs seems to be resolved.  Most are closed for
non-technical reasons.

I suppose Ethan simply decided not to deal with that horribly
unreliable kernel interface and to just re-dump the route table on
changes.


For your actual problem here, I'm not sure if we can fix it
that easily.

Is it necessary for OVS to know about these routes?
If not, it might be possible to isolate them in a separate network
namespace, so that OVS does not receive all the route updates.

Do you know how long it takes to dump the route table once?
Maybe it's worth limiting that process to dump only once a second, or
once every few seconds; a rough sketch of the idea is below.  That
should alleviate the load if the actual dump is relatively fast.
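
Something along these lines, completely untested and only to illustrate
the idea; it reuses the existing statics from lib/route-table.c and
time_msec() from lib/timeval.h, and the 1-second interval is an
arbitrary choice:

    void
    route_table_run(void)
        OVS_EXCLUDED(route_table_mutex)
    {
        /* Hypothetical rate-limit state: time of the last full re-dump. */
        static long long int last_reset_ms;

        ovs_mutex_lock(&route_table_mutex);
        if (nln) {
            rtnetlink_run();
            nln_run(nln);

            /* Re-dump the kernel routing table only if it was invalidated
             * by a notification AND at least a second has passed since the
             * previous dump.  The cache may stay stale for up to a second,
             * but a constant stream of RTM_NEWROUTE messages no longer
             * triggers a full dump for every single message. */
            if (!route_table_valid
                && time_msec() - last_reset_ms >= 1000) {
                route_table_reset();
                last_reset_ms = time_msec();
            }
        }
        ovs_mutex_unlock(&route_table_mutex);
    }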

Best regards, Ilya Maximets.
