It's less about a full BGP feed than about any route churn that causes
ovs-vswitchd to cycle into a loop of re-requesting data.  Even hundreds
or thousands of routes with significant churn (say, a link flapping on
the other side of the network) will cause system load.
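
A quick way to see the churn rate a box is exposed to is to count the
rtnetlink route notifications directly.  This is just an illustration I'm
sketching here, not OVS code; each of these messages is what makes
ovs-vswitchd invalidate and re-dump its route cache (see the analysis
further down the thread):

    /* Rough standalone sketch: count RTM_NEWROUTE/RTM_DELROUTE
     * notifications per second as seen over rtnetlink. */
    #include <stdio.h>
    #include <time.h>
    #include <sys/socket.h>
    #include <linux/netlink.h>
    #include <linux/rtnetlink.h>

    int main(void)
    {
        int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
        struct sockaddr_nl addr = {
            .nl_family = AF_NETLINK,
            .nl_groups = RTMGRP_IPV4_ROUTE | RTMGRP_IPV6_ROUTE,
        };
        if (fd < 0 || bind(fd, (struct sockaddr *) &addr, sizeof addr) < 0) {
            perror("netlink");
            return 1;
        }

        char buf[8192];
        long count = 0;
        time_t start = time(NULL);
        for (;;) {
            int len = recv(fd, buf, sizeof buf, 0);
            if (len < 0) {
                continue;
            }
            for (struct nlmsghdr *nh = (struct nlmsghdr *) buf;
                 NLMSG_OK(nh, len); nh = NLMSG_NEXT(nh, len)) {
                if (nh->nlmsg_type == RTM_NEWROUTE
                    || nh->nlmsg_type == RTM_DELROUTE) {
                    count++;
                }
            }
            time_t now = time(NULL);
            if (now != start) {
                printf("%ld route notifications/sec\n", count);
                count = 0;
                start = now;
            }
        }
    }

Anything more than a handful per second means ovs-vswitchd is re-dumping
the full table at that rate.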

donald

On Tue, Nov 1, 2022 at 7:56 AM Roberto Bartzen Acosta <
roberto.aco...@luizalabs.com> wrote:

> Hey folks,
>
> Thanks for bringing up this discussion. I'm interested in using FRR+BGP
> for the DVR scenario (OVN+OVS), and I understand that tracking route
> events is very hard work, but I don't see why a node running OVS would
> need a full BGP routing table. Do you see any scenario that requires a
> full BGP feed?
>
> Best regards,
> Roberto
>
> On Tue, Nov 1, 2022 at 07:39, Eli Britstein via discuss <
> ovs-discuss@openvswitch.org> wrote:
>
>>
>>
>> >-----Original Message-----
>> >From: Ilya Maximets <i.maxim...@ovn.org>
>> >Sent: Tuesday, 1 November 2022 12:23
>> >To: Eli Britstein <el...@nvidia.com>; Donald Sharp
>> ><donaldshar...@gmail.com>; ovs-discuss@openvswitch.org;
>> >e...@eecs.berkeley.edu
>> >Cc: i.maxim...@ovn.org
>> >Subject: Re: [ovs-discuss] ovs-vswitchd running at 100% cpu
>> >
>> >
>> >
>> >On 11/1/22 10:50, Eli Britstein wrote:
>> >>
>> >>
>> >>> -----Original Message-----
>> >>> From: Ilya Maximets <i.maxim...@ovn.org>
>> >>> Sent: Monday, 31 October 2022 23:54
>> >>> To: Donald Sharp <donaldshar...@gmail.com>; ovs-
>> >>> disc...@openvswitch.org; e...@eecs.berkeley.edu; Eli Britstein
>> >>> <el...@nvidia.com>
>> >>> Cc: i.maxim...@ovn.org
>> >>> Subject: Re: [ovs-discuss] ovs-vswitchd running at 100% cpu
>> >>>
>> >>>
>> >>>
>> >>> On 10/31/22 17:25, Donald Sharp via discuss wrote:
>> >>>> Hi!
>> >>>>
>> >>>> I work on the FRRouting project (https://frrouting.org) and have
>> >>>> noticed that when I have a full BGP feed on a system that is also
>> >>>> running ovs-vswitchd, ovs-vswitchd sits at 100% cpu:
>> >>>>
>> >>>> top - 09:43:12 up 4 days, 22:53,  3 users,  load average: 1.06, 1.08, 1.08
>> >>>> Tasks: 188 total,   3 running, 185 sleeping,   0 stopped,   0 zombie
>> >>>> %Cpu(s): 12.3 us, 14.7 sy,  0.0 ni, 72.8 id,  0.0 wa,  0.0 hi,  0.2 si,  0.0 st
>> >>>> MiB Mem :   7859.3 total,   2756.5 free,   2467.2 used,   2635.6 buff/cache
>> >>>> MiB Swap:   2048.0 total,   2048.0 free,      0.0 used.   5101.9 avail Mem
>> >>>>
>> >>>>     PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
>> >>>>     730 root      10 -10  146204 146048  11636 R  98.3   1.8   6998:13 ovs-vswitchd
>> >>>>  169620 root      20   0       0      0      0 I   3.3   0.0   1:34.83 kworker/0:3-events
>> >>>>      21 root      20   0       0      0      0 S   1.3   0.0  14:09.59 ksoftirqd/1
>> >>>>  131734 frr       15  -5 2384292 609556   6612 S   1.0   7.6  21:57.51 zebra
>> >>>>  131739 frr       15  -5 1301168   1.0g   7420 S   1.0  13.3  18:16.17 bgpd
>> >>>>
>> >>>> When I turn off FRR (or turn off the BGP feed) ovs-vswitchd stops
>> >>>> running at 100%:
>> >>>>
>> >>>> top - 09:48:12 up 4 days, 22:58,  3 users,  load average: 0.08, 0.60, 0.89
>> >>>> Tasks: 169 total,   1 running, 168 sleeping,   0 stopped,   0 zombie
>> >>>> %Cpu(s):  0.2 us,  0.4 sy,  0.0 ni, 99.3 id,  0.0 wa,  0.0 hi,  0.1 si,  0.0 st
>> >>>> MiB Mem :   7859.3 total,   4560.6 free,    663.1 used,   2635.6 buff/cache
>> >>>> MiB Swap:   2048.0 total,   2048.0 free,      0.0 used.   6906.1 avail Mem
>> >>>>
>> >>>>     PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
>> >>>>  179064 sharpd    20   0   11852   3816   3172 R   1.0   0.0   0:00.09 top
>> >>>>    1037 zerotie+  20   0  291852 113180   7408 S   0.7   1.4  19:09.17 zerotier-one
>> >>>>    1043 Debian-+  20   0   34356  21988   7588 S   0.3   0.3  22:04.42 snmpd
>> >>>>  178480 root      20   0       0      0      0 I   0.3   0.0   0:01.21 kworker/1:2-events
>> >>>>  178622 sharpd    20   0   14020   6364   4872 S   0.3   0.1   0:00.10 sshd
>> >>>>       1 root      20   0  169872  13140   8272 S   0.0   0.2   2:33.26 systemd
>> >>>>       2 root      20   0       0      0      0 S   0.0   0.0   0:00.60 kthreadd
>> >>>>
>> >>>> I do not have any particular ovs configuration on this box:
>> >>>> sharpd@janelle:~$ sudo ovs-vsctl show
>> >>>> c72d327c-61eb-4877-b4e7-dcf7e07e24fc
>> >>>>     ovs_version: "2.13.8"
>> >>>>
>> >>>>
>> >>>> sharpd@janelle:~$ sudo ovs-vsctl list o .
>> >>>> _uuid               : c72d327c-61eb-4877-b4e7-dcf7e07e24fc
>> >>>> bridges             : []
>> >>>> cur_cfg             : 0
>> >>>> datapath_types      : [netdev, system]
>> >>>> datapaths           : {}
>> >>>> db_version          : "8.2.0"
>> >>>> dpdk_initialized    : false
>> >>>> dpdk_version        : none
>> >>>> external_ids        : {hostname=janelle, rundir="/var/run/openvswitch",
>> >>>>                        system-id="a1031fcf-8acc-40a9-9fd6-521716b0faaa"}
>> >>>> iface_types         : [erspan, geneve, gre, internal, ip6erspan, ip6gre,
>> >>>>                        lisp, patch, stt, system, tap, vxlan]
>> >>>> manager_options     : []
>> >>>> next_cfg            : 0
>> >>>> other_config        : {}
>> >>>> ovs_version         : "2.13.8"
>> >>>> ssl                 : []
>> >>>> statistics          : {}
>> >>>> system_type         : ubuntu
>> >>>> system_version      : "20.04"
>> >>>>
>> >>>> sharpd@janelle:~$ sudo ovs-appctl dpctl/dump-flows -m
>> >>>> ovs-vswitchd: no datapaths exist
>> >>>> ovs-vswitchd: datapath not found (Invalid argument)
>> >>>> ovs-appctl: ovs-vswitchd: server returned an error
>> >>>>
>> >>>> Eli Britstein suggested I update openvswitch to the latest version; I
>> >>>> did and saw the same behavior.  When I pulled up the running code in a
>> >>>> debugger I saw that ovs-vswitchd is running in this loop below pretty
>> >>>> much 100% of the time:
>> >>>>
>> >>>> (gdb) f 4
>> >>>> #4  0x0000559498b4e476 in route_table_run () at lib/route-table.c:133
>> >>>> 133                 nln_run(nln);
>> >>>> (gdb) l
>> >>>> 128             OVS_EXCLUDED(route_table_mutex)
>> >>>> 129         {
>> >>>> 130             ovs_mutex_lock(&route_table_mutex);
>> >>>> 131             if (nln) {
>> >>>> 132                 rtnetlink_run();
>> >>>> 133                 nln_run(nln);
>> >>>> 134
>> >>>> 135                 if (!route_table_valid) {
>> >>>> 136                     route_table_reset();
>> >>>> 137                 }
>> >>>> (gdb) l
>> >>>> 138             }
>> >>>> 139             ovs_mutex_unlock(&route_table_mutex);
>> >>>> 140         }
>> >>>>
>> >>>> I pulled up where route_table_valid is set:
>> >>>>
>> >>>> 298         static void
>> >>>> 299         route_table_change(const struct route_table_msg *change
>> >>> OVS_UNUSED,
>> >>>> 300                            void *aux OVS_UNUSED)
>> >>>> 301         {
>> >>>> 302             route_table_valid = false;
>> >>>> 303         }
>> >>>>
>> >>>>
>> >>>> If I am reading the code correctly, every RTM_NEWROUTE netlink
>> >>>> message that ovs-vswitchd receives sets the route_table_valid global
>> >>>> variable to false and causes route_table_reset() to be run.
>> >>>> This makes sense in the context of what FRR is doing.  A full BGP feed
>> >>>> *always* has churn.  So ovs-vswitchd is receiving an RTM_NEWROUTE
>> >>>> message, parsing it, and deciding in route_table_change() that the
>> >>>> route table is no longer valid, which causes it to call
>> >>>> route_table_reset(), which re-dumps the entire routing table to
>> >>>> ovs-vswitchd.  In this case there are ~115k ipv6 routes in the
>> >>>> linux fib.
>> >>>>
>> >>>> I hesitate to make any changes here since I really don't understand
>> >>>> what the end goal here is.  ovs-vswitchd is receiving a route change
>> >>>> from the kernel but is in turn causing it to re-dump the entire
>> >>>> routing table again.  What should the correct behavior be from
>> >>>> ovs-vswitchd's perspective here?
>> >>>
>> >>> Hi, Donald.
>> >>>
>> >>> Your analysis is correct.  OVS will invalidate the cached routing
>> >>> table on each netlink notification about route changes and re-dump
>> >>> it in full on the next access.
>> >>>
>> >>> Looking back into commit history, OVS did maintain the cache and only
>> >>> added/removed what was in the netlink message incrementally.
>> >>> But that changed in 2011 with the following commit:
>> >>>
>> >>> commit f0e167f0dbadbe2a8d684f63ad9faf68d8cb9884
>> >>> Author: Ethan J. Jackson <e...@eecs.berkeley.edu>
>> >>> Date:   Thu Jan 13 16:29:31 2011 -0800
>> >>>
>> >>>    route-table: Handle route updates more robustly.
>> >>>
>> >>>    The kernel does not broadcast rtnetlink route messages in all cases
>> >>>    one would expect.  This can cause stale entires to end up in the
>> >>>    route table which may cause incorrect results for
>> >>>    route_table_get_ifindex() queries.  This commit causes rtnetlink
>> >>>    route messages to dump the entire route table on the next
>> >>>    route_table_get_ifindex() query.
>> >>>
>> >>> And indeed, looking at the history of attempts by different projects
>> >>> to use route notifications, they all face issues, and it seems that
>> >>> none of them is actually able to handle all the notifications fully
>> >>> correctly, simply because these notifications are notoriously bad.
>> >>> It seems to be impossible in certain cases to tell what exactly
>> >>> changed and how.  There can be duplicate or missing notifications.
>> >>> And the code of projects that try to maintain a route cache in
>> >>> userspace is insanely complex and doesn't handle 100% of cases anyway.
>> >>>
>> >>> There were attempts to convince kernel developers to add unique
>> >>> identifiers to routes, so userspace could tell them apart, but all of
>> >>> them seem to have died, leaving the problem unresolved.
>> >>>
>> >>> These are some discussions/bugs that I found:
>> >>>
>> >>> https://bugzilla.redhat.com/show_bug.cgi?id=1337855
>> >>> https://bugzilla.redhat.com/show_bug.cgi?id=1722728
>> >>> https://github.com/thom311/libnl/issues/226
>> >>> https://github.com/thom311/libnl/issues/224
>> >>>
>> >>> None of the bugs seems to be resolved.  Most are closed for
>> >>> non-technical reasons.
>> >>>
>> >>> I suppose Ethan just decided not to deal with that horribly
>> >>> unreliable kernel interface and to simply re-dump the route table on
>> >>> changes.
>> >>>
>> >>>
>> >>> For your actual problem here, I'm not sure if we can fix it that
>> >>> easily.
>> >>>
>> >>> Is it necessary for OVS to know about these routes?
>> >>> If not, it might be possible to isolate them in a separate network
>> >>> namespace, so OVS would not receive all the route updates.
>> >>>
>> >>> Do you know how long it takes to dump the route table once?
>> >>> Maybe it's worth limiting that process to dump only once a second or
>> >>> once every few seconds.  That should alleviate the load if the actual
>> >>> dump is relatively fast.
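>> >>>
>> >>> Something along these lines, maybe (untested sketch only; time_msec()
>> >>> is the existing helper from lib/timeval.h, the 1000 ms interval is
>> >>> arbitrary, and the rest just mirrors the route_table_run() quoted
>> >>> earlier):
>> >>>
>> >>>     void
>> >>>     route_table_run(void)
>> >>>         OVS_EXCLUDED(route_table_mutex)
>> >>>     {
>> >>>         static long long int last_reset_ms;
>> >>>
>> >>>         ovs_mutex_lock(&route_table_mutex);
>> >>>         if (nln) {
>> >>>             rtnetlink_run();
>> >>>             nln_run(nln);
>> >>>
>> >>>             /* Honor the invalidation at most once a second instead
>> >>>              * of re-dumping on every single notification. */
>> >>>             if (!route_table_valid
>> >>>                 && time_msec() - last_reset_ms >= 1000) {
>> >>>                 route_table_reset();
>> >>>                 last_reset_ms = time_msec();
>> >>>             }
>> >>>         }
>> >>>         ovs_mutex_unlock(&route_table_mutex);
>> >>>     }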
>> >> In this setup OVS just runs without any use.  There is no datapath (no
>> >> bridges/ports) configured, so it is useless to run this mechanism at all.
>> >> We could enable this mechanism only when at least one datapath is
>> >> configured (or even only when there is at least one tunnel configured).
>> >> What do you think?
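>> >>
>> >> Purely as a sketch of that idea (the route_table_users counter and the
>> >> register/unregister hooks below are hypothetical -- they would still
>> >> need to be wired up from whatever tunnel/datapath code actually
>> >> consumes the routes, and the netlink socket would probably need to be
>> >> drained or torn down while unused):
>> >>
>> >>     static int route_table_users;  /* Protected by route_table_mutex. */
>> >>
>> >>     void
>> >>     route_table_register_user(void)
>> >>     {
>> >>         ovs_mutex_lock(&route_table_mutex);
>> >>         route_table_users++;
>> >>         ovs_mutex_unlock(&route_table_mutex);
>> >>     }
>> >>
>> >>     void
>> >>     route_table_unregister_user(void)
>> >>     {
>> >>         ovs_mutex_lock(&route_table_mutex);
>> >>         route_table_users--;
>> >>         ovs_mutex_unlock(&route_table_mutex);
>> >>     }
>> >>
>> >>     void
>> >>     route_table_run(void)
>> >>         OVS_EXCLUDED(route_table_mutex)
>> >>     {
>> >>         ovs_mutex_lock(&route_table_mutex);
>> >>         /* Skip the notification/re-dump machinery entirely when
>> >>          * nothing in OVS is using the route table. */
>> >>         if (nln && route_table_users > 0) {
>> >>             rtnetlink_run();
>> >>             nln_run(nln);
>> >>             if (!route_table_valid) {
>> >>                 route_table_reset();
>> >>             }
>> >>         }
>> >>         ovs_mutex_unlock(&route_table_mutex);
>> >>     }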
>> >
>> >Hmm.  Why don't you just stop/disable the service then?
>> Indeed, that's possible. It's just turned on by default in this system
>> (Debian) and Donald noticed the CPU consumption.
>> >
>> >>>
>> >>> Best regards, Ilya Maximets.
>>
>> _______________________________________________
>> discuss mailing list
>> disc...@openvswitch.org
>> https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
>>
>
>
>
_______________________________________________
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
