It's less about a full BGP feed specifically; any route churn causes ovs-vswitchd to cycle into a loop of re-requesting data. Even hundreds or thousands of routes with significant churn (say, a link flapping somewhere else in the network) will cause system load.
donald

On Tue, Nov 1, 2022 at 7:56 AM Roberto Bartzen Acosta <roberto.aco...@luizalabs.com> wrote:
> Hey folks,
>
> Thanks for bringing up this discussion. I'm interested in using FRR+BGP
> for the DVR scenario (ovn+ovs), and I understand that tracking the route
> events is very hard work, but I don't see the need for a node running ovs
> to work with BGP "full routing"; do you see any scenario that actually
> requires a full BGP feed?
>
> Best regards,
> Roberto
>
> On Tue, Nov 1, 2022 at 07:39, Eli Britstein via discuss <ovs-discuss@openvswitch.org> wrote:
>>
>> >-----Original Message-----
>> >From: Ilya Maximets <i.maxim...@ovn.org>
>> >Sent: Tuesday, 1 November 2022 12:23
>> >To: Eli Britstein <el...@nvidia.com>; Donald Sharp
>> ><donaldshar...@gmail.com>; ovs-discuss@openvswitch.org;
>> >e...@eecs.berkeley.edu
>> >Cc: i.maxim...@ovn.org
>> >Subject: Re: [ovs-discuss] ovs-vswitchd running at 100% cpu
>> >
>> >External email: Use caution opening links or attachments
>> >
>> >
>> >On 11/1/22 10:50, Eli Britstein wrote:
>> >>
>> >>
>> >>> -----Original Message-----
>> >>> From: Ilya Maximets <i.maxim...@ovn.org>
>> >>> Sent: Monday, 31 October 2022 23:54
>> >>> To: Donald Sharp <donaldshar...@gmail.com>; ovs-disc...@openvswitch.org;
>> >>> e...@eecs.berkeley.edu; Eli Britstein <el...@nvidia.com>
>> >>> Cc: i.maxim...@ovn.org
>> >>> Subject: Re: [ovs-discuss] ovs-vswitchd running at 100% cpu
>> >>>
>> >>> External email: Use caution opening links or attachments
>> >>>
>> >>>
>> >>> On 10/31/22 17:25, Donald Sharp via discuss wrote:
>> >>>> Hi!
>> >>>>
>> >>>> I work on the FRRouting project (https://frrouting.org) and have noticed
>> >>>> that when I have a full BGP feed on a system that is also running
>> >>>> ovs-vswitchd, ovs-vswitchd sits at 100% cpu:
>> >>>>
>> >>>> top - 09:43:12 up 4 days, 22:53,  3 users,  load average: 1.06, 1.08, 1.08
>> >>>> Tasks: 188 total,   3 running, 185 sleeping,   0 stopped,   0 zombie
>> >>>> %Cpu(s): 12.3 us, 14.7 sy,  0.0 ni, 72.8 id,  0.0 wa,  0.0 hi,  0.2 si,  0.0 st
>> >>>> MiB Mem :   7859.3 total,   2756.5 free,   2467.2 used,   2635.6 buff/cache
>> >>>> MiB Swap:   2048.0 total,   2048.0 free,      0.0 used.   5101.9 avail Mem
>> >>>>
>> >>>>     PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
>> >>>>     730 root      10 -10  146204 146048  11636 R  98.3   1.8   6998:13 ovs-vswitchd
>> >>>>  169620 root      20   0       0      0      0 I   3.3   0.0   1:34.83 kworker/0:3-events
>> >>>>      21 root      20   0       0      0      0 S   1.3   0.0  14:09.59 ksoftirqd/1
>> >>>>  131734 frr       15  -5 2384292 609556   6612 S   1.0   7.6  21:57.51 zebra
>> >>>>  131739 frr       15  -5 1301168   1.0g   7420 S   1.0  13.3  18:16.17 bgpd
>> >>>>
>> >>>> When I turn off FRR (or turn off the bgp feed), ovs-vswitchd stops running at 100%:
>> >>>>
>> >>>> top - 09:48:12 up 4 days, 22:58,  3 users,  load average: 0.08, 0.60, 0.89
>> >>>> Tasks: 169 total,   1 running, 168 sleeping,   0 stopped,   0 zombie
>> >>>> %Cpu(s):  0.2 us,  0.4 sy,  0.0 ni, 99.3 id,  0.0 wa,  0.0 hi,  0.1 si,  0.0 st
>> >>>> MiB Mem :   7859.3 total,   4560.6 free,    663.1 used,   2635.6 buff/cache
>> >>>> MiB Swap:   2048.0 total,   2048.0 free,      0.0 used.   6906.1 avail Mem
>> >>>>
>> >>>>     PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
>> >>>>  179064 sharpd    20   0   11852   3816   3172 R   1.0   0.0   0:00.09 top
>> >>>>    1037 zerotie+  20   0  291852 113180   7408 S   0.7   1.4  19:09.17 zerotier-one
>> >>>>    1043 Debian-+  20   0   34356  21988   7588 S   0.3   0.3  22:04.42 snmpd
>> >>>>  178480 root      20   0       0      0      0 I   0.3   0.0   0:01.21 kworker/1:2-events
>> >>>>  178622 sharpd    20   0   14020   6364   4872 S   0.3   0.1   0:00.10 sshd
>> >>>>       1 root      20   0  169872  13140   8272 S   0.0   0.2   2:33.26 systemd
>> >>>>       2 root      20   0       0      0      0 S   0.0   0.0   0:00.60 kthreadd
>> >>>>
>> >>>> I do not have any particular ovs configuration on this box:
>> >>>>
>> >>>> sharpd@janelle:~$ sudo ovs-vsctl show
>> >>>> c72d327c-61eb-4877-b4e7-dcf7e07e24fc
>> >>>>     ovs_version: "2.13.8"
>> >>>>
>> >>>> sharpd@janelle:~$ sudo ovs-vsctl list o .
>> >>>> _uuid               : c72d327c-61eb-4877-b4e7-dcf7e07e24fc
>> >>>> bridges             : []
>> >>>> cur_cfg             : 0
>> >>>> datapath_types      : [netdev, system]
>> >>>> datapaths           : {}
>> >>>> db_version          : "8.2.0"
>> >>>> dpdk_initialized    : false
>> >>>> dpdk_version        : none
>> >>>> external_ids        : {hostname=janelle, rundir="/var/run/openvswitch", system-id="a1031fcf-8acc-40a9-9fd6-521716b0faaa"}
>> >>>> iface_types         : [erspan, geneve, gre, internal, ip6erspan, ip6gre, lisp, patch, stt, system, tap, vxlan]
>> >>>> manager_options     : []
>> >>>> next_cfg            : 0
>> >>>> other_config        : {}
>> >>>> ovs_version         : "2.13.8"
>> >>>> ssl                 : []
>> >>>> statistics          : {}
>> >>>> system_type         : ubuntu
>> >>>> system_version      : "20.04"
>> >>>>
>> >>>> sharpd@janelle:~$ sudo ovs-appctl dpctl/dump-flows -m
>> >>>> ovs-vswitchd: no datapaths exist
>> >>>> ovs-vswitchd: datapath not found (Invalid argument)
>> >>>> ovs-appctl: ovs-vswitchd: server returned an error
>> >>>>
>> >>>> Eli Britstein suggested I update openvswitch to the latest version; I did
>> >>>> and saw the same behavior. When I pulled up the running code in a debugger,
>> >>>> I see that ovs-vswitchd is spending pretty much 100% of its time in the
>> >>>> loop below:
>> >>>>
>> >>>> (gdb) f 4
>> >>>> #4  0x0000559498b4e476 in route_table_run () at lib/route-table.c:133
>> >>>> 133             nln_run(nln);
>> >>>> (gdb) l
>> >>>> 128         OVS_EXCLUDED(route_table_mutex)
>> >>>> 129     {
>> >>>> 130         ovs_mutex_lock(&route_table_mutex);
>> >>>> 131         if (nln) {
>> >>>> 132             rtnetlink_run();
>> >>>> 133             nln_run(nln);
>> >>>> 134
>> >>>> 135             if (!route_table_valid) {
>> >>>> 136                 route_table_reset();
>> >>>> 137             }
>> >>>> (gdb) l
>> >>>> 138         }
>> >>>> 139         ovs_mutex_unlock(&route_table_mutex);
>> >>>> 140     }
>> >>>>
>> >>>> I pulled up where route_table_valid is set:
>> >>>>
>> >>>> 298 static void
>> >>>> 299 route_table_change(const struct route_table_msg *change OVS_UNUSED,
>> >>>> 300                    void *aux OVS_UNUSED)
>> >>>> 301 {
>> >>>> 302     route_table_valid = false;
>> >>>> 303 }
>> >>>>
>> >>>> If I am reading the code correctly, every RTM_NEWROUTE netlink message that
>> >>>> ovs-vswitchd receives sets the route_table_valid global variable to false
>> >>>> and causes route_table_reset() to be run.
>> >>>> This makes sense in the context of what FRR is doing. A full BGP feed
>> >>>> *always* has churn. So ovs-vswitchd receives an RTM_NEWROUTE message,
>> >>>> parses it, decides in route_table_change() that the route table is no
>> >>>> longer valid, and calls route_table_reset(), which re-dumps the entire
>> >>>> routing table to ovs-vswitchd. In this case there are ~115k ipv6 routes
>> >>>> in the linux fib.
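To make the amplification concrete: with this pattern, every notification costs a full parse of the FIB on the next pass through the main loop. The following is a minimal, self-contained model of the invalidate-on-notify / re-dump-on-next-access behaviour quoted above; it is not OVS code, FIB_SIZE and NOTIFICATIONS are made-up constants standing in for the ~115k-route FIB and a burst of BGP churn, and it assumes the main loop wakes up for each notification, as it does when the messages trickle in continuously.

    #include <stdbool.h>
    #include <stdio.h>

    #define FIB_SIZE      115000   /* roughly the ~115k IPv6 routes mentioned above */
    #define NOTIFICATIONS 1000     /* an arbitrary burst of RTM_NEWROUTE events */

    static bool table_valid;                    /* plays the role of route_table_valid */
    static unsigned long long entries_parsed;   /* total work done re-dumping */

    /* Analogous to route_table_change(): a notification only invalidates the cache. */
    static void on_route_notification(void)
    {
        table_valid = false;
    }

    /* Analogous to route_table_reset(): re-dump and re-parse the whole FIB. */
    static void full_redump(void)
    {
        entries_parsed += FIB_SIZE;
        table_valid = true;
    }

    int main(void)
    {
        for (int i = 0; i < NOTIFICATIONS; i++) {
            on_route_notification();
            if (!table_valid) {     /* what route_table_run() does on its next pass */
                full_redump();
            }
        }
        printf("%d notifications -> %llu route entries parsed\n",
               NOTIFICATIONS, entries_parsed);
        return 0;
    }

A thousand updates against a 115k-route table works out to over a hundred million route entries parsed, which lines up with the sustained 100% CPU seen in the top output above.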
>> >>>>
>> >>>> I hesitate to make any changes here since I really don't understand
>> >>>> what the end goal here is.
>> >>>> ovs-vswitchd is receiving a route change from the kernel but is in
>> >>>> turn causing it to re-dump the entire routing table again. What
>> >>>> should the correct behavior be from ovs-vswitchd's perspective here?
>> >>>
>> >>> Hi, Donald.
>> >>>
>> >>> Your analysis is correct.  OVS will invalidate the cached routing
>> >>> table and re-dump it in full on the next access on each netlink
>> >>> notification about route changes.
>> >>>
>> >>> Looking back into the commit history, OVS did maintain the cache and
>> >>> only added/removed what was in the netlink message incrementally.
>> >>> But that changed in 2011 with the following commit:
>> >>>
>> >>>     commit f0e167f0dbadbe2a8d684f63ad9faf68d8cb9884
>> >>>     Author: Ethan J. Jackson <e...@eecs.berkeley.edu>
>> >>>     Date:   Thu Jan 13 16:29:31 2011 -0800
>> >>>
>> >>>         route-table: Handle route updates more robustly.
>> >>>
>> >>>         The kernel does not broadcast rtnetlink route messages in all cases
>> >>>         one would expect.  This can cause stale entires to end up in the
>> >>>         route table which may cause incorrect results for
>> >>>         route_table_get_ifindex() queries.  This commit causes rtnetlink
>> >>>         route messages to dump the entire route table on the next
>> >>>         route_table_get_ifindex() query.
>> >>>
>> >>> And indeed, looking at the history of attempts by different projects
>> >>> to use route notifications, they are all facing issues, and it seems
>> >>> like none of them is actually able to handle all the notifications
>> >>> fully correctly, just because these notifications are notoriously bad.
>> >>> It seems to be impossible in certain cases to tell what exactly
>> >>> changed and how.  There can be duplicate or missing notifications.
>> >>> And the code of projects that try to maintain a route cache in
>> >>> userspace is insanely complex and doesn't handle 100% of cases anyway.
>> >>>
>> >>> There were attempts to convince kernel developers to add unique
>> >>> identifiers to routes, so userspace can tell them apart, but all of
>> >>> them seem to have died, leaving the problem unresolved.
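For readers who have not looked at these notifications directly: they are plain rtnetlink multicast messages, and a small standalone listener is enough to watch the RTM_NEWROUTE/RTM_DELROUTE stream that a churning BGP feed generates. The sketch below uses only standard Linux headers and is independent of OVS; the buffer size and the choice of multicast groups are illustrative. Every counted message corresponds to one event that, in current OVS, clears route_table_valid.

    #include <stdio.h>
    #include <unistd.h>
    #include <sys/socket.h>
    #include <linux/netlink.h>
    #include <linux/rtnetlink.h>

    int main(void)
    {
        /* Join the rtnetlink multicast groups that carry IPv4/IPv6 route changes. */
        int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
        if (fd < 0) {
            perror("socket");
            return 1;
        }

        struct sockaddr_nl addr = {
            .nl_family = AF_NETLINK,
            .nl_groups = RTMGRP_IPV4_ROUTE | RTMGRP_IPV6_ROUTE,
        };
        if (bind(fd, (struct sockaddr *) &addr, sizeof addr) < 0) {
            perror("bind");
            return 1;
        }

        char buf[8192];
        unsigned long added = 0, deleted = 0;

        for (;;) {
            int len = recv(fd, buf, sizeof buf, 0);
            if (len <= 0) {
                break;
            }
            /* One datagram may carry several netlink messages; walk them all. */
            for (struct nlmsghdr *nh = (struct nlmsghdr *) buf; NLMSG_OK(nh, len);
                 nh = NLMSG_NEXT(nh, len)) {
                if (nh->nlmsg_type == RTM_NEWROUTE) {
                    added++;
                } else if (nh->nlmsg_type == RTM_DELROUTE) {
                    deleted++;
                }
            }
            printf("RTM_NEWROUTE: %lu  RTM_DELROUTE: %lu\n", added, deleted);
        }
        close(fd);
        return 0;
    }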
>> >>>
>> >>> These are some discussions/bugs that I found:
>> >>>
>> >>> https://bugzilla.redhat.com/show_bug.cgi?id=1337855
>> >>> https://bugzilla.redhat.com/show_bug.cgi?id=1722728
>> >>> https://github.com/thom311/libnl/issues/226
>> >>> https://github.com/thom311/libnl/issues/224
>> >>>
>> >>> None of the bugs seems to be resolved.  Most are closed for
>> >>> non-technical reasons.
>> >>>
>> >>> I suppose Ethan just decided not to deal with that horribly
>> >>> unreliable kernel interface and to simply re-dump the route table on changes.
>> >>>
>> >>>
>> >>> As for your actual problem here, I'm not sure we can fix it that easily.
>> >>>
>> >>> Is it necessary for OVS to know about these routes?
>> >>> If not, it might be possible to isolate them in a separate network
>> >>> namespace, so OVS will not receive all the route updates.
>> >>>
>> >>> Do you know how long it takes to dump the route table once?
>> >>> Maybe it's worth limiting that process to dump only once a second or
>> >>> once every few seconds.  That should alleviate the load if the actual
>> >>> dump is relatively fast.
>> >> In this setup OVS just runs without being used. There is no datapath (no
>> >> bridges/ports) configured, so it is pointless to run this mechanism at all.
>> >> We could tie this mechanism to having at least one datapath configured (or
>> >> even only to having at least one tunnel configured).
>> >> What do you think?
>> >
>> >Hmm.  Why don't you just stop/disable the service then?
>> Indeed, that's possible. It's just turned on by default on this system
>> (Debian), and Donald noticed the CPU consumption.
>> >
>> >>>
>> >>> Best regards, Ilya Maximets.
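One way to read Ilya's rate-limiting suggestion in code is to keep the existing invalidation but let route_table_run() coalesce notifications and re-dump at most once per interval. The sketch below follows the shape of the function quoted earlier and assumes OVS's time_msec() from lib/timeval.h; ROUTE_TABLE_DUMP_INTERVAL_MS is a made-up knob, and this is an illustration of the approach rather than a proposed patch.

    /* Sketch, not a patch: coalesce netlink-driven invalidations so that
     * route_table_reset() runs at most once per interval instead of once
     * per notification burst. */

    #define ROUTE_TABLE_DUMP_INTERVAL_MS 1000   /* illustrative knob: one dump/second */

    static long long int last_reset_ms;         /* time of the last full re-dump */

    void
    route_table_run(void)
        OVS_EXCLUDED(route_table_mutex)
    {
        ovs_mutex_lock(&route_table_mutex);
        if (nln) {
            rtnetlink_run();
            nln_run(nln);

            /* Notifications only clear route_table_valid (route_table_change()
             * above); the expensive re-dump itself is deferred and batched. */
            if (!route_table_valid
                && time_msec() - last_reset_ms >= ROUTE_TABLE_DUMP_INTERVAL_MS) {
                route_table_reset();
                last_reset_ms = time_msec();
            }
        }
        ovs_mutex_unlock(&route_table_mutex);
    }

The trade-off is that cached routes can be up to one interval stale between dumps. Eli's alternative of tying the mechanism to having at least one datapath or tunnel configured would avoid even the periodic dump on hosts like this one, where OVS has no bridges at all.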
_______________________________________________
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss