On 5/8/24 18:01, Numan Siddique wrote:
> On Wed, May 8, 2024 at 8:42 AM Шагов Георгий via discuss
> <ovs-discuss@openvswitch.org> wrote:
>
>> Hello everyone
>>
>> In some respects this might be considered a continuation of this thread:
>> (link1), yet it is different
>>
>> After we upgraded from OVN 22.03 to OVN 24.03, we did indeed see a
>> 3-4x increase in performance
>>
>> And yet we still observe high CPU load for the NorthD process; looking
>> deeper into the logs we have found:
>>
>
> Thanks for reporting this issue.
>
>> 2024-05-07T08:36:46.505Z|18503|poll_loop|INFO|wakeup due to [POLLIN] on fd 15 (10.34.22.66:60716<->10.34.22.66:6642) at lib/stream-fd.c:157 (94% CPU usage)
>>
>> *2024-05-07T08:37:38.857Z|18504|inc_proc_eng|INFO|node: northd, recompute (missing handler for input SB_datapath_binding) took 52313ms*
>>
>> *2024-05-07T08:37:48.335Z|18505|inc_proc_eng|INFO|node: lflow, recompute (failed handler for input northd) took 7759ms*
>>
>> *2024-05-07T08:37:48.718Z|18506|timeval|WARN|Unreasonably long 62213ms poll interval (56201ms user, 2900ms system)*
>>
>> As you can see there is a significant delay of 52 seconds
>>

This is huge indeed!

>> Correct me please if I am wrong, but in my understanding ‘*missing handler
>> for*’ practically means the absence of an inc-engine handler for some node
>> (in this sample: *SB_datapath_binding*)
>>
>
> That's correct.
>
>> Before plunging into development it would be great to clarify/align with
>> the community's position
>>
>> - Why is there no handler for this node?
>>
>
> Our approach has been to add a handler for any input change only if it is
> frequent or if it can be easily handled.
> We have also skipped adding handlers if it increases the code complexity.
> Having said that, I think we are open
> to adding more handlers if it makes sense or if it results in scale
> improvements.
>
> Right now we fall back to a full recompute of the northd engine for any
> changes to a logical switch or logical router.
> Does your deployment create/delete logical switches/routers frequently?
> Is it possible to enable OVN debug logs
> and share them? I'm curious to know what the changes to SB datapath
> binding are.
>
> Feel free to share your OVN NB and SB DBs if you're ok with it. I can
> deploy those DBs and see why the recompute is so expensive.
>
>> - Any particular reason for this, or did just the peculiarity of our
>> installation highlight this issue?
>>
>
> My guess is that your installation is frequently creating, deleting or
> modifying logical switches or routers.
>
>> - Do you think it is worth implementing that handler?
>> (*SB_datapath_binding*)
>>
>
> I'm fine adding a handler if it helps with scale. In our use cases, we
> don't frequently create/delete logical switches and routers
> and hence it is ok to fall back to full recomputes for such changes.
>
>> Any ideas are highly appreciated.
>>
>
> You're welcome to work on it and submit patches to add a handler for
> SB_datapath_binding.
>
> @Dumitru Ceara <dce...@redhat.com> @Han Zhou <hz...@ovn.org> if you've any
> reservations on adding more handlers please do comment here.
>

In general, especially if it fixes a scalability issue like this one,
it's probably fine. In practice it depends a bit on how much complexity
this would add to the code. But the best way to tell is to have a way
to reproduce this, e.g., NB/SB databases and the NB/SB jsonrpc update
that caused the recompute.

Regards,
Dumitru

_______________________________________________
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
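
For readers following the thread, the two recompute reasons seen in the
logs ("missing handler for input ..." and "failed handler for input ...")
can be illustrated with a toy model of an incremental-processing node.
This is only a sketch in Python, not OVN's actual C implementation: the
Node class, the on_input_changed method, the shape of the change dicts
and the "example_input" input are all hypothetical names made up for the
illustration. Only "northd" and "SB_datapath_binding" come from the log
lines quoted above.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# A handler returns True if it applied the change incrementally,
# False if it could not (the node must then recompute from scratch).
Handler = Callable[[dict], bool]


@dataclass
class Node:
    """Toy engine node (hypothetical, not OVN's data structure)."""
    name: str
    recompute: Callable[[], None]
    # input name -> handler, or None when no handler is registered
    handlers: Dict[str, Optional[Handler]] = field(default_factory=dict)
    log: List[str] = field(default_factory=list)

    def on_input_changed(self, input_name: str, change: dict) -> None:
        handler = self.handlers.get(input_name)
        if handler is None:
            # No handler at all: the only option is a full recompute.
            reason = f"missing handler for input {input_name}"
        elif handler(change):
            self.log.append(
                f"node: {self.name}, handled {input_name} incrementally")
            return
        else:
            # A handler exists but gave up on this particular change.
            reason = f"failed handler for input {input_name}"
        self.recompute()
        self.log.append(f"node: {self.name}, recompute ({reason})")


recomputes: List[str] = []
northd = Node(
    name="northd",
    recompute=lambda: recomputes.append("northd"),
    handlers={
        # Hypothetical input whose handler only copes with updates.
        "example_input": lambda change: change.get("op") == "update",
        # Registered as an input, but without a handler.
        "SB_datapath_binding": None,
    },
)

northd.on_input_changed("SB_datapath_binding", {"op": "insert"})
northd.on_input_changed("example_input", {"op": "update"})
northd.on_input_changed("example_input", {"op": "delete"})
print("\n".join(northd.log))
```

In this model, adding a handler for SB_datapath_binding would amount to
replacing the None entry with a function that applies the change in place
and returns True, which is the trade-off discussed above: less recompute
time in exchange for more handler code to maintain.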