Thank you for your answers and opinions. We have already started working
along these lines.

Han Zhou, it's exactly the topology you described. Each zone needs to
interconnect with its counterparts in the other 2 zones.

Regards,

On Mon, Apr 14, 2025 at 2:07 AM Han Zhou <hz...@ovn.org> wrote:

>
>
> On Thu, Apr 10, 2025 at 12:25 AM Dumitru Ceara <dce...@redhat.com> wrote:
> >
> > On 4/9/25 5:58 PM, Numan Siddique wrote:
> > > On Tue, Apr 8, 2025 at 5:57 PM Paulo Guilherme Da Silva via discuss <
> > > ovs-discuss@openvswitch.org> wrote:
> > >
> > >> Hi everyone,
> >
> > Hi all,
> >
> > >>
> > >> I am writing to share with the community a behavior we are observing in
> > >> our infrastructure: high CPU usage by the ovn-ic daemon.
> > >>
> > >> We can reproduce the behavior using ovn-fake-multinode running in a
> > >> sandbox. At the moment we're using OVN version 24.03.
> > >>
> > >> As you can see, we have 3 zones:
> > >>
> > >> root@vm-se1-paulo:~/ovn-fake-multinode# podman ps
> > >> CONTAINER ID  IMAGE                                COMMAND         CREATED     STATUS         PORTS       NAMES
> > >> 15bb7e2d21db  localhost/ovn/ovn-multi-node:latest  /usr/sbin/init  9 days ago  Up 9 days ago              ovn-central-az1-1
> > >> 8c21baf990b8  localhost/ovn/ovn-multi-node:latest  /usr/sbin/init  9 days ago  Up 9 days ago              ovn-central-az2-1
> > >> 54fc243cbb3c  localhost/ovn/ovn-multi-node:latest  /usr/sbin/init  9 days ago  Up 9 days ago              ovn-central-az3-1
> > >> aac92051d8a3  localhost/ovn/ovn-multi-node:latest  /usr/sbin/init  9 days ago  Up 9 days ago              ovn-gw-1
> > >> c053e82326a7  localhost/ovn/ovn-multi-node:latest  /usr/sbin/init  9 days ago  Up 9 days ago              ovn-gw-2
> > >> 25705f7b100f  localhost/ovn/ovn-multi-node:latest  /usr/sbin/init  9 days ago  Up 9 days ago              ovn-gw-3
> > >> ebd07e74b2f8  localhost/ovn/ovn-multi-node:latest  /usr/sbin/init  9 days ago  Up 9 days ago              ovn-gw-4
> > >> 72f8c45178f8  localhost/ovn/ovn-multi-node:latest  /usr/sbin/init  9 days ago  Up 9 days ago              ovn-gw-5
> > >> 43ca78b73401  localhost/ovn/ovn-multi-node:latest  /usr/sbin/init  9 days ago  Up 9 days ago              ovn-gw-6
> > >> b055c8d42860  localhost/ovn/ovn-multi-node:latest  /usr/sbin/init  9 days ago  Up 9 days ago              ovn-gw-7
> > >> 7fea15004dd9  localhost/ovn/ovn-multi-node:latest  /usr/sbin/init  9 days ago  Up 9 days ago              ovn-gw-8
> > >> 0349d294cc07  localhost/ovn/ovn-multi-node:latest  /usr/sbin/init  9 days ago  Up 9 days ago              ovn-gw-9
> > >> 2fa3d537a506  localhost/ovn/ovn-multi-node:latest  /usr/sbin/init  9 days ago  Up 9 days ago              ovn-gw-10
> > >> 26c07aff9b78  localhost/ovn/ovn-multi-node:latest  /usr/sbin/init  9 days ago  Up 9 days ago              ovn-gw-11
> > >> 83210fb30a91  localhost/ovn/ovn-multi-node:latest  /usr/sbin/init  9 days ago  Up 9 days ago              ovn-gw-12
> > >> b4dff8b37518  localhost/ovn/ovn-multi-node:latest  /usr/sbin/init  9 days ago  Up 9 days ago              ovn-chassis-1
> > >> 606655db8d8b  localhost/ovn/ovn-multi-node:latest  /usr/sbin/init  9 days ago  Up 9 days ago              ovn-chassis-2
> > >> d45da63d8713  localhost/ovn/ovn-multi-node:latest  /usr/sbin/init  9 days ago  Up 9 days ago              ovn-chassis-3
> > >> 4b960252e7a3  localhost/ovn/ovn-multi-node:latest  /usr/sbin/init  9 days ago  Up 9 days ago              ovn-chassis-4
> > >> 56ecfdbd4580  localhost/ovn/ovn-multi-node:latest  /usr/sbin/init  9 days ago  Up 9 days ago              ovn-chassis-5
> > >>
> > >>
> > >> We currently have 3000 routers deployed in each zone of our SDN. At this
> > >> scale we can already see the load and its impact on ovn-ic daemon
> > >> processing.
>
> Could you describe more about your topology? Does each router of each zone
> need to interconnect with its counterparts in other 2 zones? If that's the
> requirement, then yes the current simple recompute loop of ovn-ic may not
> scale. And I agree incremental-processing is the most appropriate solution.
>
> Best,
> Han
>
> > >>
> > >> 1. Even when no new resources are being processed, the CPU load
> > >> constantly fluctuates between 80% and 99%.
> > >>
> > >> 2. When we create new resources, the load stays close to 99% CPU until
> > >> the new deployments finish.
> > >>
> > >> Our concern is that ovn-ic will not be able to scale to future demand,
> > >> since the number of routers is expected to grow in the coming months.
> > >>
> > >> We built a version with symbols and frame pointers enabled and used it
> > >> together with the perf tool to understand the situation.
> > >> # perf record -p $(pidof ovn-ic) -g --call-graph dwarf
> > >>
> > >> While a script was creating new resources, we captured a profile and
> > >> obtained the following result:
> > >> # perf report -g
> > >>
> > >> Samples: 53K of event 'cpu-clock:pppH', Event count (approx.): 13339250000
> > >>   Children      Self  Command  Shared Object      Symbol
> > >> +   99.95%     1.24%  ovn-ic   ovn-ic             [.] main
> > >> +   99.93%     0.00%  ovn-ic   ovn-ic             [.] _start
> > >> +   99.93%     0.00%  ovn-ic   libc.so.6          [.] __libc_start_main
> > >> +   99.93%     0.00%  ovn-ic   libc.so.6          [.] 0x00007f6ba2cebd8f
> > >> +   58.40%     2.01%  ovn-ic   ovn-ic             [.] ovsdb_idl_index_generic_comparer.part.0
> > >> +   58.34%     0.04%  ovn-ic   ovn-ic             [.] skiplist_find
> > >> +   57.82%     4.93%  ovn-ic   ovn-ic             [.] skiplist_forward_to_
> > >> +   57.82%     0.00%  ovn-ic   ovn-ic             [.] skiplist_forward_to (inlined)
> > >> +   46.84%    10.29%  ovn-ic   ovn-ic             [.] ovsdb_datum_compare_3way
> > >> +   38.25%     0.01%  ovn-ic   ovn-ic             [.] ovsdb_idl_index_find
> > >> +   37.93%     1.25%  ovn-ic   ovn-ic             [.] port_binding_run
> > >> +   20.33%     6.87%  ovn-ic   ovn-ic             [.] ovsdb_atom_compare_3way
> > >> +   20.10%     0.01%  ovn-ic   ovn-ic             [.] ovsdb_idl_cursor_first_eq
> > >> +   15.92%     0.02%  ovn-ic   ovn-ic             [.] get_lrp_name_by_ts_port_name
> > >> +   13.44%    13.38%  ovn-ic   ovn-ic             [.] json_string
> > >> +    9.97%     0.20%  ovn-ic   ovn-ic             [.] ip46_parse_cidr
> > >> +    9.55%     9.49%  ovn-ic   ovn-ic             [.] ovsdb_idl_read
> > >> +    8.40%     0.00%  ovn-ic   libc.so.6          [.] 0x00007f6ba2e73806
> > >> +    8.37%     8.37%  ovn-ic   libc.so.6          [.] 0x00000000001b1806
> > >> +    7.53%     0.19%  ovn-ic   ovn-ic             [.] ip_parse_masked_len
> > >> +    7.32%     0.05%  ovn-ic   ovn-ic             [.] ip_parse_cidr
> > >> +    6.88%     4.64%  ovn-ic   ovn-ic             [.] smap_find__
> > >> +    6.79%     0.32%  ovn-ic   ovn-ic             [.] ovs_scan_len
> > >> +    6.46%     4.75%  ovn-ic   ovn-ic             [.] ovs_scan__
> > >> +    6.35%     0.03%  ovn-ic   ovn-ic             [.] ovsdb_idl_cursor_next_eq
> > >> +    3.71%     0.09%  ovn-ic   ovn-ic             [.] smap_get
> > >> +    2.59%     0.04%  ovn-ic   ovn-ic             [.] smap_get_uuid
> > >> +    2.26%     0.06%  ovn-ic   ovn-ic             [.] ipv6_parse_cidr
> > >> +    2.16%     0.10%  ovn-ic   ovn-ic             [.] ipv6_parse_masked_len
> > >> +    2.16%     0.05%  ovn-ic   ovn-ic             [.] xasprintf
> > >> +    2.11%     0.16%  ovn-ic   ovn-ic             [.] xvasprintf
> > >> +    2.08%     0.12%  ovn-ic   ovn-ic             [.] ts_run
> > >> +    1.88%     0.00%  ovn-ic   libc.so.6          [.] 0x00007f6ba2e73b7e
> > >> +    1.87%     1.87%  ovn-ic   libc.so.6          [.] 0x00000000001b1b7e
> > >> +    1.87%     1.78%  ovn-ic   ovn-ic             [.] hash_bytes
> > >> +    1.66%     0.00%  ovn-ic   ovn-ic             [.] extract_lsp_addresses
> > >> +    1.66%     0.01%  ovn-ic   ovn-ic             [.] parse_and_store_addresses
> > >>
> > >> In the attachment I share the result zoomed in on the functions that
> > >> consume the most CPU time.
> > >>
> > >> Each iteration of the main loop runs the 4 functions below, which in
> > >> turn iterate over the main tables of ovnsb_idl, ovnnb_idl, ovnisb_idl
> > >> and ovninb_idl. In Big O terms, the cost of every iteration grows with
> > >> the size of those tables, and we believe that is what we are seeing
> > >> here.
> > >>
> > >> static void
> > >> ovn_db_run(struct ic_context *ctx,
> > >>            const struct icsbrec_availability_zone *az)
> > >> {
> > >>     ts_run(ctx);
> > >>     gateway_run(ctx, az);
> > >>     port_binding_run(ctx, az);
> > >>     route_run(ctx, az);
> > >> }
> > >>
> > >> To address the first behavior, we worked on improving the performance
> > >> of the event loop in the main function of the process. We added a check
> > >> on the state_change_idl->last_ovnsb_seqno attribute that compares the
> > >> current value with the last recorded one, so that the loop runs only
> > >> when something has changed. This approach proved to be efficient.
> > >>
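
The seqno check described above could be sketched roughly as follows. This
is a self-contained illustration, not the actual patch: struct seqno_tracker,
seqno_tracker_changed() and N_DBS are hypothetical names, and plain integers
stand in for the per-database change counters (in OVS such a counter can be
read with ovsdb_idl_get_seqno()).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical: one change counter per database connection that ovn-ic
 * watches (ovnnb, ovnsb, ovninb, ovnisb). */
enum { N_DBS = 4 };

struct seqno_tracker {
    bool initialized;               /* False until the first call. */
    unsigned int last_seqno[N_DBS]; /* Counters seen on the last call. */
};

/* Returns true if any counter differs from the value recorded on the
 * previous call (or if this is the first call), and records the current
 * values for the next comparison. */
static bool
seqno_tracker_changed(struct seqno_tracker *t, const unsigned int cur[N_DBS])
{
    assert(t && cur);

    bool changed = !t->initialized;
    for (int i = 0; i < N_DBS; i++) {
        if (t->last_seqno[i] != cur[i]) {
            changed = true;
        }
        t->last_seqno[i] = cur[i];
    }
    t->initialized = true;
    return changed;
}
```

The main loop would then call ovn_db_run() only when seqno_tracker_changed()
returns true. One caveat, stated as an assumption: a real implementation
probably also has to rerun when an earlier transaction failed or could not
be committed, otherwise pending work could be lost.
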
> > >> The second behavior is harder to solve, keeping in mind that the ovn-ic
> > >> process is currently single-threaded. I think the correct way to
> > >> address this scalability issue is to implement incremental processing
> > >> before proposing a multi-threaded design.
> > >>
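
To illustrate the difference between a full recompute on every iteration and
incremental processing, here is a toy sketch. It is a simplified model under
stated assumptions, not ovn-ic code: struct row, derive(), full_recompute()
and incremental_update() are hypothetical, and the array of changed row
indexes stands in for whatever change tracking the IDL provides.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical row: 'output' is derived from 'input', loosely like the
 * records ovn-ic derives from the rows of its databases. */
struct row {
    int input;
    int output;
};

/* Stand-in for the per-row work done during one loop iteration. */
static int
derive(int input)
{
    return input * 2 + 1;
}

/* Full recompute: touches every row, so the cost is O(n_rows) per
 * iteration even if nothing changed.  Returns the rows processed. */
static size_t
full_recompute(struct row *rows, size_t n_rows)
{
    for (size_t i = 0; i < n_rows; i++) {
        rows[i].output = derive(rows[i].input);
    }
    return n_rows;
}

/* Incremental update: touches only the rows listed in 'changed', so the
 * cost is O(n_changed) per iteration.  Returns the rows processed. */
static size_t
incremental_update(struct row *rows, size_t n_rows,
                   const size_t *changed, size_t n_changed)
{
    for (size_t i = 0; i < n_changed; i++) {
        assert(changed[i] < n_rows);
        rows[changed[i]].output = derive(rows[changed[i]].input);
    }
    return n_changed;
}
```

With 3000 routers per zone, the first function does work proportional to the
whole table on every pass, while the second does work proportional only to
what changed. Both leave the rows in the same state as long as the change
list is complete.
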
> > >
> > > I think adding incremental processing (I-P) support seems to be the right
> > > way to go.  Adding I-P should address the first concern too IMO.  But you
> > > can definitely submit a patch to address it and we can discuss it in the
> > > patch.
> > >
> >
> > I agree, it seems better to me to try to improve the processing step
> > instead of trying to throw threads at the problem.
> >
> > > For the OVN community I think adding I-P for ovn-ic was not a priority.
> > > Probably that's the case with many of the deployments.  If you want to add
> > > I-P to ovn-ic,  I have no objections.  You have to do the heavy lifting
> > > though :)
> > >
> > > @Dumitru Ceara <dce...@redhat.com> @Mark Michelson <mmich...@redhat.com>
> > > @Han Zhou <hz...@ovn.org>   Thoughts ?
> > >
> >
> > Indeed, the performance of the ovn-ic daemon wasn't really a priority
> > until now.  That being said, I'm available to try to answer questions or
> > troubleshoot issues that might arise while implementing incremental
> > processing for ovn-ic.
> >
> >
> > > Thanks
> > > Numan
> > >
> > >> We would like to hear your thoughts on this matter and whether we are
> > >> approaching the topic correctly. Please let us know if there are any other
> > >> debugging commands that would help us with this investigation.
> > >>
> > >> Thank you in advance
> > >>
> > >> --
> > >> *Paulo Guilherme da Silva*
> > >> IaaS - Networking
> > >> guilherme.pa...@luizalabs.com
> > >>
> >
> > Regards,
> > Dumitru
>


-- 
*Paulo Guilherme da Silva*
IaaS - Networking
guilherme.pa...@luizalabs.com

_______________________________________________
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
