Hi,

On Tue, May 23, 2023 at 5:42 PM Daniel Alvarez via discuss
<ovs-discuss@openvswitch.org> wrote:
>
> +Lucas
>
> > On 23 May 2023, at 17:25, Ilya Maximets via discuss
> > <ovs-discuss@openvswitch.org> wrote:
> >
> > On 5/23/23 15:59, Felix Hüttner via discuss wrote:
> >> Hi everyone,
> >
> > Hi, Felix.
> >
> >>
> >> we are currently running an OVN Deployment with 450 Nodes. We run a 3 node
> >> cluster for the northbound database and a 3 nodes cluster for the
> >> southbound database.
> >> Between the southbound cluster and the ovn-controllers we have a layer of
> >> 24 ovsdb relays.
> >> The setup is using TLS for all connections, however the TLS Server is
> >> handled by a traefik reverseproxy to offload this from the ovsdb
> >
> > The very important part of the system description is what versions
> > of OVS and OVN are you using in this setup? If it's not latest
> > 3.1 and 23.03, then it's hard to talk about what/if performance
> > improvements are actually needed.
> >
> >> Northd and Neutron is connecting directly to north- and southbound
> >> databases without the relays.
> >
> > One of the big things that is annoying is that Neutron connects to
> > Southbound database at all. There are some reasons to do that,
> > but ideally that should be avoided.
>
> We initiated an effort to connect only to the NB database. Lucas (CC'ed) is
> working on it at the moment because the main piece of info we are missing is
> the location of the ports. With this, we can probably stop connecting to the
> SB database but we will move part of the problem to the NB (less of it
> hopefully).
>
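(For context on the port-location point: the hosting chassis of a port is today
only recorded in the Southbound Port_Binding table, which is why a CMS still
needs a SB connection for it. A minimal lookup sketch, assuming ovn-sbctl
access and a hypothetical port name "vm-port-1":

  # which chassis currently hosts this logical port?
  ovn-sbctl --columns=chassis find Port_Binding logical_port=vm-port-1

  # resolve the chassis reference to a name/hostname
  ovn-sbctl --columns=name,hostname list Chassis CHASSIS_UUID

A northbound status column on the LSP, as discussed below, would remove the
need for both queries.)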
Thanks for raising this, Daniel. The thread mentioned is this one:
https://mail.openvswitch.org/pipermail/ovs-dev/2023-April/403635.html; I
believe the conversation there is relevant to this topic.

With the idea above, we would be able to avoid ovn-bgp-agent [0] connecting to
the Southbound database; the gateway port location information is the last
piece we need for that project.

As for Neutron itself, we still have some more work to do, especially around
the Chassis table. For example, at the moment it is the CMS's job to clean up
orphan Chassis entries in the Southbound; ovn-controller will not delete the
record unless it was gracefully stopped (which is not always the case,
especially during hard failures). Another example is the configuration passed
to the CMS via "ovn-cms-options", which is only exposed in the Chassis table
at the moment. That's how we get information about GW nodes
("enable-chassis-as-gw"), AZs, etc... We also do a few things around the
Port_Binding/Datapath tables (we look for GW ports, local ports for the
metadata agent, etc...).

The problem with the physical location of ports could indeed be solved by
expanding the work from the thread above. In the thread, Han mentioned that
perhaps we could explore having a "status/detail" column in the LSPs that
would hold this type of information (hosting chassis, port up/down, etc...)
which the CMS could consume. The more I think about it, the more it looks like
a great idea for CMSs.

[0] https://opendev.org/openstack/ovn-bgp-agent

> >
> > I know that in the past limiting
> > the number of metadata agents was one of the mitigation strategies
> > for scaling issues. Also, why can't it connect to relays? There
> > shouldn't be too many transactions flowing towards Southbound DB
> > from the Neutron.
> >
> >>
> >> We needed to increase various timeouts on the ovsdb-server and client side
> >> to get this to a mostly stable state:
> >> * inactivity probes of 60 seconds (for all connections between
> >> ovsdb-server, relay and clients)
> >> * cluster election time of 50 seconds
> >>
> >> As long as none of the relays restarts the environment is quite stable.
> >> However we see quite regularly the "Unreasonably long xxx ms poll
> >> interval" messages ranging from 1000ms up to 40000ms.
> >
> > With latest versions of OVS/OVN the CPU usage on Southbound DB
> > servers without relays in our weekly 500-node ovn-heater runs
> > stays below 10% during the test phase. No large poll intervals
> > are getting registered.
> >
> > Do you have more details on under which circumstances these
> > large poll intervals occur?
> >
> >>
> >> If a large amount of relays restart simultaneously they can also bring the
> >> ovsdb cluster to fail as the poll interval exceeds the cluster election
> >> time.
> >> This happens with the relays already syncing the data from all 3 ovsdb
> >> servers.
> >
> > There was a performance issue with upgrades and simultaneous
> > reconnections, but it should be mostly fixed on the current master
> > branch, i.e. in the upcoming 3.2 release:
> >
> > https://patchwork.ozlabs.org/project/openvswitch/list/?series=348259&state=*
> >
> >>
> >> We would like to improve this significantly to ensure on the one hand that
> >> our ovsdb clusters will survive unplanned load without issues and on the
> >> other hand to keep the poll intervals short.
> >> We would like to ensure a short poll interval to allow us to act on
> >> distributed-gateway-ports failovers and failover of virtual port in a
> >> timely manner (ideally below 1 second).
> >
> > These are good goals. But are you sure they are not already
> > addressed with the most recent versions of OVS/OVN ?
> >
> >>
> >> To do this we found the following solutions that were discussed in the
> >> past:
> >> 1. Implementing multithreading for ovsdb
> >> https://patchwork.ozlabs.org/project/openvswitch/list/?series=&submitter=&state=*&q=multithreading&archive=&delegate=
> >
> > We moved the compaction process to a separate thread in 3.0.
> > This partially addressed the multi-threading topic. General
> > handling of client requests/updates in separate threads will
> > require significant changes in the internal architecture, AFAICT.
> > So, I'd like to avoid doing that unless necessary. So far we
> > were able to overcome almost all the performance challenges
> > with simple algorithmic changes instead.
> >
> >> 2. Changing the storage backend of OVN to an alternative (e.g. etcd)
> >> https://mail.openvswitch.org/pipermail/ovs-discuss/2016-July/041733.html
> >
> > There was an ovsdb-etcd project, but it didn't manage to provide
> > better performance in comparison with ovsdb-server. So it was
> > ultimately abandoned: https://github.com/IBM/ovsdb-etcd
> >
> >>
> >> Both of these discussion are from 2016, not sure if more up-to-date ones
> >> exist.
> >>
> >> I would like to ask if there are already existing discussions on scaling
> >> ovsdb further/faster?
> >
> > This again comes to a question what versions you're using. I'm
> > currently not aware of any major performance issues for ovsdb-server
> > on the most recent code, besides the conditional monitoring, which is
> > not entirely OVSDB server's issue. And it is also likely to become
> > a bit better in 3.2:
> >
> > https://patchwork.ozlabs.org/project/openvswitch/patch/20230518121425.550048-1-i.maxim...@ovn.org/
> >
> >>
> >> From my perspective whatever such a solution might be, would no longer
> >> require relays and allow the ovsdb servers to handle load gracefully.
> >> I personally see that multithreading for ovsdb sounds quite promising, as
> >> that would allow us to separate the raft/cluster communication from the
> >> client connections.
> >> This should allow us to keep the cluster healthly even under significant
> >> pressure of clients.
> >
> > Again, good goals. I'm just not sure if we actually need to do
> > something or if they are already achievable with the most recent code.
> >
> > I understand that testing on prod is not an option, so it's unlikely
> > we'll have an accurate test. But maybe you can participate in the
> > initiative [1] for creation of ovn-heater OpenStack scenarios that
> > might be close to workloads you have? This way upstream will be able
> > to test your use-cases or at least something similar.
> >
> > Most of our current efforts are focused on ovn-kubernetes use-case,
> > because we don't have much details on how high-scale OpenStack deployments
> > look like.
> >
> > [1] https://mail.openvswitch.org/pipermail/ovs-dev/2023-May/404488.html
> >
> > Best regards, Ilya Maximets.
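For reference, the timeouts and health checks mentioned earlier in the thread
are all plain ovn-sbctl/ovs-appctl knobs. A rough sketch, assuming a clustered
Southbound DB with its control socket at /var/run/ovn/ovnsb_db.ctl (paths
differ between distributions) and a single row in the Connection table:

  # inactivity probe for ovsdb-server -> client connections, in ms
  ovn-sbctl set connection . inactivity_probe=60000

  # probe interval used by ovn-controller towards the SB server/relay it talks to
  ovs-vsctl set open . external_ids:ovn-remote-probe-interval=60000

  # RAFT election timer; must be run on the leader and can be at most
  # doubled per invocation, so reaching 50000 ms takes several steps
  ovs-appctl -t /var/run/ovn/ovnsb_db.ctl \
      cluster/change-election-timer OVN_Southbound 50000

  # RAFT health: role, current election timer, connections to the other members
  ovs-appctl -t /var/run/ovn/ovnsb_db.ctl cluster/status OVN_Southbound

None of this changes the underlying scaling question, of course; it only keeps
probes and elections from firing during long poll intervals.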
> >
> >>
> >> Thank you
> >> --
> >> Felix Huettner

_______________________________________________
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss