[ovs-discuss] Scaling OVN/Southbound

Felix Hüttner via discuss Tue, 23 May 2023 06:59:41 -0700

Hi everyone,

We are currently running an OVN deployment with 450 nodes. We run a 3-node
cluster for the northbound database and a 3-node cluster for the southbound
database.
Between the southbound cluster and the ovn-controllers we have a layer of 24
ovsdb relays.
The setup uses TLS for all connections; however, the TLS server side is handled
by a Traefik reverse proxy to offload this from ovsdb.
Northd and Neutron connect directly to the north- and southbound databases
without going through the relays.

We needed to increase various timeouts on the ovsdb-server and client side to
get this to a mostly stable state:
* inactivity probes of 60 seconds (for all connections between ovsdb-server,
relay and clients)
* cluster election time of 50 seconds
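
A rough sketch of how an election timer of 50 seconds could be reached. It
assumes, from my reading of the ovsdb-server documentation, that the timer
defaults to 1000 ms and that a single cluster/change-election-timer call may
at most double the current value, so the target has to be approached in
steps. The control socket path below is a placeholder and will differ per
installation.

```python
def election_timer_steps(current_ms: int, target_ms: int) -> list[int]:
    """Return the successive timer values to request, at most doubling each time."""
    if current_ms <= 0 or target_ms < current_ms:
        raise ValueError("expected 0 < current_ms <= target_ms")
    steps = []
    while current_ms < target_ms:
        # Assumption: each change may at most double the current value.
        current_ms = min(current_ms * 2, target_ms)
        steps.append(current_ms)
    return steps


if __name__ == "__main__":
    db = "OVN_Southbound"
    ctl = "/var/run/ovn/ovnsb_db.ctl"  # placeholder path, adjust to your setup
    # Only prints the commands; they would need to be run on the raft leader.
    for value in election_timer_steps(1000, 50000):
        print(f"ovs-appctl -t {ctl} cluster/change-election-timer {db} {value}")
```

The script only prints the commands and does not execute anything itself.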

As long as none of the relays restarts, the environment is quite stable.
However, we quite regularly see "Unreasonably long xxx ms poll interval"
messages, ranging from 1000ms up to 40000ms.

If a large number of relays restart simultaneously, they can also cause the
ovsdb cluster to fail, because the poll interval then exceeds the cluster
election time.
This happens even though the relays already sync their data from all 3 ovsdb
servers.
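
To illustrate the relationship between the two values, here is a minimal
sketch that scans log lines for the poll interval warning and flags the ones
that exceed the election timer. The regular expression is based on the
message text quoted above; the sample lines are made up for illustration and
are not real log output.

```python
import re

# Matches the warning text quoted above; whitespace before "ms" is optional.
POLL_RE = re.compile(r"Unreasonably long (\d+)\s*ms poll interval")


def long_polls(lines, election_timer_ms):
    """Return (duration_ms, exceeds_timer) for each poll interval warning."""
    results = []
    for line in lines:
        match = POLL_RE.search(line)
        if match:
            duration = int(match.group(1))
            results.append((duration, duration > election_timer_ms))
    return results


if __name__ == "__main__":
    # Hypothetical sample input, not taken from a real deployment.
    sample = [
        "Unreasonably long 1000ms poll interval",
        "Unreasonably long 40000ms poll interval",
        "Unreasonably long 60000ms poll interval",
    ]
    for duration, exceeds in long_polls(sample, election_timer_ms=50000):
        print(duration, "exceeds election timer" if exceeds else "ok")
```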

We would like to improve this significantly, both to ensure that our ovsdb
clusters survive unplanned load without issues and to keep the poll intervals
short.
Short poll intervals would allow us to act on failovers of
distributed-gateway-ports and virtual ports in a timely manner (ideally in
under 1 second).

To do this we found the following solutions that were discussed in the past:
1. Implementing multithreading for ovsdb
https://patchwork.ozlabs.org/project/openvswitch/list/?series=&submitter=&state=*&q=multithreading&archive=&delegate=
2. Changing the storage backend of OVN to an alternative (e.g. etcd)
https://mail.openvswitch.org/pipermail/ovs-discuss/2016-July/041733.html

Both of these discussions are from 2016; I am not sure whether more recent ones
exist.

I would like to ask if there are already existing discussions on scaling ovsdb
further/faster?

From my perspective, whatever such a solution might be, it would no longer
require relays and would allow the ovsdb servers to handle load gracefully.
Personally, I think multithreading for ovsdb sounds quite promising, as it
would allow us to separate the raft/cluster communication from the client
connections.
This should allow us to keep the cluster healthy even under significant
pressure from clients.

Thank you

--
Felix Huettner