Re: [ovs-discuss] Scaling OVN/Southbound

Dan Williams via discuss Tue, 23 May 2023 08:27:55 -0700

On Tue, 2023-05-23 at 13:59 +0000, Felix Hüttner via discuss wrote:
> Hi everyone,
> 
> we are currently running an OVN deployment with 450 nodes. We run a
> 3-node cluster for the northbound database and a 3-node cluster for
> the southbound database.
> Between the southbound cluster and the ovn-controllers we have a
> layer of 24 ovsdb relays.
> The setup uses TLS for all connections; however, the TLS server is
> handled by a traefik reverse proxy to offload this from the ovsdb.
> Northd and Neutron connect directly to the north- and southbound
> databases without the relays.
> 
> We needed to increase various timeouts on the ovsdb-server and client
> side to get this to a mostly stable state:
> * inactivity probes of 60 seconds (for all connections between
> ovsdb-server, relay and clients)
> * cluster election time of 50 seconds
> 
> As long as none of the relays restarts, the environment is quite
> stable.
> However, we quite regularly see the "Unreasonably long xxx ms poll
> interval" messages, ranging from 1000ms up to 40000ms.
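
To quantify how often these warnings come close to the 50 second election time mentioned above, a log scan along the following lines may help. This is only a minimal sketch: the regular expression assumes the warning wording quoted above, the function name `long_polls` is hypothetical, and the sample lines are synthetic strings built from the values in this thread rather than real log output.

```python
import re

# Assumes the warning wording quoted in this thread; the optional
# whitespace covers both "1000ms" and "1000 ms".
POLL_RE = re.compile(r"Unreasonably long (\d+)\s*ms poll interval")


def long_polls(lines, election_timer_ms=50_000):
    """Return (interval_ms, reaches_election_timer) for each warning line.

    election_timer_ms defaults to the 50 second election time from this
    thread; adjust it to the value configured on the cluster.
    """
    results = []
    for line in lines:
        match = POLL_RE.search(line)
        if match:
            interval_ms = int(match.group(1))
            results.append((interval_ms, interval_ms >= election_timer_ms))
    return results


# Synthetic sample lines (not real log output).
sample = [
    "Unreasonably long 1000ms poll interval",
    "Unreasonably long 40000ms poll interval",
    "some unrelated log line",
]

for interval_ms, critical in long_polls(sample):
    print(interval_ms, "reaches election timer" if critical else "below election timer")
```

Feeding it the lines of an ovsdb-server log file instead of `sample` would list every long poll interval and flag those that reach the election timer.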


I probably missed it from previous messages, but:

1) are your ovn-controllers using conditional monitoring for the SB, or
monitor-all?

2) what OVS version are your DB servers?
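
For question 1, the setting can be read on a chassis from the local Open_vSwitch table. The snippet below is a rough sketch, assuming `ovs-vsctl` is on the PATH of the chassis and that an unset `external_ids:ovn-monitor-all` key means conditional monitoring (the documented default). The helper names `parse_monitor_all` and `chassis_uses_monitor_all` are hypothetical.

```python
import subprocess


def parse_monitor_all(raw: str) -> bool:
    """Interpret ovs-vsctl output for external_ids:ovn-monitor-all.

    Empty output (key not set) is treated as conditional monitoring,
    which is an assumption based on the documented default of false.
    """
    return raw.strip().strip('"').lower() == "true"


def chassis_uses_monitor_all() -> bool:
    """Ask the local ovs-vsctl whether ovn-controller uses monitor-all."""
    result = subprocess.run(
        ["ovs-vsctl", "--if-exists", "get", "Open_vSwitch", ".",
         "external_ids:ovn-monitor-all"],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_monitor_all(result.stdout)
```

Calling `chassis_uses_monitor_all()` on a node returns `True` for monitor-all and `False` for conditional monitoring. For question 2, running `ovsdb-server --version` on the DB servers reports the version.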

Dan

> 
> If a large number of relays restart simultaneously, they can also
> cause the ovsdb cluster to fail, as the poll interval exceeds the
> cluster election time.
> This happens with the relays already syncing the data from all 3
> ovsdb servers.
> 
> We would like to improve this significantly to ensure on the one hand
> that our ovsdb clusters will survive unplanned load without issues
> and on the other hand to keep the poll intervals short.
> We would like to ensure a short poll interval to allow us to act on
> distributed-gateway-ports failovers and failover of virtual port in a
> timely manner (ideally below 1 second).
> 
> To do this we found the following solutions that were discussed in
> the past:
> 1. Implementing multithreading for ovsdb
> https://patchwork.ozlabs.org/project/openvswitch/list/?series=&submitter=&state=*&q=multithreading&archive=&delegate=
> 2. Changing the storage backend of OVN to an alternative (e.g. etcd)
> https://mail.openvswitch.org/pipermail/ovs-discuss/2016-July/041733.html
> 
> Both of these discussions are from 2016; I am not sure if more
> up-to-date ones exist.
> 
> I would like to ask if there are already existing discussions on
> scaling ovsdb further/faster?
> 
> From my perspective, whatever such a solution might be, it would no
> longer require relays and would allow the ovsdb servers to handle load
> gracefully.
> I personally see that multithreading for ovsdb sounds quite
> promising, as that would allow us to separate the raft/cluster
> communication from the client connections.
> This should allow us to keep the cluster healthy even under
> significant pressure from clients.
> 
> Thank you
> 
> --
> Felix Huettner
> 
> This e-mail may contain confidential content and is intended only for
> the specified recipient/s.
> If you are not the intended recipient, please inform the sender
> immediately and delete this e-mail.
> 
> Information on data protection can be found
> here<https://www.datenschutz.schwarz>.
> _______________________________________________
> discuss mailing list
> disc...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
> 
> 

