On Tue, 2023-05-23 at 13:59 +0000, Felix Hüttner via discuss wrote:
> Hi everyone,
>
> we are currently running an OVN deployment with 450 nodes. We run a
> 3-node cluster for the northbound database and a 3-node cluster for
> the southbound database.
> Between the southbound cluster and the ovn-controllers we have a
> layer of 24 ovsdb relays.
> The setup uses TLS for all connections; however, the TLS server is
> handled by a traefik reverse proxy to offload this from the ovsdb.
> Northd and Neutron connect directly to the north- and southbound
> databases without the relays.
>
> We needed to increase various timeouts on the ovsdb-server and client
> side to get this to a mostly stable state:
> * inactivity probes of 60 seconds (for all connections between
>   ovsdb-server, relay and clients)
> * cluster election time of 50 seconds
>
> As long as none of the relays restarts, the environment is quite
> stable.
> However, we quite regularly see the "Unreasonably long xxx ms poll
> interval" messages, ranging from 1000ms up to 40000ms.
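To put numbers on the warnings quoted above, a small script along these lines might help. It is only a sketch: the 50 s election timer and the 1 s target come from the quoted message, while the function name, the sample log lines, and the exact spacing of the log message are assumptions made for illustration.

```python
import re

# Matches the warning quoted above. The exact spacing before "ms" is an
# assumption, so the pattern accepts both "1000ms" and "1000 ms".
POLL_RE = re.compile(r"Unreasonably long (\d+)\s*ms poll interval")

ELECTION_TIMER_MS = 50_000  # cluster election time from the message (50 s)
TARGET_MS = 1_000           # desired failover reaction time (below 1 s)


def classify_poll_intervals(lines,
                            election_timer_ms=ELECTION_TIMER_MS,
                            target_ms=TARGET_MS):
    """Count long-poll warnings by how they compare to the two thresholds.

    Hypothetical helper, not part of OVS or OVN.
    """
    counts = {"over_target": 0, "over_election_timer": 0, "max_ms": 0}
    for line in lines:
        match = POLL_RE.search(line)
        if match is None:
            continue  # not a long-poll warning
        ms = int(match.group(1))
        counts["max_ms"] = max(counts["max_ms"], ms)
        if ms > election_timer_ms:
            # Long enough that the server could miss a raft election.
            counts["over_election_timer"] += 1
        elif ms > target_ms:
            # Cluster survives, but failover handling would be delayed.
            counts["over_target"] += 1
    return counts


# Made-up sample lines, for illustration only.
sample = [
    "timeval|WARN|Unreasonably long 1000ms poll interval",
    "timeval|WARN|Unreasonably long 40000ms poll interval",
    "timeval|WARN|Unreasonably long 52000ms poll interval",
    "jsonrpc|INFO|unrelated line",
]
print(classify_poll_intervals(sample))
```

Feeding it the ovsdb-server and relay logs separately would show whether the long intervals cluster on the relays or on the raft members.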
I probably missed it from previous messages, but:

1) are your ovn-controllers using conditional monitoring for the SB, or
   monitor-all?
2) what OVS version are your DB servers?

Dan

>
> If a large number of relays restart simultaneously, they can also
> cause the ovsdb cluster to fail, as the poll interval exceeds the
> cluster election time.
> This happens with the relays already syncing the data from all 3
> ovsdb servers.
>
> We would like to improve this significantly to ensure on the one hand
> that our ovsdb clusters will survive unplanned load without issues
> and on the other hand to keep the poll intervals short.
> We would like to ensure a short poll interval to allow us to act on
> distributed-gateway-port failovers and failovers of virtual ports in
> a timely manner (ideally below 1 second).
>
> To do this we found the following solutions that were discussed in
> the past:
> 1. Implementing multithreading for ovsdb
>    https://patchwork.ozlabs.org/project/openvswitch/list/?series=&submitter=&state=*&q=multithreading&archive=&delegate=
> 2. Changing the storage backend of OVN to an alternative (e.g. etcd)
>    https://mail.openvswitch.org/pipermail/ovs-discuss/2016-July/041733.html
>
> Both of these discussions are from 2016; not sure if more up-to-date
> ones exist.
>
> I would like to ask if there are already existing discussions on
> scaling ovsdb further/faster?
>
> From my perspective, whatever such a solution might be, it would no
> longer require relays and would allow the ovsdb servers to handle
> load gracefully.
> I personally think that multithreading for ovsdb sounds quite
> promising, as that would allow us to separate the raft/cluster
> communication from the client connections.
> This should allow us to keep the cluster healthy even under
> significant pressure from clients.
>
> Thank you
>
> --
> Felix Huettner
> _______________________________________________
> discuss mailing list
> disc...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-discuss