Thanks Numan. In my production setup with OpenStack, I am running ovn-northd and the NB and SB DBs on a single VM (OVN 22.03.3). The reason for evaluating this cluster-based OVN setup is to reduce the CPU utilization, which sometimes hits 100% in production and causes connectivity issues between Neutron and ovn-controller. For example:
2024-10-17T11:00:53.934Z|114596|poll_loop|INFO|wakeup due to 1-ms timeout at northd/inc-proc-northd.c:279 (59% CPU usage)
2024-10-17T11:00:53.942Z|114597|poll_loop|INFO|wakeup due to [POLLIN] on fd 17 (<->/var/run/ovn/ovnsb_db.sock) at lib/stream-fd.c:157 (59% CPU usage)
2024-10-17T11:00:54.254Z|114598|poll_loop|INFO|wakeup due to [POLLIN] on fd 3 (<->/var/run/ovn/ovnnb_db.sock) at lib/stream-fd.c:157 (59% CPU usage)
2024-10-17T11:00:54.572Z|114599|poll_loop|INFO|wakeup due to [POLLIN] on fd 17 (<->/var/run/ovn/ovnsb_db.sock) at lib/stream-fd.c:157 (59% CPU usage)
2024-10-17T11:05:20.624Z|114600|poll_loop|INFO|wakeup due to [POLLIN] on fd 17 (<->/var/run/ovn/ovnsb_db.sock) at lib/stream-fd.c:157 (66% CPU usage)
2024-10-17T11:05:20.954Z|114601|poll_loop|INFO|wakeup due to [POLLIN] on fd 17 (<->/var/run/ovn/ovnsb_db.sock) at lib/stream-fd.c:157 (66% CPU usage)
2024-10-17T11:06:00.607Z|114602|poll_loop|INFO|wakeup due to [POLLIN] on fd 17 (<->/var/run/ovn/ovnsb_db.sock) at lib/stream-fd.c:157 (86% CPU usage)
2024-10-17T11:06:02.607Z|114603|poll_loop|INFO|wakeup due to [POLLIN] on fd 17 (<->/var/run/ovn/ovnsb_db.sock) at lib/stream-fd.c:157 (86% CPU usage)
2024-10-17T11:16:00.629Z|114604|poll_loop|INFO|wakeup due to [POLLIN] on fd 17 (<->/var/run/ovn/ovnsb_db.sock) at lib/stream-fd.c:157 (77% CPU usage)
2024-10-17T11:16:01.629Z|114605|poll_loop|INFO|wakeup due to [POLLIN] on fd 17 (<->/var/run/ovn/ovnsb_db.sock) at lib/stream-fd.c:157 (77% CPU usage)
2024-10-17T11:16:01.954Z|114606|poll_loop|INFO|wakeup due to [POLLIN] on fd 3 (<->/var/run/ovn/ovnnb_db.sock) at lib/stream-fd.c:157 (77% CPU usage)
2024-10-17T11:16:02.282Z|114607|poll_loop|INFO|wakeup due to [POLLIN] on fd 17 (<->/var/run/ovn/ovnsb_db.sock) at lib/stream-fd.c:157 (77% CPU usage)
2024-10-17T11:16:02.282Z|114608|poll_loop|INFO|wakeup due to [POLLIN] on fd 3 (<->/var/run/ovn/ovnnb_db.sock) at lib/stream-fd.c:157 (77% CPU usage)
2024-10-17T11:16:02.282Z|114609|poll_loop|INFO|wakeup due to 0-ms timeout at northd/inc-proc-northd.c:279 (77% CPU usage)

Does moving to a cluster setup help with such issues?

On Wed, Oct 16, 2024 at 7:53 PM Numan Siddique <num...@ovn.org> wrote:
> On Wed, Oct 16, 2024 at 9:12 AM Ammad Syed via discuss
> <ovs-discuss@openvswitch.org> wrote:
> >
> > Hi,
> >
> > I am testing 3-node OVN clustering with an SSL setup on OVN 24.04.2.
> >
> > These are the OVN options that I have set on node 1:
> >
> > OVN_CTL_OPTS=" \
> > --db-nb-create-insecure-remote=no \
> > --db-sb-create-insecure-remote=no \
> > --db-nb-addr=172.16.60.40 \
> > --db-sb-addr=172.16.60.40 \
> > --db-nb-cluster-local-addr=172.16.60.40 \
> > --db-nb-cluster-local-proto=ssl \
> > --db-sb-cluster-local-addr=172.16.60.40 \
> > --db-sb-cluster-local-proto=ssl \
> > --ovn-northd-nb-db=ssl:172.16.60.40:6641 \
> > --ovn-northd-sb-db=ssl:172.16.60.40:6642 \
> > --ovn-northd-nb-db=ssl:172.16.60.40:6641,ssl:172.16.60.41:6641,ssl:172.16.60.42:6641 \
> > --ovn-northd-sb-db=ssl:172.16.60.40:6642,ssl:172.16.60.41:6642,ssl:172.16.60.42:6642 \
> > --ovn-nb-db-ssl-key=/etc/ovn/ovn-cert/ovnnb-privkey.pem \
> > --ovn-nb-db-ssl-cert=/etc/ovn/ovn-cert/ovnnb-cert.pem \
> > --ovn-nb-db-ssl-ca-cert=/etc/ovn/ovn-cert/cacert.pem \
> > --ovn-sb-db-ssl-key=/etc/ovn/ovn-cert/ovnsb-privkey.pem \
> > --ovn-sb-db-ssl-cert=/etc/ovn/ovn-cert/ovnsb-cert.pem \
> > --ovn-sb-db-ssl-ca-cert=/etc/ovn/ovn-cert/cacert.pem \
> > --ovn-northd-ssl-key=/etc/ovn/ovn-cert/ovnnorthd-privkey.pem \
> > --ovn-northd-ssl-cert=/etc/ovn/ovn-cert/ovnnorthd-cert.pem \
> > --ovn-northd-ssl-ca-cert=/etc/ovn/ovn-cert/cacert.pem \
> > "
> >
> > On the second and third nodes I have used the options below.
> >
> > OVN_CTL_OPTS=" \
> > --db-nb-create-insecure-remote=no \
> > --db-sb-create-insecure-remote=no \
> > --db-nb-addr=172.16.60.41 \
> > --db-sb-addr=172.16.60.41 \
> > --db-nb-cluster-local-addr=172.16.60.41 \
> > --db-nb-cluster-local-proto=ssl \
> > --db-sb-cluster-local-addr=172.16.60.41 \
> > --db-sb-cluster-local-proto=ssl \
> > --db-nb-cluster-remote-addr=172.16.60.40 \
> > --db-nb-cluster-remote-proto=ssl \
> > --db-sb-cluster-remote-addr=172.16.60.40 \
> > --db-sb-cluster-remote-proto=ssl \
> > --ovn-northd-nb-db=ssl:172.16.60.40:6641,ssl:172.16.60.41:6641,ssl:172.16.60.42:6641 \
> > --ovn-northd-sb-db=ssl:172.16.60.40:6642,ssl:172.16.60.41:6642,ssl:172.16.60.42:6642 \
> > --ovn-nb-db-ssl-key=/etc/ovn/ovn-cert/ovnnb-privkey.pem \
> > --ovn-nb-db-ssl-cert=/etc/ovn/ovn-cert/ovnnb-cert.pem \
> > --ovn-nb-db-ssl-ca-cert=/etc/ovn/ovn-cert/cacert.pem \
> > --ovn-sb-db-ssl-key=/etc/ovn/ovn-cert/ovnsb-privkey.pem \
> > --ovn-sb-db-ssl-cert=/etc/ovn/ovn-cert/ovnsb-cert.pem \
> > --ovn-sb-db-ssl-ca-cert=/etc/ovn/ovn-cert/cacert.pem \
> > --ovn-northd-ssl-key=/etc/ovn/ovn-cert/ovnnorthd-privkey.pem \
> > --ovn-northd-ssl-cert=/etc/ovn/ovn-cert/ovnnorthd-cert.pem \
> > --ovn-northd-ssl-ca-cert=/etc/ovn/ovn-cert/cacert.pem \
> > --ovn-northd-nb-db=ssl:172.16.60.41:6641 \
> > --ovn-northd-sb-db=ssl:172.16.60.41:6642 \
> > "
> >
> > Here is the cluster status.
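One thing worth noting in the option blocks above: both `--ovn-northd-nb-db` and `--ovn-northd-sb-db` are set twice (once with a single endpoint, once with the full cluster list). If ovn-ctl keeps the last occurrence of a repeated option, as shell option-parsing loops typically do (this behavior is an assumption worth verifying against your ovn-ctl version), the single-endpoint value would silently override the cluster list. A minimal, self-contained sketch to flag such duplicates in an OVN_CTL_OPTS string (the sample value is a shortened, hypothetical excerpt of the node 2/3 options):

```shell
#!/bin/sh
# Scan an OVN_CTL_OPTS-style string for option names that appear more than
# once. The value below is a shortened, hypothetical excerpt of the options
# quoted in this thread.
OVN_CTL_OPTS="--db-nb-addr=172.16.60.41 \
 --ovn-northd-nb-db=ssl:172.16.60.40:6641,ssl:172.16.60.41:6641,ssl:172.16.60.42:6641 \
 --ovn-northd-nb-db=ssl:172.16.60.41:6641"

# Strip the '=value' part of each word, then print any option name that
# occurs more than once.
printf '%s\n' $OVN_CTL_OPTS | sed 's/=.*//' | sort | uniq -d
```

Running this against the node 2/3 options would print `--ovn-northd-nb-db` (and, on the full list, `--ovn-northd-sb-db`), making the duplication easy to spot before restarting the services.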
> >
> > # ovs-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/status OVN_Northbound
> > db6a
> > Name: OVN_Northbound
> > Cluster ID: 5502 (5502d208-61dc-4eee-bd15-dc0dc52bf379)
> > Server ID: db6a (db6a618a-bf77-4f46-b08d-ebf15d538ee5)
> > Address: ssl:172.16.60.42:6643
> > Status: cluster member
> > Role: leader
> > Term: 12
> > Leader: self
> > Vote: self
> >
> > Last Election started 3584828 ms ago, reason: leadership_transfer
> > Last Election won: 3584825 ms ago
> > Election timer: 1000
> > Log: [2, 17]
> > Entries not yet committed: 0
> > Entries not yet applied: 0
> > Connections: ->f588 ->1902 <-f588 <-1902
> > Disconnections: 3
> > Servers:
> >     f588 (f588 at ssl:172.16.60.40:6643) next_index=17 match_index=16 last msg 75 ms ago
> >     db6a (db6a at ssl:172.16.60.42:6643) (self) next_index=15 match_index=16
> >     1902 (1902 at ssl:172.16.60.41:6643) next_index=17 match_index=16 last msg 75 ms ago
> >
> > The issue is that I am continuously seeing the logs below on the follower instances.
> >
> > 2024-10-16T13:05:51.106Z|03078|ovsdb_cs|INFO|ssl:172.16.60.41:6641: clustered database server is not cluster leader; trying another server
> > 2024-10-16T13:05:51.106Z|03079|ovn_northd|INFO|ovn-northd lock acquired. This ovn-northd instance is now active.
> > 2024-10-16T13:05:51.106Z|03080|ovsdb_cs|INFO|ssl:172.16.60.41:6642: clustered database server is not cluster leader; trying another server
> > 2024-10-16T13:05:51.107Z|03081|ovn_northd|INFO|ovn-northd lock lost. This ovn-northd instance is now on standby.
> > 2024-10-16T13:05:59.116Z|03082|reconnect|INFO|ssl:172.16.60.41:6641: connected
> > 2024-10-16T13:05:59.118Z|03083|reconnect|INFO|ssl:172.16.60.41:6642: connected
> > 2024-10-16T13:05:59.118Z|03084|ovsdb_cs|INFO|ssl:172.16.60.41:6641: clustered database server is not cluster leader; trying another server
> > 2024-10-16T13:05:59.119Z|03085|ovn_northd|INFO|ovn-northd lock acquired. This ovn-northd instance is now active.
> > 2024-10-16T13:05:59.119Z|03086|ovsdb_cs|INFO|ssl:172.16.60.41:6642: clustered database server is not cluster leader; trying another server
> > 2024-10-16T13:05:59.119Z|03087|ovn_northd|INFO|ovn-northd lock lost. This ovn-northd instance is now on standby.
> > 2024-10-16T13:06:07.130Z|03088|reconnect|INFO|ssl:172.16.60.41:6641: connected
> > 2024-10-16T13:06:07.131Z|03089|reconnect|INFO|ssl:172.16.60.41:6642: connected
> > 2024-10-16T13:06:07.132Z|03090|ovsdb_cs|INFO|ssl:172.16.60.41:6641: clustered database server is not cluster leader; trying another server
> > 2024-10-16T13:06:07.132Z|03091|ovn_northd|INFO|ovn-northd lock acquired. This ovn-northd instance is now active.
> > 2024-10-16T13:06:07.133Z|03092|ovsdb_cs|INFO|ssl:172.16.60.41:6642: clustered database server is not cluster leader; trying another server
> > 2024-10-16T13:06:07.133Z|03093|ovn_northd|INFO|ovn-northd lock lost. This ovn-northd instance is now on standby.
> >
> > These logs appear on the second and third nodes, which are followers. When I reboot node 1, RAFT elects a new leader (node 3 in my case) and these logs disappear from node 3. When node 1 comes back in the follower state, it does not show these logs.
> >
> > Is there anything to be concerned about, or is this normal?

> These logs are from ovn-northd, and it looks like you're running 3
> instances of ovn-northd. All 3 instances will connect to the leaders of
> both the NB and SB DB clusters, and only one will be active while the
> other two stay on standby. All 3 ovn-northd instances will try to acquire
> an OVSDB lock from the SB DB cluster leader, and only one will get it. I
> think the logs you're seeing are normal.
>
> Thanks
> Numan
>
> >
> > --
> > Regards,
> >
> > Ammad
> > _______________________________________________
> > discuss mailing list
> > disc...@openvswitch.org
> > https://mail.openvswitch.org/mailman/listinfo/ovs-discuss

-- 
Regards,
Syed Ammad Ali
_______________________________________________
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
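[Editor's note] To confirm the active/standby split Numan describes, each ovn-northd instance reports its state via `ovn-appctl -t ovn-northd status`. The loop below is a self-contained sketch that just parses status strings captured from each node; the three sample values are hypothetical stand-ins for real output, and the exact output format may vary by OVN version:

```shell
#!/bin/sh
# Hypothetical captured output of `ovn-appctl -t ovn-northd status` from
# each of the three nodes; only one instance should report "active".
status_node1="Status: standby"
status_node2="Status: active"
status_node3="Status: standby"

for n in 1 2 3; do
  # Indirectly read status_node$n into s.
  eval "s=\$status_node$n"
  case $s in
    *active*) echo "node$n runs the active ovn-northd" ;;
  esac
done
```

Run against live nodes, seeing exactly one "active" confirms that the repeated lock-acquired/lock-lost log lines are just the standby instances probing the leader, not a real failover loop.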