I'm running a 3-node ovsdb raft cluster in Kubernetes without using host networking, NET_ADMIN, or any special networking privileges. I'm using a StatefulSet, so I have persistent storage and a persistent network name; however, I don't have a persistent IP. I have studied two existing implementations of OVN, including [1], but as they are both focussed on providing SDN service to the cluster itself (which I'm not: I'm just a regular tenant of the cluster), they both legitimately use host networking and therefore don't suffer this issue.

[1] https://github.com/ovn-org/ovn-kubernetes/blob/master/dist/templates/ovnkube-db-raft.yaml.j2
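To make the naming concrete: each replica gets a stable DNS name from the StatefulSet's headless Service, but the record behind that name follows the pod's current IP (hypothetical names here: StatefulSet and headless Service both called "ovsdb", namespace "ovn"):

# The name is stable for the life of the StatefulSet; the IP it resolves
# to changes every time the pod is rescheduled.
getent hosts ovsdb-0.ovsdb.ovn.svc.cluster.local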
I finally managed to test what happens when a pod's IP changes, and the answer is: it breaks. Specifically, the logs are full of:

2020-07-09T10:09:16Z|06012|socket_util|ERR|Dropped 59 log messages in last 59 seconds (most recently, 1 seconds ago) due to excessive rate
2020-07-09T10:09:16Z|06013|socket_util|ERR|6644:10.131.0.4: bind: Cannot assign requested address
2020-07-09T10:09:16Z|06014|raft|WARN|Dropped 59 log messages in last 59 seconds (most recently, 1 seconds ago) due to excessive rate
2020-07-09T10:09:16Z|06015|raft|WARN|ptcp:6644:10.131.0.4: listen failed (Cannot assign requested address)

The reason it can't bind to 10.131.0.4 is that it's no longer a local IP address. Note that this is binding the raft cluster port, not the client port. Clients connect to a service IP, which is static. I can't easily test that client connections still work after the pod IPs change, but as they worked before there's no reason to suspect they won't.

My first thought was to use service IPs for the raft cluster, too, but if it wants to bind its local cluster IP that's never going to work, because the service IP is never a local IP address (traffic to it is forwarded into the pod by kube-proxy; it is never configured on the pod's own interface).

ovsdb-server is invoked in its container by ovn-ctl:

exec /usr/share/openvswitch/scripts/ovn-ctl \
    --no-monitor \
    --db-nb-create-insecure-remote=yes \
    --db-nb-cluster-remote-addr="$(bracketify ${initialiser_ip})" \
    --db-nb-cluster-local-addr="$(bracketify ${LOCAL_IP})" \
    --db-nb-cluster-local-proto=tcp \
    --db-nb-cluster-remote-proto=tcp \
    --ovn-nb-log="-vconsole:${OVN_LOG_LEVEL} -vfile:off" \
    run_nb_ovsdb

initialiser_ip is the pod IP address of the pod which comes up first. This is a bootstrapping thing and, afaik, isn't relevant once the cluster is initialised. It certainly doesn't appear in the command line below. LOCAL_IP is the current IP address of this pod. Surprisingly (to me), this doesn't appear in the ovsdb-server invocation either. The actual invocation is:

ovsdb-server -vconsole:info -vfile:off \
    --log-file=/var/log/openvswitch/ovsdb-server-sb.log \
    --remote=punix:/pod-run/ovnsb_db.sock \
    --pidfile=/pod-run/ovnsb_db.pid \
    --unixctl=ovnsb_db.ctl \
    --remote=db:OVN_Southbound,SB_Global,connections \
    --private-key=db:OVN_Southbound,SSL,private_key \
    --certificate=db:OVN_Southbound,SSL,certificate \
    --ca-cert=db:OVN_Southbound,SSL,ca_cert \
    --ssl-protocols=db:OVN_Southbound,SSL,ssl_protocols \
    --ssl-ciphers=db:OVN_Southbound,SSL,ssl_ciphers \
    --remote=ptcp:6642:0.0.0.0 \
    /var/lib/openvswitch/ovnsb_db.db

So it's getting its former IP address from somewhere. As the only local state is the database itself, I assume it's reading it from the DB's cluster table. Here's what it currently thinks about cluster state:

# ovs-appctl -t /pod-run/ovnsb_db.ctl cluster/status OVN_Southbound
83c7
Name: OVN_Southbound
Cluster ID: 1524 (1524187a-8a7b-41d5-89cf-ad2d00141258)
Server ID: 83c7 (83c771fd-d866-4324-bdd6-707c1bf72010)
Address: tcp:10.131.0.4:6644
Status: cluster member
Role: candidate
Term: 41039
Leader: unknown
Vote: self

Log: [5526, 5526]
Entries not yet committed: 0
Entries not yet applied: 0
Connections: (->7f46) (->66fc)
Servers:
    83c7 (83c7 at tcp:10.131.0.4:6644) (self) (voted for 83c7)
    7f46 (7f46 at tcp:10.129.2.9:6644)
    66fc (66fc at tcp:10.128.2.13:6644)

This highlights the next problem, which is that both the other IPs have changed, too.
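As a sanity check on the cluster-table theory, I believe the stored identity can be read straight out of the on-disk database with ovsdb-tool (same path as in the invocation above):

# Run inside the pod against the SB database file.
ovsdb-tool db-cid /var/lib/openvswitch/ovnsb_db.db            # cluster ID
ovsdb-tool db-sid /var/lib/openvswitch/ovnsb_db.db            # this server's ID
ovsdb-tool db-local-address /var/lib/openvswitch/ovnsb_db.db  # expect tcp:10.131.0.4:6644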
I know the new IP addresses of the other two cluster nodes, although I don't know which one is 7f46 (presumably it knows). Even if I did know, presumably I can't modify the DB while this node isn't a member of the cluster anyway.

The only way I can currently think of to recover this situation is:

* Scale the cluster down to just node-0
* node-0 converts itself to a standalone DB
* node-0 converts itself back to a clustered DB with a new local IP
* Scale the cluster back up to 3 nodes, initialised from node-0

I haven't tested this, so there may be problems with it (a rough ovsdb-tool sketch of the conversion is in the P.S. below), but in any case it's not a realistic solution.

A much nicer solution would be to use a service IP for the raft cluster, but from the above error message I'm not expecting that to work, because the server won't be able to bind it. I'm going to test this today, and I'll follow up if I find otherwise. I guess what I really want is to tell ovsdb to advertise some arbitrary, non-local IP address as its cluster identity, then just bind 0.0.0.0 and wait for traffic addressed to its SID.

Thoughts?

Thanks,

Matt
-- 
Matthew Booth
Red Hat OpenStack Engineer, Compute DFG

Phone: +442070094448 (UK)
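P.S. For the record, I think the standalone round-trip above would look something like this with ovsdb-tool (untested; run on node-0 with ovsdb-server stopped, and with NEW_LOCAL_IP set to the pod's current IP):

# 1. Flatten the clustered DB into a standalone DB.
ovsdb-tool cluster-to-standalone /tmp/ovnsb_standalone.db \
    /var/lib/openvswitch/ovnsb_db.db

# 2. Re-create the clustered DB from it, advertising the new address.
mv /var/lib/openvswitch/ovnsb_db.db /var/lib/openvswitch/ovnsb_db.db.bak
ovsdb-tool create-cluster /var/lib/openvswitch/ovnsb_db.db \
    /tmp/ovnsb_standalone.db "tcp:${NEW_LOCAL_IP}:6644"

# 3. On the other two nodes, delete the local DB file and let ovn-ctl
#    re-join the cluster from node-0, as at initial bootstrap.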