I'm running a 3-node ovsdb raft cluster in Kubernetes without using host networking, NET_ADMIN, or any special networking privileges. I'm using a StatefulSet, so I have persistent storage and a persistent network name; however, I don't have a persistent IP. I have studied two existing implementations of OVN, including [1], but as they are both focussed on providing SDN service to the cluster itself (which I'm not: I'm just a regular tenant of the cluster), they both legitimately use host networking and therefore don't suffer this issue.

[1] https://github.com/ovn-org/ovn-kubernetes/blob/master/dist/templates/ovnkube-db-raft.yaml.j2
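To make the naming concrete: each replica gets a stable DNS name from the StatefulSet's headless Service, but the record behind that name follows the pod's current IP (hypothetical names here: StatefulSet and headless Service both called "ovsdb", namespace "ovn"):

# The name is stable for the life of the StatefulSet; the IP it resolves
# to changes every time the pod is rescheduled.
getent hosts ovsdb-0.ovsdb.ovn.svc.cluster.local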
I finally managed to test what happens when a pod's IP changes, and the answer is: it breaks. Specifically, the logs are full of:

2020-07-09T10:09:16Z|06012|socket_util|ERR|Dropped 59 log messages in last 59 seconds (most recently, 1 seconds ago) due to excessive rate
2020-07-09T10:09:16Z|06013|socket_util|ERR|6644:10.131.0.4: bind: Cannot assign requested address
2020-07-09T10:09:16Z|06014|raft|WARN|Dropped 59 log messages in last 59 seconds (most recently, 1 seconds ago) due to excessive rate
2020-07-09T10:09:16Z|06015|raft|WARN|ptcp:6644:10.131.0.4: listen failed (Cannot assign requested address)

The reason it can't bind to 10.131.0.4 is that it's no longer a local IP address. Note that this is binding the raft cluster port, not the client port. Clients connect to a service IP, which is static. I can't easily test that client connections still work after the pod IPs change, but as they worked before there's no reason to suspect they won't.

My first thought was to use service IPs for the raft cluster, too, but if it wants to bind its local cluster IP that's never going to work, because the service IP is never a local IP address (traffic to it is forwarded into the pod by kube-proxy; it is never configured on the pod's own interface).

ovsdb-server is invoked in its container by ovn-ctl:

exec /usr/share/openvswitch/scripts/ovn-ctl \
    --no-monitor \
    --db-nb-create-insecure-remote=yes \
    --db-nb-cluster-remote-addr="$(bracketify ${initialiser_ip})" \
    --db-nb-cluster-local-addr="$(bracketify ${LOCAL_IP})" \
    --db-nb-cluster-local-proto=tcp \
    --db-nb-cluster-remote-proto=tcp \
    --ovn-nb-log="-vconsole:${OVN_LOG_LEVEL} -vfile:off" \
    run_nb_ovsdb

initialiser_ip is the pod IP address of the pod which comes up first. This is a bootstrapping thing and, afaik, isn't relevant once the cluster is initialised. It certainly doesn't appear in the command line below. LOCAL_IP is the current IP address of this pod. Surprisingly (to me), this doesn't appear in the ovsdb-server invocation either. The actual invocation is:

ovsdb-server -vconsole:info -vfile:off \
    --log-file=/var/log/openvswitch/ovsdb-server-sb.log \
    --remote=punix:/pod-run/ovnsb_db.sock \
    --pidfile=/pod-run/ovnsb_db.pid \
    --unixctl=ovnsb_db.ctl \
    --remote=db:OVN_Southbound,SB_Global,connections \
    --private-key=db:OVN_Southbound,SSL,private_key \
    --certificate=db:OVN_Southbound,SSL,certificate \
    --ca-cert=db:OVN_Southbound,SSL,ca_cert \
    --ssl-protocols=db:OVN_Southbound,SSL,ssl_protocols \
    --ssl-ciphers=db:OVN_Southbound,SSL,ssl_ciphers \
    --remote=ptcp:6642:0.0.0.0 \
    /var/lib/openvswitch/ovnsb_db.db

So it's getting its former IP address from somewhere. As the only local state is the database itself, I assume it's reading it from the DB's cluster table. Here's what it currently thinks about cluster state:

# ovs-appctl -t /pod-run/ovnsb_db.ctl cluster/status OVN_Southbound
83c7
Name: OVN_Southbound
Cluster ID: 1524 (1524187a-8a7b-41d5-89cf-ad2d00141258)
Server ID: 83c7 (83c771fd-d866-4324-bdd6-707c1bf72010)
Address: tcp:10.131.0.4:6644
Status: cluster member
Role: candidate
Term: 41039
Leader: unknown
Vote: self

Log: [5526, 5526]
Entries not yet committed: 0
Entries not yet applied: 0
Connections: (->7f46) (->66fc)
Servers:
    83c7 (83c7 at tcp:10.131.0.4:6644) (self) (voted for 83c7)
    7f46 (7f46 at tcp:10.129.2.9:6644)
    66fc (66fc at tcp:10.128.2.13:6644)

This highlights the next problem, which is that both the other IPs have changed, too.
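As a sanity check on the cluster-table theory, I believe the stored identity can be read straight out of the on-disk database with ovsdb-tool (same path as in the invocation above):

# Run inside the pod against the SB database file.
ovsdb-tool db-cid /var/lib/openvswitch/ovnsb_db.db            # cluster ID
ovsdb-tool db-sid /var/lib/openvswitch/ovnsb_db.db            # this server's ID
ovsdb-tool db-local-address /var/lib/openvswitch/ovnsb_db.db  # expect tcp:10.131.0.4:6644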
I know the new IP addresses of the other two cluster nodes, although I don't know which one is 7f46 (presumably it knows). Even if I did know, presumably I can't modify the DB while this node isn't a member of the cluster anyway.

The only way I can currently think of to recover this situation is:

* Scale the cluster down to just node-0
* node-0 converts itself to a standalone DB
* node-0 converts itself back to a clustered DB with a new local IP
* Scale the cluster back up to 3 nodes, initialised from node-0

I haven't tested this, so there may be problems with it (a rough ovsdb-tool sketch of the conversion is in the P.S. below), but in any case it's not a realistic solution.

A much nicer solution would be to use a service IP for the raft cluster, but from the above error message I'm not expecting that to work, because the server won't be able to bind it. I'm going to test this today, and I'll follow up if I find otherwise. I guess what I really want is to tell ovsdb to advertise some arbitrary, non-local IP address as its cluster identity, then just bind 0.0.0.0 and wait for traffic addressed to its SID.

Thoughts?

Thanks,

Matt
-- 
Matthew Booth
Red Hat OpenStack Engineer, Compute DFG

Phone: +442070094448 (UK)
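P.S. For the record, I think the standalone round-trip above would look something like this with ovsdb-tool (untested; run on node-0 with ovsdb-server stopped, and with NEW_LOCAL_IP set to the pod's current IP):

# 1. Flatten the clustered DB into a standalone DB.
ovsdb-tool cluster-to-standalone /tmp/ovnsb_standalone.db \
    /var/lib/openvswitch/ovnsb_db.db

# 2. Re-create the clustered DB from it, advertising the new address.
mv /var/lib/openvswitch/ovnsb_db.db /var/lib/openvswitch/ovnsb_db.db.bak
ovsdb-tool create-cluster /var/lib/openvswitch/ovnsb_db.db \
    /tmp/ovnsb_standalone.db "tcp:${NEW_LOCAL_IP}:6644"

# 3. On the other two nodes, delete the local DB file and let ovn-ctl
#    re-join the cluster from node-0, as at initial bootstrap.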