Hello,

We are looking for a safe way to turn on internode (NOT client-to-server) TLS encryption on a Cassandra cluster with two datacenters, each with anywhere from 3 to 20 nodes spread across 3+ racks. The Cassandra version is 3.11.6 and the OS is CentOS 7. We have full control over Cassandra configuration and operation, and a decent amount of control over client driver configuration. The goal is to enable internode TLS with no window in which clients either cannot connect to the cluster, or can connect but receive inconsistent or incorrect results.
Our understanding is that in 3.11, internode TLS encryption (server_encryption_options::internode_encryption) can be set to none, all, dc, or rack. "none" means the node will only send and receive unencrypted data, while each of the other values means the node will only send and receive encrypted data over a varying scope of connections. An "optional" setting appears only in the unreleased 4.0. The problem we run into is that no matter which scope we use, we end up with a period in which two parts of the cluster cannot talk to each other, so clients might get different answers depending on which part they talk to. Clients can be shifted to talk to only one DC for a limited time, but they cannot switch directly from talking only to one DC to talking only to the other; between those two states they must spend some time, however short, talking to both. Is there a way to do this while avoiding both downtime and wrong answers?
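
For reference, this is roughly the cassandra.yaml stanza we are talking about. It is a sketch rather than our real config: the keystore and truststore paths and passwords are placeholders, and the scope comments reflect our reading of the 3.11 documentation.

```yaml
# cassandra.yaml (3.11.x): internode encryption only, not client-to-server
server_encryption_options:
    # none = no internode encryption
    # all  = encrypt all internode traffic
    # dc   = encrypt traffic between datacenters
    # rack = encrypt traffic between racks
    internode_encryption: none
    # Placeholder paths and passwords, not our real values
    keystore: conf/.keystore
    keystore_password: cassandra
    truststore: conf/.truststore
    truststore_password: cassandra
```

As far as we know, a change to internode_encryption only takes effect after the node is restarted, which is why the transition ends up being a rolling restart with a mixed cluster in between.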