This is a multi-DC cluster in AWS whose nodes have both public and private
IPs. The apps connect via the java-driver to a public IP.

When we built the 2.1.x cluster with Ec2MultiRegionSnitch, the system.peers
table had public IPs for the nodes in the rpc_address column.

After the upgrade from 2.1.x to 2.2.x, the java-driver and/or Cassandra
appears to resolve nodes to their internal private IPs when building the
cluster map on the client. The system.peers table now has internal/private
IPs in the rpc_address column.

Since internal IPs are handed out when the client app connects to the
cluster, the client app cannot communicate with nodes in other datacenters.
It still seems able to communicate with nodes in the datacenter of its
initial connection.

It appears we fixed this by manually updating the system.peers table's
rpc_address column back to the public IP. This appears to survive a restart
of the cassandra nodes without being switched back to private IPs.

The gossipinfo output after the 2.1 --> 2.2 upgrade reports internal IPs for
both RPC_ADDRESS and INTERNAL_IP, whereas on 2.1.x it reported the public IP
for RPC_ADDRESS and the internal IP for INTERNAL_IP.

Rolling restarts did not solve this either; only our manual updates to
system.peers did.

Our cassandra.yaml (these parameters are the same in our confs for 2.1 and
2.2) has:

listen_address: internal aws vpc ip
rpc_address: 0.0.0.0
broadcast_rpc_address: internal aws vpc ip

Are there changes to Ec2MultiRegionSnitch or the java-driver binary
protocol that require additional changes? Did the resolution of addresses
based on cassandra.yaml parameters change?

For reference, here is the system.peers table in its initial state, when
the cluster was on 2.1.x, with rpc_address showing public IPs:

 peer        | data_center | preferred_ip | rack | release_version | rpc_address | schema_version
-------------+-------------+--------------+------+-----------------+-------------+--------------------------------------
 public_ip_1 |     us-east | private_ip_1 |   1c |           2.1.9 | public_ip_1 | 65398421-84e8-307f-ae52-f6da42ff70c3
 public_ip_2 |     us-east | private_ip_2 |   1e |           2.1.9 | public_ip_2 | 65398421-84e8-307f-ae52-f6da42ff70c3
 public_ip_3 |     us-east | private_ip_3 |   1d |           2.1.9 | public_ip_3 | 65398421-84e8-307f-ae52-f6da42ff70c3
 public_ip_4 |     eu-west |         null |   1a |           2.1.9 | public_ip_4 | 65398421-84e8-307f-ae52-f6da42ff70c3
 public_ip_5 |     eu-west |         null |   1b |           2.1.9 | public_ip_5 | 65398421-84e8-307f-ae52-f6da42ff70c3
 public_ip_6 |     us-east | private_ip_6 |   1e |           2.1.9 | public_ip_6 | 65398421-84e8-307f-ae52-f6da42ff70c3
 public_ip_7 |     eu-west |         null |   1b |           2.1.9 | public_ip_7 | 65398421-84e8-307f-ae52-f6da42ff70c3
 public_ip_8 |     eu-west |         null |   1c |           2.1.9 | public_ip_8 | 65398421-84e8-307f-ae52-f6da42ff70c3
 public_ip_9 |     eu-west |         null |   1a |           2.1.9 | public_ip_9 | 65398421-84e8-307f-ae52-f6da42ff70c3


Then, after we upgraded to 2.2.x, the addresses in rpc_address changed to
the private ones:

 peer        | data_center | preferred_ip | rack | release_version | rpc_address  | schema_version
-------------+-------------+--------------+------+-----------------+--------------+--------------------------------------
 public_ip_1 |     us-east | private_ip_1 |   1c |          2.2.13 | private_ip_1 | 89b260c9-70c1-3119-b37a-30b464851c9f
 public_ip_2 |     us-east | private_ip_2 |   1e |          2.2.13 | private_ip_2 | 89b260c9-70c1-3119-b37a-30b464851c9f
 public_ip_3 |     us-east | private_ip_3 |   1d |          2.2.13 | private_ip_3 | 89b260c9-70c1-3119-b37a-30b464851c9f
 public_ip_4 |     eu-west |         null |   1a |          2.2.13 | private_ip_4 | 89b260c9-70c1-3119-b37a-30b464851c9f
 public_ip_5 |     eu-west |         null |   1b |          2.2.13 | private_ip_5 | 89b260c9-70c1-3119-b37a-30b464851c9f
 public_ip_6 |     us-east | private_ip_6 |   1e |          2.2.13 | private_ip_6 | 89b260c9-70c1-3119-b37a-30b464851c9f
 public_ip_7 |     eu-west |         null |   1b |          2.2.13 | private_ip_7 | 89b260c9-70c1-3119-b37a-30b464851c9f
 public_ip_8 |     eu-west |         null |   1c |          2.2.13 | private_ip_8 | 89b260c9-70c1-3119-b37a-30b464851c9f
 public_ip_9 |     eu-west |         null |   1a |          2.2.13 | private_ip_9 | 89b260c9-70c1-3119-b37a-30b464851c9f

So we had to manually update rpc_address in system.peers back to the
public IPs to get the java-driver to work again.
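
For anyone wanting to reproduce the workaround, here is a rough sketch of
how the affected rows could be spotted and the corresponding CQL generated.
It is not the exact procedure we ran. The sample addresses and the helper
names (PEERS, is_private, fix_statements) are made up for illustration, and
it assumes the peer column already holds each node's public IP, as in the
tables above. Since system.peers is local to each node, the resulting
statements would need to be run on every node.

```python
import ipaddress

# Hypothetical sample of (peer, rpc_address) pairs, as they might be read
# from system.peers on one node. The addresses are invented.
PEERS = [
    ("54.10.20.1", "10.0.1.11"),    # rpc_address is private: affected
    ("54.10.20.2", "54.10.20.2"),   # rpc_address is public: fine
    ("52.30.40.3", "172.31.5.13"),  # rpc_address is private: affected
]


def is_private(addr):
    """Return True if addr falls in a private (non-routable) range."""
    return ipaddress.ip_address(addr).is_private


def fix_statements(peers):
    """Build one CQL UPDATE per row whose rpc_address is private.

    Assumes the peer column holds the node's public IP, so rpc_address is
    simply set back to the same value as peer.
    """
    return [
        "UPDATE system.peers SET rpc_address = '%s' WHERE peer = '%s';"
        % (peer, peer)
        for peer, rpc in peers
        if is_private(rpc)
    ]


if __name__ == "__main__":
    for stmt in fix_statements(PEERS):
        print(stmt)
```

The script only prints the statements; they would still have to be reviewed
and executed by hand (for example through cqlsh) on each node.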
