This could be an issue with the keystore/truststore. You may want to run keytool -list to validate the files and passwords; also run md5sum on the files from one node in west and one node in east.

Check SSL port 7001: from one node in west, run telnet <node in east> 7001 (or your custom port if you are not using the default port).

On Monday, August 26, 2019, 05:46:19 PM PDT, Michael Carlise <mcarl...@salesforce.com.INVALID> wrote:

Subroto - both tools error; openssl gives errno 111, which made me check the bound ports on the c* node with encryption flipped. Port 9042 is not open (determined by netstat -ant).

I'm looking at the log differences between starting a node with and without encryption. Without encryption, I get a bunch of lines like:

    OutboundTcpConnection.java:561 - Handshaking version w/ IP

This happens after a line like:

    Gossiper.java - Waiting for gossip to settle...

With encryption toggled to 'dc', I don't see any of those lines, presumably because the gossiper is trying to start but doesn't.

On Mon, Aug 26, 2019 at 6:51 PM Subroto Barua <sbarua...@yahoo.com.invalid> wrote:
Michael,

Are you able to connect to any c* node via OpenSSL?

    openssl s_client -connect <ip address>:9042
    cqlsh <ip address> --ssl

Subroto

On Aug 26, 2019, at 2:47 PM, Marc Selwan <marc.sel...@datastax.com> wrote:

Which exact version of OpenJDK are you using? Is it possible you don't have JCE on those nodes? (I believe more recent versions of Java 8 have this baked in, so that might not be it.)

Marc Selwan | DataStax | PM, Server Team

On Mon, Aug 26, 2019 at 1:56 PM Michael Carlise <mcarl...@salesforce.com.invalid> wrote:

I originally opened this issue on Stack Overflow (https://stackoverflow.com/questions/57516660/cassandra-node-to-node-encryption-throws-unable-to-gossip-with-peers-exception). However, I haven't gotten any responses in over a week, so I'm posting it here in case someone has an idea of where I can look.

We currently run a multi-region Cassandra cluster in AWS. It runs in four regions, with 12 nodes per region, and without node-to-node encryption (or client encryption). We are trying to enable inter-datacenter node-to-node encryption. However, when we flip encryption on, we get an exception saying that nodes are unable to gossip with any peers.

It could be that we didn't build our JKS keystore/truststores correctly (more on how we built these files below). But we also do not see intra-datacenter communication working (which should remain unencrypted). Additionally, cqlsh cannot connect to the node either, even though we have client_auth_required set to false (the default).

    ERROR [main] 2019-08-15 18:46:32,241 CassandraDaemon.java:749 - Exception encountered during startup
    java.lang.RuntimeException: Unable to gossip with any peers
        at org.apache.cassandra.gms.Gossiper.doShadowRound(Gossiper.java:1435) ~[apache-cassandra-3.11.4.jar:3.11.4]
        at org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:566) ~[apache-cassandra-3.11.4.jar:3.11.4]
        at org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:823) ~[apache-cassandra-3.11.4.jar:3.11.4]
        at org.apache.cassandra.service.StorageService.initServer(StorageService.java:683) ~[apache-cassandra-3.11.4.jar:3.11.4]
        at org.apache.cassandra.service.StorageService.initServer(StorageService.java:632) ~[apache-cassandra-3.11.4.jar:3.11.4]
        at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:388) [apache-cassandra-3.11.4.jar:3.11.4]
        at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:620) [apache-cassandra-3.11.4.jar:3.11.4]
        at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:732) [apache-cassandra-3.11.4.jar:3.11.4]
    INFO [main] 2019-08-15 18:47:07,384 YamlConfigurationLoader.java:89 - Configuration location: file:/etc/cassandra/cassandra.yaml

Note that this error occurs only after the node has been up for a few minutes (i.e. there is a delay between startup and the exception being thrown).

Information about our cassandra setup

    cassandra version: 3.11.4
    JDK version: openjdk-8
    Linux: Ubuntu 18.04 (bionic)

cassandra.yaml

    endpoint_snitch: Ec2MultiRegionSnitch
    server_encryption_options:
        internode_encryption: dc
        keystore: <omitted>
        keystore_password: <omitted>
        truststore: <omitted>
        truststore_password: <omitted>
    client_encryption_options:
        enabled: false

cassandra-rackdc.properties

    prefer_local=true

No obvious errors with SSL output

When starting Cassandra with JVM_OPTS="$JVM_OPTS -Djavax.net.debug=ssl" added to cassandra-env.sh, we see SSL logs printed to stdout (note: Subject and Issuer were omitted on purpose).

    found key for : cassy-us-west-2
    adding as trusted cert:
      Subject: ...
      Issuer: ...
      Algorithm: RSA; Serial number: 0xdad28d843fc73325d4c1a75207d4e74
      Valid from Fri May 27 00:00:00 UTC 2016 until Tue May 26 23:59:59 UTC 2026
    ...
    trigger seeding of SecureRandom
    done seeding SecureRandom

Looking at the Java SE SSL/TLS connection debugging documentation, this looks correct. Note, however, that we see this series of messages (along with the RSA key signature output) repeated several times in rapid succession. We never observe any messages about the trust store being added; that might be something that occurs only on client initiation (?).

Additionally, we do see Cassandra report that the encrypted messaging service has been started.

    INFO [main] 2019-08-15 18:45:31,022 MessagingService.java:704 - Starting Encrypted Messaging Service on SSL port 7001

Doesn't appear to be a cassandra.yaml configuration problem

We can bring the node back online simply by configuring internode_encryption: none. This seems to rule out a broadcast_address or rpc_address configuration problem.

How we built our keystore/truststores

We followed the basic template in the DataStax docs for preparing SSL certificates. One minor difference was that our private keys and CSRs were generated using openssl. We generated one per region (we plan to share keys and signed certs across the nodes in a region).
These were created using a command template like:

    openssl req -new -newkey rsa:2048 -out cassy-<region>.csr -keyout cassy-<region>.key -config cassy-<region>.conf -subj "..." -nodes -sha256

The generated CSR was then signed by an internal root CA. Because we generated our files using openssl, we had to build our jks files by importing our certs into them.

Commands to generate truststore

We distribute this one file to all nodes.

    keytool -importcert -keystore generic-server-truststore.jks -alias rootCa -file rootCa.crt -noprompt -keypass omitted -storepass omitted

Commands to generate keystore

This was done once per region. Essentially, we created a keystore with keytool, deleted the key entry, and then imported our own key entry with keytool from a pkcs12 file.

    keytool -genkeypair -keyalg RSA -alias cassy-${region} -keystore cassy-${region}.jks -storepass omitted -keypass omitted -validity 365 -keysize 2048 -dname "..."
    keytool -delete -alias cassy-${region} -keystore cassy-${region}.jks -storepass omitted
    openssl pkcs12 -export -in signed_certs/${region}.pem -inkey keys/cassandra.${region}.key -name cassy-${region} -out ${region}.p12
    keytool -importkeystore -deststorepass omitted -destkeystore cassy-${region}.jks -srckeystore ${region}.p12 -srcstoretype PKCS12
    keytool -importcert -keystore cassy-${region}.jks -alias rootCa -file ca.crt -noprompt -keypass omitted -storepass omitted

Looking back at this, I don't remember why we used keytool to generate a keypair/keystore, then deleted and imported. I think it was because the keytool importkeystore command refused to run if the keystore didn't already exist.

ca.crt and pem file

The ca.crt file contains the root certificate and the intermediate certificate that was used to sign the CSR. The pem file contains the signed certificate returned to us for the CSR, the intermediate cert, and the root CA (in that order).
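
Regarding the keystore steps above: as far as I know, keytool -importkeystore creates the destination keystore when it does not exist yet, so the generate-then-delete step may not be needed. A minimal sketch under that assumption, reusing the file layout from the commands above; build_keystore and STOREPASS are hypothetical names, and the password is passed on the command line only to mirror the original commands:

```shell
# Sketch: build a per-region JKS keystore straight from a PKCS12 bundle.
# Assumes openssl and keytool are on PATH and that signed_certs/ and keys/
# are laid out as in the commands quoted in the post.
build_keystore() {
    if [ "$#" -ne 2 ]; then
        echo "usage: build_keystore <region> <storepass>" >&2
        return 2
    fi
    region=$1
    STOREPASS=$2

    # Bundle the private key and the signed certificate into a PKCS12 file.
    openssl pkcs12 -export \
        -in "signed_certs/${region}.pem" \
        -inkey "keys/cassandra.${region}.key" \
        -name "cassy-${region}" \
        -passout "pass:${STOREPASS}" \
        -out "${region}.p12" || return 1

    # Import the bundle; the destination keystore is created if missing.
    keytool -importkeystore \
        -srckeystore "${region}.p12" -srcstoretype PKCS12 \
        -srcstorepass "${STOREPASS}" \
        -destkeystore "cassy-${region}.jks" \
        -deststorepass "${STOREPASS}" -noprompt || return 1

    # List the result; the key should show up as a PrivateKeyEntry.
    keytool -list -keystore "cassy-${region}.jks" -storepass "${STOREPASS}"
}
```

Calling build_keystore us-west-2 <password> from the directory that holds signed_certs/ and keys/ would produce cassy-us-west-2.jks; the final keytool -list is the same validation step suggested earlier in the thread.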

openssl verify ca.crt and pem

    openssl verify -CAfile ca.crt us-west-2.pem
    signed_certs/us-west-2.pem: OK

Command output after enabling encryption

nodetool status (output truncated)

    Datacenter: us-east
    ===================
    Status=Up/Down
    |/ State=Normal/Leaving/Joining/Moving
    --  Address         Load        Tokens  Owns (effective)  Host ID  Rack
    ?N  52.44.11.221    ?           256     25.4%             null     1c
    ...
    ?N  52.204.232.195  ?           256     23.2%             null     1d

    Datacenter: us-west-2
    =====================
    Status=Up/Down
    |/ State=Normal/Leaving/Joining/Moving
    --  Address         Load        Tokens  Owns (effective)  Host ID  Rack
    ?N  34.209.2.144    ?           256     26.5%             null     2c
    UN  52.40.32.177    105.99 GiB  256     23.7%             null     2c
    ?N  34.210.109.203  ?           256     24.7%             null     2a
    ...

The one online node (UN) is the node with encryption enabled.

cqlsh to localhost

    cassy-node6:~$ cqlsh
    Connection error: ('Unable to connect to any servers', {'127.0.0.1': error(111, "Tried connecting to [('127.0.0.1', 9042)]. Last error: Connection refused")})

cqlsh to remote node

The remote node is a node with encryption enabled.

    cassy-node6:~$ cqlsh 10.0.2.7
    Connection error: ('Unable to connect to any servers', {'10.0.2.7': error(111, "Tried connecting to [('10.0.2.7', 9042)]. Last error: Connection refused")})
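
Since both cqlsh attempts end in "Connection refused", one quick check is which Cassandra ports actually have a listener on the affected node. A minimal sketch, assuming the stock Cassandra 3.x ports (7000 for plain internode traffic, 7001 for SSL internode traffic, 9042 for native clients); listening_ports is a hypothetical helper that only parses netstat -ant style output fed to it on stdin:

```shell
# Reads `netstat -ant` (or `ss -ltn`) style output on stdin and prints, for
# each port given as an argument, whether a LISTEN socket exists for it.
listening_ports() {
    input=$(cat)
    for port in "$@"; do
        if printf '%s\n' "$input" | awk -v p="$port" '
            $0 ~ /LISTEN/ {
                # Look for an address field ending in ":<port>".
                for (i = 1; i <= NF; i++)
                    if ($i ~ (":" p "$")) found = 1
            }
            END { exit (found ? 0 : 1) }'; then
            echo "$port open"
        else
            echo "$port closed"
        fi
    done
}
```

On a node this would be run as netstat -ant | listening_ports 7000 7001 9042. If 7001 is reported open while 9042 is closed, that would match the logs in the post: the encrypted messaging service started, but the node never got far enough in startup to open the client port. A follow-up openssl s_client -connect <node in east>:7001 from a node in the other region then tests the TLS handshake itself.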