Michael,

Are you able to connect to any C* node via OpenSSL?

openssl s_client -connect <ip address>:9042
cqlsh <ip address> --ssl

Subroto

> On Aug 26, 2019, at 2:47 PM, Marc Selwan <marc.sel...@datastax.com> wrote:
>
> Which exact version of OpenJDK are you using? Is it possible you don't have
> JCE on those nodes? (I believe more recent versions of Java 8 have this baked
> in, so that might not be it.)
>
> Marc Selwan | DataStax | PM, Server Team | (925) 413-7079 | Twitter
>
> Quick links | DataStax | Training | Documentation | Downloads
>
>> On Mon, Aug 26, 2019 at 1:56 PM Michael Carlise
>> <mcarl...@salesforce.com.invalid> wrote:
>>
>> I originally opened this issue on Stack Overflow
>> (https://stackoverflow.com/questions/57516660/cassandra-node-to-node-encryption-throws-unable-to-gossip-with-peers-exception).
>>
>> However, I haven't gotten any responses in over a week. I'm going to post
>> it here and maybe someone will have an idea on where I can look.
>>
>> We currently run a multi-region cassandra cluster in AWS. It runs in four
>> regions, 12 nodes per region. It runs without node-to-node encryption (or
>> client encryption either). We are trying to enable inter-datacenter
>> node-to-node encryption. However, when we flip encryption over we get an
>> exception that nodes are unable to gossip with any peers.
>>
>> It could possibly be that we didn't build our jks keystore/truststores
>> correctly (more on how we built these files below). But we additionally do
>> not see intra-datacenter communication working (which should be set to
>> unencrypted communication). Additionally, cqlsh cannot connect to the node
>> either, even though we have (by default) client_auth_required set to false.
>>
>> ERROR [main] 2019-08-15 18:46:32,241 CassandraDaemon.java:749 - Exception encountered during startup
>> java.lang.RuntimeException: Unable to gossip with any peers
>>     at org.apache.cassandra.gms.Gossiper.doShadowRound(Gossiper.java:1435) ~[apache-cassandra-3.11.4.jar:3.11.4]
>>     at org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:566) ~[apache-cassandra-3.11.4.jar:3.11.4]
>>     at org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:823) ~[apache-cassandra-3.11.4.jar:3.11.4]
>>     at org.apache.cassandra.service.StorageService.initServer(StorageService.java:683) ~[apache-cassandra-3.11.4.jar:3.11.4]
>>     at org.apache.cassandra.service.StorageService.initServer(StorageService.java:632) ~[apache-cassandra-3.11.4.jar:3.11.4]
>>     at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:388) [apache-cassandra-3.11.4.jar:3.11.4]
>>     at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:620) [apache-cassandra-3.11.4.jar:3.11.4]
>>     at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:732) [apache-cassandra-3.11.4.jar:3.11.4]
>> INFO [main] 2019-08-15 18:47:07,384 YamlConfigurationLoader.java:89 - Configuration location: file:/etc/cassandra/cassandra.yaml
>>
>> Something to note is that this error message occurs after a few minutes of
>> the node being up (i.e. there is a delay between startup and the point
>> where this exception is thrown).
>>
>> Information about our cassandra setup
>>
>> cassandra version: 3.11.4
>> JDK version: openjdk-8
>> Linux: Ubuntu 18.04 (bionic)
>>
>> cassandra.yaml
>>
>> endpoint_snitch: Ec2MultiRegionSnitch
>>
>> server_encryption_options:
>>     internode_encryption: dc
>>     keystore: <omitted>
>>     keystore_password: <omitted>
>>     truststore: <omitted>
>>     truststore_password: <omitted>
>>
>> client_encryption_options:
>>     enabled: false
>>
>> cassandra-rackdc.properties
>>
>> prefer_local=true
>>
>> No obvious errors with SSH output
>>
>> When starting cassandra with JVM_OPTS="$JVM_OPTS -Djavax.net.debug=ssl"
>> added to cassandra-env.sh we see SSL logs printed to stdout (Note: Subject
>> and Issuer were omitted on purpose).
>>
>> found key for : cassy-us-west-2
>> adding as trusted cert:
>>   Subject: ...
>>   Issuer: ...
>>   Algorithm: RSA; Serial number: 0xdad28d843fc73325d4c1a75207d4e74
>>   Valid from Fri May 27 00:00:00 UTC 2016 until Tue May 26 23:59:59 UTC 2026
>> ...
>> trigger seeding of SecureRandom
>> done seeding SecureRandom
>>
>> Looking at Java SE SSL/TLS connection debugging, this looks correct. But to
>> note, we see this series of messages (along with the RSA key signature
>> output) repeated several times in rapid fire. We never observe any messages
>> about the trust store being added; however that might be something that
>> occurs only on client initiation (?)
>>
>> Additionally, we do see cassandra report that the Encrypted Messaging
>> service has been started.
>>
>> INFO [main] 2019-08-15 18:45:31,022 MessagingService.java:704 - Starting Encrypted Messaging Service on SSL port 7001
>>
>> Doesn't appear to be a cassandra.yaml configuration problem
>>
>> We can bring the node back online by simply configuring
>> internode_encryption: none. This action seems to rule out a
>> broadcast_address or rpc_address configuration problem.
>>
>> How we built our keystore/truststores
>>
>> We followed the basic template datastax docs for preparing SSL certificates.
>> One minor difference was that our private key and CSRs were generated using
>> openssl.
>> One was generated per region (we plan to share key/signed certs across
>> nodes in regions). These were created using a command template such as:
>>
>> openssl req -new -newkey rsa:2048 -out cassy-<region>.csr -keyout cassy-<region>.key -config cassy-<region>.conf -subj "..." -nodes -sha256
>>
>> The generated CSR was then signed by an internal root CA. Because we
>> generated our files using openssl, we had to build our jks files by
>> importing our certs into them.
>>
>> Commands to generate truststore
>>
>> We distribute this one file to all nodes.
>>
>> keytool -importcert -keystore generic-server-truststore.jks -alias rootCa -file rootCa.crt -noprompt -keypass omitted -storepass omitted
>>
>> Commands to generate keystore
>>
>> This was done once per region; essentially we created a keystore with
>> keytool, then deleted the key entry, and then imported our key entry using
>> keytool from a pkcs12 file.
>>
>> keytool -genkeypair -keyalg RSA -alias cassy-${region} -keystore cassy-${region}.jks -storepass omitted -keypass omitted -validity 365 -keysize 2048 -dname "..."
>>
>> keytool -delete -alias cassy-${region} -keystore cassy-${region}.jks -storepass omitted
>>
>> openssl pkcs12 -export -in signed_certs/${region}.pem -inkey keys/cassandra.${region}.key -name cassy-${region} -out ${region}.p12
>>
>> keytool -importkeystore -deststorepass omitted -destkeystore cassy-${region}.jks -srckeystore ${region}.p12 -srcstoretype PKCS12
>>
>> keytool -importcert -keystore cassy-${region}.jks -alias rootCa -file ca.crt -noprompt -keypass omitted -storepass omitted
>>
>> Looking back at this, I don't remember why we used keytool to generate a
>> keypair/keystore, then deleted and imported. I think it was because the
>> keytool importkeystore command refused to run if the keystore didn't already
>> exist.
>>
>> ca.crt and pem file
>>
>> The ca.crt file contains the root certificate and the intermediate
>> certificate that was used to sign the CSR.
>> The pem file contains the signed
>> CSR returned to us, the intermediate cert, and the root CA (in that order).
>>
>> openssl verify ca.crt and pem
>>
>> openssl verify -CAfile ca.crt us-west-2.pem
>> signed_certs/us-west-2.pem: OK
>>
>> Command output after enabling encryption
>>
>> nodetool status (output truncated)
>>
>> Datacenter: us-east
>> ===================
>> Status=Up/Down
>> |/ State=Normal/Leaving/Joining/Moving
>> --  Address         Load        Tokens  Owns (effective)  Host ID  Rack
>> ?N  52.44.11.221    ?           256     25.4%             null     1c
>> ...
>> ?N  52.204.232.195  ?           256     23.2%             null     1d
>>
>> Datacenter: us-west-2
>> =====================
>> Status=Up/Down
>> |/ State=Normal/Leaving/Joining/Moving
>> --  Address         Load        Tokens  Owns (effective)  Host ID  Rack
>> ?N  34.209.2.144    ?           256     26.5%             null     2c
>> UN  52.40.32.177    105.99 GiB  256     23.7%             null     2c
>> ?N  34.210.109.203  ?           256     24.7%             null     2a
>> ...
>>
>> With the online node being the node with encryption set.
>>
>> cqlsh to localhost
>>
>> cassy-node6:~$ cqlsh
>> Connection error: ('Unable to connect to any servers', {'127.0.0.1': error(111, "Tried connecting to [('127.0.0.1', 9042)]. Last error: Connection refused")})
>>
>> cqlsh to remote node
>>
>> Remote node is a node with encryption enabled
>>
>> cassy-node6:~$ cqlsh 10.0.2.7
>> Connection error: ('Unable to connect to any servers', {'10.0.2.7': error(111, "Tried connecting to [('10.0.2.7', 9042)]. Last error: Connection refused")})
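
Port 9042 refuses connections in the cqlsh output above, so a TLS probe against the internode SSL port from the log line (7001) and a listing of the keystore entries may say more. The following is a rough sketch under assumptions, not a verified procedure: the helper names `run`, `probe_internode`, and `list_keystore` and the `DRY_RUN` switch are made up for illustration, and the IP and keystore file name are taken from the examples in the thread.

```shell
#!/bin/sh
# Sketch only: helper names and DRY_RUN are hypothetical, not part of
# Cassandra, openssl, or keytool.

# Print the command instead of running it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Probe the internode SSL port (7001 per the MessagingService log line)
# and show the certificate chain the node presents.
probe_internode() {
    run openssl s_client -connect "$1:${2:-7001}" -showcerts </dev/null
}

# List the keystore entries; keytool prompts for the store password.
list_keystore() {
    run keytool -list -v -keystore "$1"
}

probe_internode "${NODE_IP:-10.0.2.7}"
list_keystore "${KEYSTORE:-cassy-us-west-2.jks}"
```

By default this only prints the two commands; setting `DRY_RUN=0` runs them. In the keytool listing, the thing to look for would be whether the `cassy-${region}` alias shows up as a private key entry with the full certificate chain attached.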