Could be an issue with the keystore/truststore. You may want to run keytool -list
to validate the files/passwords; also run md5sum on the files from 1 node in west
and 1 node in east. Check SSL port 7001: from 1 node in west, run telnet <node in
east> 7001 (or your custom port if you are not using the default port).
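A minimal Python sketch of the md5sum comparison. It assumes copies of the files from one west node and one east node have already been pulled into local directories; the `west/` and `east/` paths in the usage note are hypothetical. Per the original post, the truststore is a single file distributed to all nodes while the keystores are built per region, so only the truststore digests are expected to match across regions.

```python
import hashlib


def md5_digest(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, the same value `md5sum` prints."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_contents(path_a, path_b):
    """True when two copies of a file (e.g. west and east) are byte-identical."""
    return md5_digest(path_a) == md5_digest(path_b)
```

Usage: `same_contents("west/generic-server-truststore.jks", "east/generic-server-truststore.jks")`. A `False` here would mean the nodes are not actually trusting the same CA file.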
    On Monday, August 26, 2019, 05:46:19 PM PDT, Michael Carlise <mcarl...@salesforce.com.INVALID> wrote:  
 
 Subroto -
Both tools error out; openssl reports errno 111 (connection refused), which made
me check the bound ports on the C* node with encryption turned on. Port 9042 is
not open (determined by netstat -ant). I'm now looking at the log differences
between starting a node with and without encryption. Without encryption, I get a
bunch of lines like:
OutboundTcpConnection.java:561 - Handshaking version w/ IP
And these appear after a line like
Gossiper.java - Waiting for gossip to settle...
With encryption set to 'dc', I don't see any of those lines, presumably because
the gossiper is trying to start but doesn't.
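One way to make the with/without-encryption log comparison repeatable is a small Python sketch that counts the two marker lines quoted above in each run's log. The regular expressions are assumptions built from the quoted lines only; exact wording and line numbers can differ between Cassandra versions, and depending on the log level the handshake lines may only show up in debug.log.

```python
import re

# Patterns based on the log lines quoted in this thread; exact wording and
# line numbers may differ between Cassandra versions.
MARKERS = {
    "gossip_settle": re.compile(r"Gossiper\.java.*Waiting for gossip to settle"),
    "handshake": re.compile(r"OutboundTcpConnection\.java:\d+ - Handshaking version"),
}


def count_markers(lines):
    """Count how many lines match each marker pattern."""
    counts = {name: 0 for name in MARKERS}
    for line in lines:
        for name, pattern in MARKERS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


def diff_runs(plain_lines, encrypted_lines):
    """Return {marker: (count without encryption, count with encryption)}."""
    plain = count_markers(plain_lines)
    encrypted = count_markers(encrypted_lines)
    return {name: (plain[name], encrypted[name]) for name in MARKERS}
```

Usage: open the two saved logs (for example hypothetical copies named system-plain.log and system-dc.log) and pass the file objects to `diff_runs`. A non-zero gossip count paired with a zero handshake count under 'dc' would match the behaviour described above.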
On Mon, Aug 26, 2019 at 6:51 PM Subroto Barua <sbarua...@yahoo.com.invalid> wrote:

Michael,
Are you able to connect to any C* node via OpenSSL?
openssl s_client -connect <ip address>:9042
cqlsh <ip address> --ssl
Subroto 
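Below is a hedged stdlib-Python stand-in for the telnet and openssl s_client checks suggested in this thread. It is not a Cassandra tool, and it deliberately skips certificate verification, so "tls-ok" only means a handshake completed. Note that client_encryption_options is disabled in the configuration in the original post, so a TLS handshake on 9042 would not be expected even if that port were open. The internode SSL port (7001 by default) is where "tls-ok" would be expected.

```python
import socket
import ssl


def probe(host, port, timeout=5.0):
    """Roughly what `telnet` plus `openssl s_client` would tell you.

    Returns "refused" (errno 111, ECONNREFUSED: nothing listening),
    "tcp-only" (TCP connects but no TLS handshake completes),
    "tls-ok" (a TLS handshake completes), or "error: ..." otherwise.
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except ConnectionRefusedError:
        return "refused"
    except OSError as exc:
        return f"error: {exc}"
    # Certificate verification is switched off on purpose: this only checks
    # whether a handshake completes, not whether the peer cert is trusted.
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    try:
        with ctx.wrap_socket(sock, server_hostname=host):
            return "tls-ok"
    except OSError:  # ssl.SSLError is a subclass of OSError
        return "tcp-only"
    finally:
        sock.close()
```

Usage: `for port in (7000, 7001, 9042): print(port, probe("10.0.2.7", port))`. With `internode_encryption: dc`, 7001 should answer TLS while 7000 keeps serving same-datacenter traffic in plain TCP.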
On Aug 26, 2019, at 2:47 PM, Marc Selwan <marc.sel...@datastax.com> wrote:


Which exact version of OpenJDK are you using? Is it possible you don't have the
JCE unlimited strength policy on those nodes? (I believe more recent versions of
Java 8 have this baked in, so that might not be it.)

Marc Selwan | DataStax | PM, Server Team | (925) 413-7079 | Twitter 



On Mon, Aug 26, 2019 at 1:56 PM Michael Carlise <mcarl...@salesforce.com.invalid> wrote:


I originally opened this issue on Stack Overflow
(https://stackoverflow.com/questions/57516660/cassandra-node-to-node-encryption-throws-unable-to-gossip-with-peers-exception).
However, I haven't gotten any responses in over a week, so I'm posting it here in
case someone has an idea of where I can look.

We currently run a multi-region Cassandra cluster in AWS: four regions, 12 nodes
per region. It runs without node-to-node encryption (and without client
encryption). We are trying to enable inter-datacenter node-to-node encryption.
However, when we turn encryption on, we get an exception saying the node is
unable to gossip with any peers.

It's possible that we didn't build our JKS keystores/truststores correctly (more
on how we built these files below). However, intra-datacenter communication,
which should remain unencrypted, does not work either. cqlsh cannot connect to
the node either, even though require_client_auth is false (the default).
ERROR [main] 2019-08-15 18:46:32,241 CassandraDaemon.java:749 - Exception encountered during startup
java.lang.RuntimeException: Unable to gossip with any peers
        at org.apache.cassandra.gms.Gossiper.doShadowRound(Gossiper.java:1435) ~[apache-cassandra-3.11.4.jar:3.11.4]
        at org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:566) ~[apache-cassandra-3.11.4.jar:3.11.4]
        at org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:823) ~[apache-cassandra-3.11.4.jar:3.11.4]
        at org.apache.cassandra.service.StorageService.initServer(StorageService.java:683) ~[apache-cassandra-3.11.4.jar:3.11.4]
        at org.apache.cassandra.service.StorageService.initServer(StorageService.java:632) ~[apache-cassandra-3.11.4.jar:3.11.4]
        at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:388) [apache-cassandra-3.11.4.jar:3.11.4]
        at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:620) [apache-cassandra-3.11.4.jar:3.11.4]
        at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:732) [apache-cassandra-3.11.4.jar:3.11.4]
INFO  [main] 2019-08-15 18:47:07,384 YamlConfigurationLoader.java:89 - Configuration location: file:/etc/cassandra/cassandra.yaml


Note that this error occurs only after the node has been up for a few minutes
(i.e., there is a delay between startup and the exception being thrown).

Information about our Cassandra setup

cassandra version: 3.11.4
JDK version: openjdk-8.
Linux: Ubuntu 18.04 (bionic).

cassandra.yaml
endpoint_snitch: Ec2MultiRegionSnitch

server_encryption_options:
  internode_encryption: dc
  keystore: <omitted>
  keystore_password: <omitted>
  truststore: <omitted>
  truststore_password: <omitted>

client_encryption_options:
  enabled: false

cassandra-rackdc.properties
prefer_local=true
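To make the `internode_encryption` settings concrete, here is a small Python model (not Cassandra's code) of which peer connections are expected to go over the encrypted SSL port under each documented mode (none, all, dc, rack). It assumes the datacenter and rack names come from the snitch (Ec2MultiRegionSnitch here, which derives them from the EC2 region and availability zone). Under `dc`, peers in the same region are expected to keep using the regular, unencrypted storage port.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    dc: str    # datacenter name as reported by the snitch, e.g. "us-west-2"
    rack: str  # rack name as reported by the snitch, e.g. "2a"


def is_encrypted(mode, local, peer):
    """Model of whether a connection from `local` to `peer` uses the SSL port.

    none: nothing is encrypted; all: everything is encrypted;
    dc: only cross-datacenter traffic; rack: any cross-rack (or cross-DC) traffic.
    """
    if mode == "none":
        return False
    if mode == "all":
        return True
    if mode == "dc":
        return local.dc != peer.dc
    if mode == "rack":
        return local.dc != peer.dc or local.rack != peer.rack
    raise ValueError(f"unknown internode_encryption mode: {mode!r}")
```

Usage: `is_encrypted("dc", Node("us-west-2", "2a"), Node("us-east", "1c"))` is `True`, while the same call between two `us-west-2` nodes is `False`.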

No obvious errors with SSL output

When starting Cassandra with JVM_OPTS="$JVM_OPTS -Djavax.net.debug=ssl" added
to cassandra-env.sh, we see SSL logs printed to stdout (note: Subject and Issuer
were omitted on purpose):
found key for : cassy-us-west-2
adding as trusted cert:
  Subject: ...
  Issuer:  ...
  Algorithm: RSA; Serial number: 0xdad28d843fc73325d4c1a75207d4e74
  Valid from Fri May 27 00:00:00 UTC 2016 until Tue May 26 23:59:59 UTC 2026

...

trigger seeding of SecureRandom
done seeding SecureRandom

Compared against the Java SE SSL/TLS connection debugging documentation, this
looks correct. Note, however, that this series of messages (along with the RSA
key signature output) is repeated several times in rapid succession. We never
observe any messages about the truststore being added; however, that might be
something that only happens when a client initiates a connection (?).

We do see Cassandra report that the encrypted messaging service has started:
INFO  [main] 2019-08-15 18:45:31,022 MessagingService.java:704 - Starting Encrypted Messaging Service on SSL port 7001

Doesn't appear to be a cassandra.yaml configuration problem

We can bring the node back online by simply configuring internode_encryption: 
none. This action seems to rule out a broadcast_address or rpc_address 
configuration problem.

How we built our keystore/truststores

We followed the basic template in the DataStax docs for preparing SSL
certificates. One minor difference is that our private keys and CSRs were
generated with openssl, one per region (we plan to share each key/signed cert
across the nodes in a region). They were created using this command template:
openssl req -new -newkey rsa:2048 -out cassy-<region>.csr \
    -keyout cassy-<region>.key -config cassy-<region>.conf -subj "..." -nodes -sha256

The generated CSR was then signed by an internal root CA. Because we generated 
our files using openssl, we had to build our jks files by importing our certs 
into them.

Commands to generate truststore

We distribute this one file to all nodes.
keytool -importcert \
    -keystore generic-server-truststore.jks \
    -alias rootCa \
    -file rootCa.crt \
    -noprompt \
    -keypass omitted \
    -storepass omitted

Commands to generate keystore

This was done once per region: essentially, we created a keystore with keytool,
deleted its key entry, and then imported our own key entry from a PKCS12 file
with keytool.
keytool -genkeypair -keyalg RSA -alias cassy-${region} -keystore cassy-${region}.jks \
    -storepass omitted -keypass omitted -validity 365 -keysize 2048 -dname "..."

keytool -delete -alias cassy-${region} -keystore cassy-${region}.jks -storepass omitted

openssl pkcs12 -export -in signed_certs/${region}.pem \
    -inkey keys/cassandra.${region}.key -name cassy-${region} -out ${region}.p12

keytool -importkeystore -deststorepass omitted -destkeystore cassy-${region}.jks \
    -srckeystore ${region}.p12 -srcstoretype PKCS12

keytool -importcert -keystore cassy-${region}.jks -alias rootCa -file ca.crt \
    -noprompt -keypass omitted -storepass omitted
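For repeating this per region, a hedged Python sketch that builds the same five commands as argument lists and either prints them (dry run) or runs them in order, stopping at the first failure. It adds two flags that the commands above omit: openssl's `-passout` and keytool's `-srcstorepass`, so the PKCS12 export and import do not prompt. It uses one password everywhere, on the assumption that the key entry password should match `keystore_password` in cassandra.yaml. `storepass` and `dname` are placeholders supplied by the caller.

```python
import shlex
import subprocess


def keystore_commands(region, storepass, dname):
    """Argv lists for the per-region keystore build described above."""
    jks = f"cassy-{region}.jks"
    alias = f"cassy-{region}"
    p12 = f"{region}.p12"
    return [
        ["keytool", "-genkeypair", "-keyalg", "RSA", "-alias", alias,
         "-keystore", jks, "-storepass", storepass, "-keypass", storepass,
         "-validity", "365", "-keysize", "2048", "-dname", dname],
        ["keytool", "-delete", "-alias", alias, "-keystore", jks,
         "-storepass", storepass],
        # -passout avoids the interactive export-password prompt.
        ["openssl", "pkcs12", "-export", "-in", f"signed_certs/{region}.pem",
         "-inkey", f"keys/cassandra.{region}.key", "-name", alias,
         "-out", p12, "-passout", f"pass:{storepass}"],
        # -srcstorepass supplies the PKCS12 password set by -passout above.
        ["keytool", "-importkeystore", "-deststorepass", storepass,
         "-destkeystore", jks, "-srckeystore", p12, "-srcstoretype", "PKCS12",
         "-srcstorepass", storepass],
        ["keytool", "-importcert", "-keystore", jks, "-alias", "rootCa",
         "-file", "ca.crt", "-noprompt", "-keypass", storepass,
         "-storepass", storepass],
    ]


def run_all(commands, dry_run=True):
    """Print each command; unless dry_run, also run it, stopping on the first failure."""
    for argv in commands:
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)
```

Usage: `run_all(keystore_commands("us-west-2", "<password>", "..."))` prints the commands for review; pass `dry_run=False` to execute them with keytool and openssl on the PATH.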

Looking back at this, I don't remember why we used keytool to generate a
keypair/keystore, then deleted the generated key and imported ours. I think it
was because the keytool -importkeystore command refused to run if the keystore
didn't already exist.

ca.crt and pem file

The ca.crt file contains the root certificate and the intermediate certificate
that was used to sign the CSR. The pem file contains the signed certificate
returned to us, the intermediate cert, and the root CA (in that order).
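A quick Python sketch for sanity-checking the two bundles as described: ca.crt should hold two certificate blocks, the pem three, and every block after the first (leaf) one in the pem should also appear in ca.crt. This check is purely textual. It does not parse the certificates, so it cannot confirm the leaf, intermediate, root order by issuer and subject.

```python
import re

PEM_CERT = re.compile(
    r"-----BEGIN CERTIFICATE-----\s+.*?-----END CERTIFICATE-----", re.DOTALL
)


def split_pem(text):
    """Return the PEM certificate blocks in `text`, in file order."""
    return PEM_CERT.findall(text)


def _normalise(block):
    # Ignore differences in line wrapping and surrounding whitespace.
    return "".join(block.split())


def check_bundles(ca_text, pem_text):
    """Block counts for ca.crt and the signed pem, plus whether every
    certificate after the leaf in the pem also appears in ca.crt."""
    ca = [_normalise(b) for b in split_pem(ca_text)]
    chain = [_normalise(b) for b in split_pem(pem_text)]
    return {
        "ca_count": len(ca),
        "chain_count": len(chain),
        "chain_tail_in_ca": all(c in ca for c in chain[1:]),
    }
```

Usage: read ca.crt and signed_certs/us-west-2.pem and pass their contents to `check_bundles`. Going by the description above, one would expect counts of 2 and 3 and `chain_tail_in_ca` set to `True`.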

openssl verify ca.crt and pem
openssl verify -CAfile ca.crt us-west-2.pem
signed_certs/us-west-2.pem: OK

Command output after enabling encryption

nodetool status (output truncated)
Datacenter: us-east
===================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load        Tokens  Owns (effective)  Host ID  Rack
?N  52.44.11.221    ?           256     25.4%             null     1c
...
?N  52.204.232.195  ?           256     23.2%             null     1d
Datacenter: us-west-2
=====================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load        Tokens  Owns (effective)  Host ID  Rack
?N  34.209.2.144    ?           256     26.5%             null     2c
UN  52.40.32.177    105.99 GiB  256     23.7%             null     2c
?N  34.210.109.203  ?           256     24.7%             null     2a
...

The one node shown as up (UN) is the node with encryption enabled.

cqlsh to localhost
cassy-node6:~$ cqlsh
Connection error: ('Unable to connect to any servers', {'127.0.0.1': error(111, "Tried connecting to [('127.0.0.1', 9042)]. Last error: Connection refused")})

cqlsh to a remote node (the remote node has encryption enabled)
cassy-node6:~$ cqlsh 10.0.2.7
Connection error: ('Unable to connect to any servers', {'10.0.2.7': error(111, "Tried connecting to [('10.0.2.7', 9042)]. Last error: Connection refused")})


  
