I have a small 3-node C* + Spark cluster. When I run any query from Spark, it
gives me a connection refused error on 2 of the C* nodes, which puts all the
pressure on a single node and results in bad performance. Below is the error
from spark-submit:

17/07/25 12:00:22 INFO Cluster: New Cassandra host /10.128.1.1:9042 added
17/07/25 12:00:22 INFO Cluster: New Cassandra host /10.128.1.2:9042 added
17/07/25 12:00:22 INFO Cluster: New Cassandra host /10.128.1.3:9042 added
17/07/25 12:00:22 INFO CassandraConnector: Connected to Cassandra cluster: Test Cluster
17/07/25 12:00:22 WARN Session: Error creating pool to /10.128.1.3:9042
com.datastax.driver.core.exceptions.ConnectionException: [/10.128.1.3:9042] Pool was closed during initialization

Initially I thought it might be a CassandraConnector issue, but on further
investigation I found that the nodes are listening on different
interfaces/IPs, even though the deployment was done using Ansible (playbook
<https://github.com/mrlesmithjr/ansible-cassandra>), so the configuration on all
the nodes is the same except for each node's IP address.
That is, node 1 is listening on 127.0.0.1:9042 (but not on its interface IP),
node 2 is listening on its private IP 10.128.1.2:9042 (but not on localhost),
and node 3 is not listening on 9042 at all.
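
To make the check above repeatable from any node, here is a minimal sketch that tries a plain TCP connection to port 9042 on each address. The node IPs and the port are the ones from this post; the helper names can_connect and probe_all are hypothetical, and a successful TCP connect only shows that something is listening, not that CQL works.

```python
import socket

# Node IPs and native transport port as described in this post.
NODES = ["10.128.1.1", "10.128.1.2", "10.128.1.3"]
NATIVE_PORT = 9042


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


def probe_all(hosts, port, timeout=2.0):
    """Return a dict mapping each host to True (reachable) or False."""
    return {host: can_connect(host, port, timeout) for host in hosts}
```

Running probe_all(["127.0.0.1"] + NODES, NATIVE_PORT) on each node in turn should show which address, if any, accepts connections on 9042 from that node.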

Furthermore, I am unable to connect with cqlsh to nodes 1 & 3 from the other
nodes. I have tried restarting the nodes, but the result is the same. I also
ran Cassandra using sudo cassandra -R to look for any errors or warnings.

Can anyone please point me to how to debug this issue further and how to solve
it? If you need anything else or further information about the setup, please
ask.

Cassandra version: 3.9
Installation: Debian package
Node: GCE, 8 CPUs, 30 GB memory
Cassandra conf file is also attached for reference.

Cassandra conf

storage_port: 7000
ssl_storage_port: 7001
listen_address: 10.128.1.1
start_native_transport: true
native_transport_port: 9042
start_rpc: true
rpc_address: 10.128.1.1
rpc_port: 9160
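
As far as I understand, Cassandra binds internode traffic to listen_address and both the native transport and Thrift to rpc_address, so the excerpt above implies a specific set of listeners. A minimal sketch of that mapping follows; parse_flat_yaml and expected_listeners are hypothetical helper names, and the parser only handles flat top-level "key: value" lines (no nested blocks or lists), which is an assumption that holds for this excerpt but not for a full cassandra.yaml.

```python
CONF_EXCERPT = """\
storage_port: 7000
ssl_storage_port: 7001
listen_address: 10.128.1.1
start_native_transport: true
native_transport_port: 9042
start_rpc: true
rpc_address: 10.128.1.1
rpc_port: 9160
"""


def parse_flat_yaml(text):
    """Parse top-level 'key: value' lines; skip comments, lists, nested keys."""
    settings = {}
    for line in text.splitlines():
        if not line or line[0] in " \t#-":
            continue
        key, sep, value = line.partition(":")
        if sep:
            # Drop trailing comments and surrounding quotes from the value.
            settings[key.strip()] = value.split("#", 1)[0].strip().strip("'\"")
    return settings


def expected_listeners(settings):
    """Map each service to the address:port the settings ask Cassandra to bind."""
    return {
        "storage": f"{settings['listen_address']}:{settings['storage_port']}",
        "native": f"{settings['rpc_address']}:{settings['native_transport_port']}",
        "thrift": f"{settings['rpc_address']}:{settings['rpc_port']}",
    }
```

For this excerpt, expected_listeners(parse_flat_yaml(CONF_EXCERPT)) gives 10.128.1.1:9042 for the native transport, which does not match the 127.0.0.1:9042 that netstat reports on node 1.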

Output of netstat and nodetool status on each node:
Node 1

junaid@cassandra-spark-c1-i1:~$ sudo netstat -tlnp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      2166/sshd
tcp        0      0 10.128.1.1:7000         0.0.0.0:*               LISTEN      27116/java
tcp        0      0 127.0.0.1:8126          0.0.0.0:*               LISTEN      11010/trace-agent
tcp        0      0 0.0.0.0:7199            0.0.0.0:*               LISTEN      27116/java
tcp        0      0 127.0.0.1:17123         0.0.0.0:*               LISTEN      11011/python
tcp        0      0 127.0.0.1:9160          0.0.0.0:*               LISTEN      27116/java
tcp        0      0 127.0.0.1:33773         0.0.0.0:*               LISTEN      27116/java
tcp        0      0 127.0.0.1:9042          0.0.0.0:*               LISTEN      27116/java
tcp6       0      0 :::22                   :::*                    LISTEN      2166/sshd
tcp6       0      0 10.128.1.1:45539        :::*                    LISTEN      2029/java
tcp6       0      0 10.128.1.1:7077         :::*                    LISTEN      2028/java
tcp6       0      0 :::8080                 :::*                    LISTEN      2028/java
tcp6       0      0 :::8081                 :::*                    LISTEN      2029/java
tcp6       0      0 10.128.1.1:6066         :::*                    LISTEN      2028/java
junaid@cassandra-spark-c1-i1:~$ nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load       Tokens       Owns (effective)  Host ID                               Rack
UN  10.128.1.1  27.36 GiB  256          67.8%             63ff8054-934a-4a7a-a33f-405e064bc8e8  rack1
UN  10.128.1.2  26.02 GiB  256          70.9%             702e8a31-6441-4444-b569-d2d137d54a5d  rack1
DN  10.128.1.3  23.35 GiB  256          61.3%             b5b22a90-f037-433a-8ad9-f370b26cca26  rack1

Node 2

junaid@cassandra-spark-c1-i2:~$ sudo netstat -tlnp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 127.0.0.1:36277         0.0.0.0:*               LISTEN      2879/java
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      2059/sshd
tcp        0      0 10.128.1.2:7000         0.0.0.0:*               LISTEN      2879/java
tcp        0      0 127.0.0.1:8126          0.0.0.0:*               LISTEN      2015/trace-agent
tcp        0      0 0.0.0.0:7199            0.0.0.0:*               LISTEN      2879/java
tcp        0      0 127.0.0.1:17123         0.0.0.0:*               LISTEN      2016/python
tcp        0      0 10.128.1.2:9042         0.0.0.0:*               LISTEN      2879/java
tcp6       0      0 :::22                   :::*                    LISTEN      2059/sshd
tcp6       0      0 10.128.1.2:37271        :::*                    LISTEN      1648/java
tcp6       0      0 :::8081                 :::*                    LISTEN      1648/java
junaid@cassandra-spark-c1-i2:~$ nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load       Tokens       Owns (effective)  Host ID                               Rack
UN  10.128.1.1  27.36 GiB  256          67.8%             63ff8054-934a-4a7a-a33f-405e064bc8e8  rack1
UN  10.128.1.2  26.02 GiB  256          70.9%             702e8a31-6441-4444-b569-d2d137d54a5d  rack1
DN  10.128.1.3  23.35 GiB  256          61.3%             b5b22a90-f037-433a-8ad9-f370b26cca26  rack1

Node 3

junaid@cassandra-spark-c1-i3:~$ sudo netstat -tlnp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 127.0.0.1:43573         0.0.0.0:*               LISTEN      8124/java
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      1976/sshd
tcp        0      0 10.128.1.3:7000         0.0.0.0:*               LISTEN      8124/java
tcp        0      0 127.0.0.1:8126          0.0.0.0:*               LISTEN      1834/trace-agent
tcp        0      0 0.0.0.0:7199            0.0.0.0:*               LISTEN      8124/java
tcp        0      0 127.0.0.1:17123         0.0.0.0:*               LISTEN      1835/python
tcp6       0      0 :::22                   :::*                    LISTEN      1976/sshd
tcp6       0      0 10.128.1.3:40967        :::*                    LISTEN      1785/java
tcp6       0      0 :::8081                 :::*                    LISTEN      1785/java
junaid@cassandra-spark-c1-i3:~$ nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load       Tokens       Owns (effective)  Host ID                               Rack
UN  10.128.1.1  27.36 GiB  256          67.8%             63ff8054-934a-4a7a-a33f-405e064bc8e8  rack1
UN  10.128.1.2  26.02 GiB  256          70.9%             702e8a31-6441-4444-b569-d2d137d54a5d  rack1
UN  10.128.1.3  23.35 GiB  256          61.3%             b5b22a90-f037-433a-8ad9-f370b26cca26  rack1
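
To compare the three netstat listings without reading them by eye, a small sketch like the following can pull out the local addresses listening on a given port. It assumes each netstat row is on a single line in the usual `netstat -tlnp` column order; listeners_on_port is a hypothetical helper name, and the sample rows are a subset of node 1's output above.

```python
NODE1_SAMPLE = """\
tcp        0      0 10.128.1.1:7000         0.0.0.0:*               LISTEN      27116/java
tcp        0      0 127.0.0.1:9160          0.0.0.0:*               LISTEN      27116/java
tcp        0      0 127.0.0.1:9042          0.0.0.0:*               LISTEN      27116/java
tcp6       0      0 :::22                   :::*                    LISTEN      2166/sshd
"""


def listeners_on_port(netstat_output, port):
    """Return the local addresses in LISTEN state on the given port."""
    addresses = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expected columns: proto, recv-q, send-q, local, foreign, state, pid/program
        if len(fields) < 6 or not fields[0].startswith("tcp") or fields[5] != "LISTEN":
            continue
        # Split on the last colon so IPv6 forms such as ":::22" still work.
        host, _, local_port = fields[3].rpartition(":")
        if local_port == str(port):
            addresses.append(host)
    return addresses
```

On the node 1 sample, listeners_on_port(NODE1_SAMPLE, 9042) returns ['127.0.0.1'], while an empty list for a node (as on node 3) means nothing is listening on that port.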


Regards,
Junaid

Attached Cassandra conf file:

cluster_name: 'Test Cluster'
auto_bootstrap: false
num_tokens: 256
max_hint_window_in_ms: 10800000 # 3 hours
hinted_handoff_throttle_in_kb: 1024
max_hints_delivery_threads: 2
hints_flush_period_in_ms: 10000
hints_directory: /var/lib/cassandra/hints
max_hints_file_size_in_mb: 128
batchlog_replay_throttle_in_kb: 1024
authenticator: AllowAllAuthenticator
authorizer: AllowAllAuthorizer
role_manager: CassandraRoleManager
roles_validity_in_ms: 2000
permissions_validity_in_ms: 2000
credentials_validity_in_ms: 2000
partitioner: org.apache.cassandra.dht.Murmur3Partitioner
data_file_directories:
    - /media/db
commitlog_directory: /var/lib/cassandra/commitlog
disk_failure_policy: stop
commit_failure_policy: stop
prepared_statements_cache_size_mb:
thrift_prepared_statements_cache_size_mb:
key_cache_size_in_mb:
key_cache_save_period: 14400
row_cache_size_in_mb: 0
row_cache_save_period: 0
counter_cache_size_in_mb:
counter_cache_save_period: 7200
saved_caches_directory: /var/lib/cassandra/saved_caches
commitlog_sync: periodic
commitlog_sync_period_in_ms: 10000
commitlog_segment_size_in_mb: 32
seed_provider:
    # Addresses of hosts that are deemed contact points.
    # Cassandra nodes use this list of hosts to find each other and learn
    # the topology of the ring.  You must change this if you are running
    # multiple nodes!
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          # seeds is actually a comma-delimited list of addresses.
          # Ex: "<ip1>,<ip2>,<ip3>"
          - seeds: "10.128.1.1"
concurrent_reads: 32
concurrent_writes: 32
concurrent_counter_writes: 32
concurrent_materialized_view_writes: 32
memtable_allocation_type: heap_buffers
index_summary_capacity_in_mb:
index_summary_resize_interval_in_minutes: 60
trickle_fsync: false
trickle_fsync_interval_in_kb: 10240
storage_port: 7000
ssl_storage_port: 7001
listen_address: 10.128.1.1
#listen_interface: ens4
start_native_transport: true
native_transport_port: 9042
start_rpc: true
rpc_address: 10.128.1.1
rpc_port: 9160
rpc_keepalive: true
rpc_server_type: sync
thrift_framed_transport_size_in_mb: 15
incremental_backups: false
snapshot_before_compaction: false
auto_snapshot: true
column_index_size_in_kb: 64
column_index_cache_size_in_kb: 2
compaction_throughput_mb_per_sec: 16
sstable_preemptive_open_interval_in_mb: 50
read_request_timeout_in_ms: 5000
range_request_timeout_in_ms: 10000
write_request_timeout_in_ms: 2000
counter_write_request_timeout_in_ms: 5000
cas_contention_timeout_in_ms: 1000
truncate_request_timeout_in_ms: 60000
request_timeout_in_ms: 10000
cross_node_timeout: false
endpoint_snitch: SimpleSnitch
dynamic_snitch_update_interval_in_ms: 100
dynamic_snitch_reset_interval_in_ms: 600000
dynamic_snitch_badness_threshold: 0.1
request_scheduler: org.apache.cassandra.scheduler.NoScheduler
server_encryption_options:
    internode_encryption: none
    keystore: conf/.keystore
    keystore_password: cassandra
    truststore: conf/.truststore
    truststore_password: cassandra
    # More advanced defaults below:
    # protocol: TLS
    # algorithm: SunX509
    # store_type: JKS
    # cipher_suites: [TLS_RSA_WITH_AES_128_CBC_SHA,TLS_RSA_WITH_AES_256_CBC_SHA,TLS_DHE_RSA_WITH_AES_128_CBC_SHA,TLS_DHE_RSA_WITH_AES_256_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA]
    # require_client_auth: false
    # require_endpoint_verification: false

# enable or disable client/server encryption.
client_encryption_options:
    enabled: false
    # If enabled and optional is set to true encrypted and unencrypted connections are handled.
    optional: false
    keystore: conf/.keystore
    keystore_password: cassandra
    # require_client_auth: false
    # Set trustore and truststore_password if require_client_auth is true
    # truststore: conf/.truststore
    # truststore_password: cassandra
    # More advanced defaults below:
    # protocol: TLS
    # algorithm: SunX509
    # store_type: JKS
    # cipher_suites: [TLS_RSA_WITH_AES_128_CBC_SHA,TLS_RSA_WITH_AES_256_CBC_SHA,TLS_DHE_RSA_WITH_AES_128_CBC_SHA,TLS_DHE_RSA_WITH_AES_256_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA]

# internode_compression controls whether traffic between nodes is
# compressed.
# can be:  all  - all traffic is compressed
#          dc   - traffic between different datacenters is compressed
#          none - nothing is compressed.
internode_compression: dc

# Enable or disable tcp_nodelay for inter-dc communication.
# Disabling it will result in larger (but fewer) network packets being sent,
# reducing overhead from the TCP protocol itself, at the cost of increasing
# latency if you block for cross-datacenter responses.
inter_dc_tcp_nodelay: false

tracetype_query_ttl: 86400
tracetype_repair_ttl: 604800
enable_user_defined_functions: false
enable_scripted_user_defined_functions: false
windows_timer_interval: 1
transparent_data_encryption_options:
    enabled: false
    chunk_length_kb: 64
    cipher: AES/CBC/PKCS5Padding
    key_alias: testing:1
    # CBC IV length for AES needs to be 16 bytes (which is also the default size)
    # iv_length: 16
    key_provider:
      - class_name: org.apache.cassandra.security.JKSKeyProvider
        parameters:
          - keystore: conf/.keystore
            keystore_password: cassandra
            store_type: JCEKS
            key_password: cassandra


#####################
# SAFETY THRESHOLDS #
#####################

tombstone_warn_threshold: 1000
tombstone_failure_threshold: 100000
batch_size_warn_threshold_in_kb: 5
batch_size_fail_threshold_in_kb: 50
unlogged_batch_across_partitions_warn_threshold: 10
compaction_large_partition_warning_threshold_mb: 100
gc_warn_threshold_in_ms: 1000