I have a small three-node C* + Spark cluster. When I run any query from Spark,
I get connection-refused errors on two of the C* nodes, which puts all the
load on a single node and results in bad performance. Below is the error
from spark-submit:
17/07/25 12:00:22 INFO Cluster: New Cassandra host /10.128.1.1:9042 added
17/07/25 12:00:22 INFO Cluster: New Cassandra host /10.128.1.2:9042 added
17/07/25 12:00:22 INFO Cluster: New Cassandra host /10.128.1.3:9042 added
17/07/25 12:00:22 INFO CassandraConnector: Connected to Cassandra cluster: Test Cluster
17/07/25 12:00:22 WARN Session: Error creating pool to /10.128.1.3:9042
com.datastax.driver.core.exceptions.ConnectionException: [/10.128.1.3:9042] Pool was closed during initialization
Initially I thought it might be a CassandraConnector issue, but on further
investigation I found that each node is listening on a different
interface/IP, even though the deployment was done with Ansible (playbook
<https://github.com/mrlesmithjr/ansible-cassandra>), so the configuration on
all nodes is the same except for each node's IP address.
That is, node 1 is listening on 127.0.0.1:9042 (but not on its interface IP),
node 2 is listening on its private IP 10.128.1.2:9042 (but not on localhost),
and node 3 is not listening on 9042 at all.
Moreover, I am unable to cqlsh into nodes 1 and 3 from the other nodes. I
have tried restarting the nodes, but the result is the same. I also ran
Cassandra with sudo cassandra -R to look for any errors or warnings.
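To confirm from outside which nodes accept client connections (what cqlsh and the Spark connector need), a plain TCP connect test run from one of the other machines should be enough. Below is a minimal sketch, assuming the client ports from the posted cassandra.yaml (9042 for the native/CQL transport, 9160 for Thrift). The script and its function names (probe, check_hosts) are hypothetical and not part of any Cassandra tooling.

```python
# Hypothetical helper script: checks TCP reachability of Cassandra client ports.
import socket
import sys

# Client-facing ports taken from the posted cassandra.yaml (assumption: defaults unchanged).
PORTS = {"native_transport_port": 9042, "rpc_port": 9160}


def probe(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timeouts and "no route to host" all end up here.
        return False


def check_hosts(hosts, ports=PORTS, timeout: float = 2.0) -> dict:
    """Map each host to {port_name: True/False} reachability."""
    return {
        host: {name: probe(host, port, timeout) for name, port in ports.items()}
        for host in hosts
    }


if __name__ == "__main__":
    hosts = sys.argv[1:]
    if not hosts:
        print(f"usage: {sys.argv[0]} HOST [HOST ...]")
    for host, result in check_hosts(hosts).items():
        summary = ", ".join(
            f"{name}={'open' if ok else 'closed'}" for name, ok in result.items()
        )
        print(f"{host}: {summary}")
```

For example, run it as python3 probe_ports.py 10.128.1.1 10.128.1.2 10.128.1.3 from each node in turn. Given the netstat output below, only 10.128.1.2 would be expected to accept connections on 9042 from another host.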
Can anyone point me to how to debug this issue further, and how to solve it?
If you need anything else or further information about the setup, please
ask.
Cassandra version: 3.9
Installation: Debian package
Node: GCE, 8 CPUs, 30 GB memory
The Cassandra conf file is also attached for reference.
Cassandra conf
storage_port: 7000
ssl_storage_port: 7001
listen_address: 10.128.1.1
start_native_transport: true
native_transport_port: 9042
start_rpc: true
rpc_address: 10.128.1.1
rpc_port: 9160
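Because the playbook should produce files that differ only in the node IPs, one way to verify that is to diff the top-level settings of the three cassandra.yaml files and check that listen_address and rpc_address equal each node's own IP. The excerpt above sets rpc_address to 10.128.1.1, yet node 1's netstat output below shows 9042 bound only to 127.0.0.1, so comparing the files actually deployed on each node may help. This is a minimal sketch, assuming you have copied each node's config locally (on a Debian package install it usually lives at /etc/cassandra/cassandra.yaml). It uses a naive line reader rather than a real YAML parser, and the function names are hypothetical.

```python
# Hypothetical helpers. Naive reader: only unindented "key: value" lines, NOT full YAML.
# (PyYAML's yaml.safe_load would be more robust, but it is not in the stdlib.)

# Settings that decide which address each Cassandra service binds to.
ADDRESS_KEYS = ("listen_address", "rpc_address")


def read_top_level(text: str) -> dict:
    """Collect unindented `key: value` pairs, skipping comments and list items."""
    settings = {}
    for line in text.splitlines():
        if not line or line[0].isspace() or line.startswith(("#", "-")):
            continue  # blank, nested, comment or list line
        key, sep, value = line.partition(":")
        if sep:
            # Drop trailing comments such as "10800000 # 3 hours".
            settings[key.strip()] = value.split("#", 1)[0].strip()
    return settings


def diff_configs(configs: dict) -> dict:
    """Given {node: yaml text}, return {key: {node: value}} for keys that differ."""
    parsed = {node: read_top_level(text) for node, text in configs.items()}
    keys = sorted(set().union(*(s.keys() for s in parsed.values())))
    diffs = {}
    for key in keys:
        values = {node: s.get(key) for node, s in parsed.items()}
        if len(set(values.values())) > 1:
            diffs[key] = values
    return diffs


def check_node(text: str, node_ip: str) -> list:
    """Report address settings that are unset or differ from the node's own IP.

    Note: rpc_address: 0.0.0.0 (with broadcast_rpc_address set) is also valid,
    but this sketch would still flag it.
    """
    settings = read_top_level(text)
    problems = []
    for key in ADDRESS_KEYS:
        value = settings.get(key)
        if not value:
            problems.append(f"{key} is not set")
        elif value != node_ip:
            problems.append(f"{key} is {value}, expected {node_ip}")
    return problems
```

If the playbook behaved as intended, diff_configs over the three files should report only listen_address and rpc_address. Nested settings such as the seed list are skipped by this reader.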
Output of netstat and nodetool status on each node:
Node 1
junaid@cassandra-spark-c1-i1:~$ sudo netstat -tlnp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      2166/sshd
tcp        0      0 10.128.1.1:7000         0.0.0.0:*               LISTEN      27116/java
tcp        0      0 127.0.0.1:8126          0.0.0.0:*               LISTEN      11010/trace-agent
tcp        0      0 0.0.0.0:7199            0.0.0.0:*               LISTEN      27116/java
tcp        0      0 127.0.0.1:17123         0.0.0.0:*               LISTEN      11011/python
tcp        0      0 127.0.0.1:9160          0.0.0.0:*               LISTEN      27116/java
tcp        0      0 127.0.0.1:33773         0.0.0.0:*               LISTEN      27116/java
tcp        0      0 127.0.0.1:9042          0.0.0.0:*               LISTEN      27116/java
tcp6       0      0 :::22                   :::*                    LISTEN      2166/sshd
tcp6       0      0 10.128.1.1:45539        :::*                    LISTEN      2029/java
tcp6       0      0 10.128.1.1:7077         :::*                    LISTEN      2028/java
tcp6       0      0 :::8080                 :::*                    LISTEN      2028/java
tcp6       0      0 :::8081                 :::*                    LISTEN      2029/java
tcp6       0      0 10.128.1.1:6066         :::*                    LISTEN      2028/java
junaid@cassandra-spark-c1-i1:~$ nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load       Tokens  Owns (effective)  Host ID                               Rack
UN  10.128.1.1  27.36 GiB  256     67.8%             63ff8054-934a-4a7a-a33f-405e064bc8e8  rack1
UN  10.128.1.2  26.02 GiB  256     70.9%             702e8a31-6441-4444-b569-d2d137d54a5d  rack1
DN  10.128.1.3  23.35 GiB  256     61.3%             b5b22a90-f037-433a-8ad9-f370b26cca26  rack1
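Reading the netstat output by eye is error-prone with several Java processes on each host (presumably Cassandra plus the Spark daemons). The following minimal sketch pulls out which local addresses Cassandra's ports are bound to, assuming the Linux net-tools column layout of netstat -tlnp shown above. The port map uses the ports from the posted cassandra.yaml plus 7199, Cassandra's default JMX port. The function names are hypothetical.

```python
# Hypothetical helpers for summarising `netstat -tlnp` output (net-tools layout assumed).

# Cassandra ports from the posted cassandra.yaml, plus the default JMX port.
CASSANDRA_PORTS = {7000: "storage", 7199: "jmx", 9042: "native (CQL)", 9160: "thrift"}


def listening_sockets(netstat_output: str):
    """Yield (local_address, port, program) for each LISTEN line."""
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expected layout: Proto Recv-Q Send-Q Local Foreign State PID/Program
        if len(fields) < 7 or fields[5] != "LISTEN":
            continue  # header, banner or non-listening line
        address, _, port = fields[3].rpartition(":")  # works for "::" (IPv6) too
        yield address, int(port), fields[6]


def cassandra_bindings(netstat_output: str) -> dict:
    """Map each Cassandra port name to the local addresses it listens on."""
    bindings = {name: [] for name in CASSANDRA_PORTS.values()}
    for address, port, _program in listening_sockets(netstat_output):
        if port in CASSANDRA_PORTS:
            bindings[CASSANDRA_PORTS[port]].append(address)
    return bindings
```

Fed node 3's output, this returns empty lists for both "native (CQL)" and "thrift", i.e. neither client port is open on that node; for node 1 both show only 127.0.0.1. On a live host the text can come from subprocess.run(["netstat", "-tlnp"], capture_output=True, text=True).stdout, run as root so the PID/Program column is filled in.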
Node 2
junaid@cassandra-spark-c1-i2:~$ sudo netstat -tlnp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 127.0.0.1:36277         0.0.0.0:*               LISTEN      2879/java
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      2059/sshd
tcp        0      0 10.128.1.2:7000         0.0.0.0:*               LISTEN      2879/java
tcp        0      0 127.0.0.1:8126          0.0.0.0:*               LISTEN      2015/trace-agent
tcp        0      0 0.0.0.0:7199            0.0.0.0:*               LISTEN      2879/java
tcp        0      0 127.0.0.1:17123         0.0.0.0:*               LISTEN      2016/python
tcp        0      0 10.128.1.2:9042         0.0.0.0:*               LISTEN      2879/java
tcp6       0      0 :::22                   :::*                    LISTEN      2059/sshd
tcp6       0      0 10.128.1.2:37271        :::*                    LISTEN      1648/java
tcp6       0      0 :::8081                 :::*                    LISTEN      1648/java
junaid@cassandra-spark-c1-i2:~$ nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load       Tokens  Owns (effective)  Host ID                               Rack
UN  10.128.1.1  27.36 GiB  256     67.8%             63ff8054-934a-4a7a-a33f-405e064bc8e8  rack1
UN  10.128.1.2  26.02 GiB  256     70.9%             702e8a31-6441-4444-b569-d2d137d54a5d  rack1
DN  10.128.1.3  23.35 GiB  256     61.3%             b5b22a90-f037-433a-8ad9-f370b26cca26  rack1
Node 3
junaid@cassandra-spark-c1-i3:~$ sudo netstat -tlnp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 127.0.0.1:43573         0.0.0.0:*               LISTEN      8124/java
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      1976/sshd
tcp        0      0 10.128.1.3:7000         0.0.0.0:*               LISTEN      8124/java
tcp        0      0 127.0.0.1:8126          0.0.0.0:*               LISTEN      1834/trace-agent
tcp        0      0 0.0.0.0:7199            0.0.0.0:*               LISTEN      8124/java
tcp        0      0 127.0.0.1:17123         0.0.0.0:*               LISTEN      1835/python
tcp6       0      0 :::22                   :::*                    LISTEN      1976/sshd
tcp6       0      0 10.128.1.3:40967        :::*                    LISTEN      1785/java
tcp6       0      0 :::8081                 :::*                    LISTEN      1785/java
junaid@cassandra-spark-c1-i3:~$ nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load       Tokens  Owns (effective)  Host ID                               Rack
UN  10.128.1.1  27.36 GiB  256     67.8%             63ff8054-934a-4a7a-a33f-405e064bc8e8  rack1
UN  10.128.1.2  26.02 GiB  256     70.9%             702e8a31-6441-4444-b569-d2d137d54a5d  rack1
UN  10.128.1.3  23.35 GiB  256     61.3%             b5b22a90-f037-433a-8ad9-f370b26cca26  rack1
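The nodetool output also shows that the nodes disagree about the ring: nodes 1 and 2 report 10.128.1.3 as DN, while node 3 reports all three nodes as UN. The following small sketch extracts the two-letter status codes and flags such disagreements, assuming the single-datacenter, IPv4 layout shown above. The function names are hypothetical.

```python
# Hypothetical helpers for comparing `nodetool status` output across nodes.
import re

# Two-letter code (U/D = up/down, N/L/J/M = state) followed by an IPv4 address.
NODE_LINE = re.compile(r"^([UD])([NLJM])\s+(\d{1,3}(?:\.\d{1,3}){3})\b")


def node_states(nodetool_output: str) -> dict:
    """Map each node address to its (status, state) letters."""
    states = {}
    for line in nodetool_output.splitlines():
        match = NODE_LINE.match(line.strip())
        if match:
            status, state, address = match.groups()
            states[address] = (status, state)
    return states


def unhealthy(nodetool_output: str) -> list:
    """Return, sorted, the addresses not reported as UN (Up/Normal)."""
    return sorted(
        addr for addr, code in node_states(nodetool_output).items() if code != ("U", "N")
    )


def disagreements(outputs: dict) -> dict:
    """Given {observer: nodetool output}, return nodes that observers see differently."""
    views = {observer: node_states(text) for observer, text in outputs.items()}
    addresses = sorted(set().union(*(v.keys() for v in views.values())))
    result = {}
    for addr in addresses:
        seen = {obs: "".join(v.get(addr, ("?", "?"))) for obs, v in views.items()}
        if len(set(seen.values())) > 1:
            result[addr] = seen
    return result
```

With the three outputs above, disagreements would report only 10.128.1.3, seen as DN by nodes 1 and 2 and as UN by node 3 itself.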
Regards,
Junaid
cluster_name: 'Test Cluster'
auto_bootstrap: false
num_tokens: 256
max_hint_window_in_ms: 10800000 # 3 hours
hinted_handoff_throttle_in_kb: 1024
max_hints_delivery_threads: 2
hints_flush_period_in_ms: 10000
hints_directory: /var/lib/cassandra/hints
max_hints_file_size_in_mb: 128
batchlog_replay_throttle_in_kb: 1024
authenticator: AllowAllAuthenticator
authorizer: AllowAllAuthorizer
role_manager: CassandraRoleManager
roles_validity_in_ms: 2000
permissions_validity_in_ms: 2000
credentials_validity_in_ms: 2000
partitioner: org.apache.cassandra.dht.Murmur3Partitioner
data_file_directories:
- /media/db
commitlog_directory: /var/lib/cassandra/commitlog
disk_failure_policy: stop
commit_failure_policy: stop
prepared_statements_cache_size_mb:
thrift_prepared_statements_cache_size_mb:
key_cache_size_in_mb:
key_cache_save_period: 14400
row_cache_size_in_mb: 0
row_cache_save_period: 0
counter_cache_size_in_mb:
counter_cache_save_period: 7200
saved_caches_directory: /var/lib/cassandra/saved_caches
commitlog_sync: periodic
commitlog_sync_period_in_ms: 10000
commitlog_segment_size_in_mb: 32
seed_provider:
    # Addresses of hosts that are deemed contact points.
    # Cassandra nodes use this list of hosts to find each other and learn
    # the topology of the ring. You must change this if you are running
    # multiple nodes!
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          # seeds is actually a comma-delimited list of addresses.
          # Ex: "<ip1>,<ip2>,<ip3>"
          - seeds: "10.128.1.1"
concurrent_reads: 32
concurrent_writes: 32
concurrent_counter_writes: 32
concurrent_materialized_view_writes: 32
memtable_allocation_type: heap_buffers
index_summary_capacity_in_mb:
index_summary_resize_interval_in_minutes: 60
trickle_fsync: false
trickle_fsync_interval_in_kb: 10240
storage_port: 7000
ssl_storage_port: 7001
listen_address: 10.128.1.1
#listen_interface: ens4
start_native_transport: true
native_transport_port: 9042
start_rpc: true
rpc_address: 10.128.1.1
rpc_port: 9160
rpc_keepalive: true
rpc_server_type: sync
thrift_framed_transport_size_in_mb: 15
incremental_backups: false
snapshot_before_compaction: false
auto_snapshot: true
column_index_size_in_kb: 64
column_index_cache_size_in_kb: 2
compaction_throughput_mb_per_sec: 16
sstable_preemptive_open_interval_in_mb: 50
read_request_timeout_in_ms: 5000
range_request_timeout_in_ms: 10000
write_request_timeout_in_ms: 2000
counter_write_request_timeout_in_ms: 5000
cas_contention_timeout_in_ms: 1000
truncate_request_timeout_in_ms: 60000
request_timeout_in_ms: 10000
cross_node_timeout: false
endpoint_snitch: SimpleSnitch
dynamic_snitch_update_interval_in_ms: 100
dynamic_snitch_reset_interval_in_ms: 600000
dynamic_snitch_badness_threshold: 0.1
request_scheduler: org.apache.cassandra.scheduler.NoScheduler
server_encryption_options:
    internode_encryption: none
    keystore: conf/.keystore
    keystore_password: cassandra
    truststore: conf/.truststore
    truststore_password: cassandra
    # More advanced defaults below:
    # protocol: TLS
    # algorithm: SunX509
    # store_type: JKS
    # cipher_suites: [TLS_RSA_WITH_AES_128_CBC_SHA,TLS_RSA_WITH_AES_256_CBC_SHA,TLS_DHE_RSA_WITH_AES_128_CBC_SHA,TLS_DHE_RSA_WITH_AES_256_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA]
    # require_client_auth: false
    # require_endpoint_verification: false
# enable or disable client/server encryption.
client_encryption_options:
    enabled: false
    # If enabled and optional is set to true, encrypted and unencrypted connections are handled.
    optional: false
    keystore: conf/.keystore
    keystore_password: cassandra
    # require_client_auth: false
    # Set truststore and truststore_password if require_client_auth is true
    # truststore: conf/.truststore
    # truststore_password: cassandra
    # More advanced defaults below:
    # protocol: TLS
    # algorithm: SunX509
    # store_type: JKS
    # cipher_suites: [TLS_RSA_WITH_AES_128_CBC_SHA,TLS_RSA_WITH_AES_256_CBC_SHA,TLS_DHE_RSA_WITH_AES_128_CBC_SHA,TLS_DHE_RSA_WITH_AES_256_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA]
# internode_compression controls whether traffic between nodes is
# compressed.
# can be: all - all traffic is compressed
# dc - traffic between different datacenters is compressed
# none - nothing is compressed.
internode_compression: dc
# Enable or disable tcp_nodelay for inter-dc communication.
# Disabling it will result in larger (but fewer) network packets being sent,
# reducing overhead from the TCP protocol itself, at the cost of increasing
# latency if you block for cross-datacenter responses.
inter_dc_tcp_nodelay: false
tracetype_query_ttl: 86400
tracetype_repair_ttl: 604800
enable_user_defined_functions: false
enable_scripted_user_defined_functions: false
windows_timer_interval: 1
transparent_data_encryption_options:
  enabled: false
  chunk_length_kb: 64
  cipher: AES/CBC/PKCS5Padding
  key_alias: testing:1
  # CBC IV length for AES needs to be 16 bytes (which is also the default size)
  # iv_length: 16
  key_provider:
    - class_name: org.apache.cassandra.security.JKSKeyProvider
      parameters:
        - keystore: conf/.keystore
          keystore_password: cassandra
          store_type: JCEKS
          key_password: cassandra
#####################
# SAFETY THRESHOLDS #
#####################
tombstone_warn_threshold: 1000
tombstone_failure_threshold: 100000
batch_size_warn_threshold_in_kb: 5
batch_size_fail_threshold_in_kb: 50
unlogged_batch_across_partitions_warn_threshold: 10
compaction_large_partition_warning_threshold_mb: 100
gc_warn_threshold_in_ms: 1000