[
https://issues.apache.org/jira/browse/CASSJAVA-106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18049861#comment-18049861
]
Lukasz Antoniak commented on CASSJAVA-106:
------------------------------------------
Can you check whether setting {{advanced.resolve-contact-points = false}} fixes the issue for you?
See the documentation for this option
[here|https://github.com/apache/cassandra-java-driver/blob/4.x/core/src/main/resources/reference.conf#L1088].
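For illustration, a minimal sketch of where this option would sit in the application's {{application.conf}}. The contact point host names are hypothetical placeholders, not the ones from the report; only the {{advanced.resolve-contact-points}} line is the suggestion itself.

```
# application.conf (HOCON), sketch only
datastax-java-driver {
  # Hypothetical contact points; substitute the real host names.
  basic.contact-points = [ "cassandra-0.example:9042", "cassandra-1.example:9042" ]

  # Default is true: each host name is resolved once and the resulting IP is reused.
  # With false, addresses are kept unresolved and looked up again for every new connection.
  advanced.resolve-contact-points = false
}
```

According to the reference configuration, this option only applies to contact points given in the configuration. It has no effect on contact points passed programmatically or on dynamically discovered peers, which the driver learns as raw IP addresses.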
> Gauge counters for open-connections not updated after Cassandra pod
> recreation in geographical redundant setup
> --------------------------------------------------------------------------------------------------------------
>
> Key: CASSJAVA-106
> URL: https://issues.apache.org/jira/browse/CASSJAVA-106
> Project: Apache Cassandra Java driver
> Issue Type: Bug
> Reporter: Ioannis Stoltidis
> Priority: Normal
>
> We are running a containerized version of Cassandra in a geographically
> redundant setup with 2 datacenters. Each datacenter contains three Cassandra
> pods, which are managed as part of a Cassandra StatefulSet. Every pod has an
> associated Kubernetes service with a load balancer IP address. This IP
> remains constant and serves as the hostname for internode communication among
> all Cassandra pods. Additionally, each datacenter includes a pod running our
> application, which uses the Cassandra driver to communicate with the pool of
> Cassandra pods. We utilize the DataStax Java driver configured as follows:
> * Two contact points are specified, connecting to two hosts (the first 2
> pods, named cassandra-datacenter1_rack1-0 and cassandra-datacenter1_rack1-1).
> * After all the endpoints are discovered, one connection per server in the
> local DC is established, along with one control connection.
> The mapping between host domains and IP addresses is as follows:
> ||domain||IP||
> |cassandra-datacenter1_rack1-0|214.22.161.195|
> |cassandra-datacenter1_rack1-1|214.22.161.196|
> |cassandra-datacenter1_rack1-2|214.22.161.197|
> While monitoring Cassandra connections using gauge counters exposed via the
> Dropwizard exporter, we observed that some counters show domain names while
> others display IP addresses, and at least one counter appears duplicated.
> The following 4 gauge counters are being observed:
> {noformat}
> s0.nodes.214_22_161_196:9042.pool.open-connections → initial value: 1
> s0.nodes.214_22_161_197:9042.pool.open-connections → initial value: 2
> s0.nodes.cassandra-datacenter1-rack1-0_cassandra-datacenter1-rack1:9042.pool.open-connections → initial value: 1
> s0.nodes.cassandra-datacenter1-rack1-1_cassandra-datacenter1-rack1:9042.pool.open-connections → initial value: 0
> {noformat}
> We then ran the following recovery procedure on 2 of the 3 pods in the
> local datacenter:
> * Halt the Cassandra container using: {{echo STOPPED > /var/lib/cassandra/.cassandra.init && pkill java}}
> * Remove the Persistent Volume Claim (PVC) associated with each of the two pods
> * Run {{nodetool removenode}} on the cluster to clean up the old instances
> * Restart the two pods and re-enable Cassandra using: {{echo RUNNING > /var/lib/cassandra/.cassandra.init}}
> Afterwards, the gauge counters were no longer updated accurately.
> Specifically, they changed to:
> {noformat}
> s0.nodes.214_22_161_196:9042.pool.open-connections → 0
> s0.nodes.214_22_161_197:9042.pool.open-connections → 2
> s0.nodes.cassandra-datacenter1-rack1-0_cassandra-datacenter1-rack1:9042.pool.open-connections → 0
> s0.nodes.cassandra-datacenter1-rack1-1_cassandra-datacenter1-rack1:9042.pool.open-connections → 0
> {noformat}
> No other counters are created. These values remain stuck and do not reflect
> the actual state of the connection pool: on the server side we can
> verify that all expected connections are up again (i.e. one connection per
> server plus one control connection). The values are only reset correctly when
> we manually restart the application pod that uses the DataStax Java driver,
> which in turn recreates the session.
> *Expected behavior:*
> Gauge counters should reflect the actual number of open connections even
> after the Cassandra pods are deleted and recreated.
> *Observed behavior:*
> After pod recreation and node replacement, the counters stay at incorrect
> values until the client session is forcibly reset by restarting the
> application.
> *Environment:*
> Cassandra: containerized
> Java driver: DataStax Java driver (version 4.19.0)
> Monitoring via: simpleclient_dropwizard of io.prometheus
> Setup: Geo-redundant, 2 datacenters, 3 pods per datacenter
> *Impact:*
> This behavior results in stale monitoring data and obscures actual cluster
> health and connectivity, particularly in automated or production setups.
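The mix of host names and IP addresses in the gauge names above can be reproduced with a small, driver-independent sketch. It assumes the per-node part of the name is built from the endpoint's host string with dots replaced by underscores, followed by the port. That rule is inferred from the gauge names in the report and is an assumption here, not a statement about the driver's code. The class name, the helper methods and the dotted host name (reconstructed from the gauge name) are hypothetical.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;

public class MetricPrefixSketch {

    // Assumed naming rule (inferred from the report, not taken from the driver):
    // host string with '.' replaced by '_', then ':' and the port.
    static String metricPrefix(InetSocketAddress address) {
        return address.getHostString().replace('.', '_') + ':' + address.getPort();
    }

    // Builds the full gauge name in the shape shown in the report.
    static String gaugeName(String sessionName, InetSocketAddress address) {
        return sessionName + ".nodes." + metricPrefix(address) + ".pool.open-connections";
    }

    public static void main(String[] args) throws UnknownHostException {
        // Endpoint known by host name (dotted name is a reconstruction, see lead-in).
        InetSocketAddress byName = InetSocketAddress.createUnresolved(
                "cassandra-datacenter1-rack1-1.cassandra-datacenter1-rack1", 9042);

        // The same node known by raw IP, built from bytes so no DNS lookup happens.
        InetSocketAddress byIp = new InetSocketAddress(
                InetAddress.getByAddress(new byte[] {(byte) 214, 22, (byte) 161, (byte) 196}),
                9042);

        // Two different gauge names for what the mapping table says is one node.
        System.out.println(gaugeName("s0", byName));
        System.out.println(gaugeName("s0", byIp));
    }
}
```

Under this assumption, a node that is known once by host name and once by IP address ends up with two separate gauges. That would match the apparent duplication between the {{cassandra-datacenter1-rack1-1}} entry and the {{214_22_161_196}} entry in the report.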