[jira] [Commented] (CASSANDRA-20291) Batch queries timeout when private IP of a node fails in multi-dc Cassandra setup

Tommy Stendahl (Jira) Mon, 09 Jun 2025 01:28:04 -0700


    [ 
https://issues.apache.org/jira/browse/CASSANDRA-20291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17956837#comment-17956837
 ]


Tommy Stendahl commented on CASSANDRA-20291:
--------------------------------------------

I did a review of [~manmagic3] [pull 
request|https://github.com/apache/cassandra/pull/4155], LGTM.

> Batch queries timeout when private IP of a node fails in multi-dc Cassandra 
> setup
> ---------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-20291
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-20291
>             Project: Apache Cassandra
>          Issue Type: Bug
>          Components: Consistency/Batch Log
>            Reporter: Manish Khandelwal
>            Assignee: Manish Khandelwal
>            Priority: Normal
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> I have a Cassandra 4.1.4 cluster with two data centers, once cluster with 3 
> nodes and other cluster with 2 nodes. The configuration is: listen_address = 
> private IP, broadcast_address = public IP,  listen_on_broadcast_address = 
> true, prefer_local = true
> {code:java}
>  Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address        Load      Tokens  Owns  Host ID                            
>    Rack
> UN  X.Y.Z.Node1  2.4 GiB   256     ?     766ae19b-3c87-47f1-8b85-23085b69b202 
>  RAC1
> UN  X.Y.Z.Node3   2.51 GiB  256     ?     
> 2cbdcb3d-5344-46f3-9b9e-283888d92a4b  RAC1
> UN  X.Y.Z.Node2  2.47 GiB  256     ?     30756549-255d-4ec2-b15b-071c3983edc0 
>  RAC1
>  
> Datacenter: DC2
> ===============
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address        Load      Tokens  Owns  Host ID                            
>    Rack
> UN  X.Y.Z.Node4   2.5 GiB   256     ?     
> 38c84625-fbd0-403f-9862-48d3e8d30400  RAC1
> UN  X.Y.Z.Node5   2.54 GiB  256     ?     
> 9af001f9-4226-4958-9cc9-efdc52ca9e1d  RAC1{code}
> *Issue Observed:*
>  * We execute a multi-partition batch query with LOCAL_QUORUM consistency, 
> and it succeeds.
>  * We bring down the private IP of one node in the DC where queries are 
> executed.
>  * Batch queries start timing out, but simple INSERT and SELECT queries work 
> fine.
>  * Stopping /Restarting the affected node (node where private IP was bought 
> down) resolves the issue, and batch queries succeed again.
> *Analysis So far*
> The node is not marked down in nodetool status but is unreachable in nodetool 
> describecluster. From the Cassandra source code, it looks like the 
> coordinator picks two other nodes for writing batch logs. The failed node 
> (private IP down) gets selected for batch logs, but it never responds, 
> causing the timeout. This is evident from the TRACE logs as well
> For example in the above topology, for DC1, we bought down the private 
> interface of node with IP X.Y.Z.Node1, and fired a query on node with IP 
> X.Y.Z.Node3. following were the logs at the coordinator that is X.Y.Z.Node3
> {code:java}
>  TRACE [Native-Transport-Requests-1] 2025-02-05 06:59:33,534 
> StorageProxy.java:1280 - Sending batchlog store request 
> 5f4624d0-e386-11ef-9bab-e9a6cd1b4f50 to 
> Full(/X.Y.Z.Node2:9700,(-9223372036854775808,-9223372036854775808]) for 3 
> mutations
> TRACE [Native-Transport-Requests-1] 2025-02-05 06:59:33,534 
> StorageProxy.java:1280 - Sending batchlog store request 
> 5f4624d0-e386-11ef-9bab-e9a6cd1b4f50 to 
> Full(/X.Y.Z.Node1:9700,(-9223372036854775808,-9223372036854775808]) for 3 
> mutations
> WARN  [Messaging-EventLoop-3-11] 2025-02-05 06:59:33,608 
> NoSpamLogger.java:108 - 
> host3/X.Y.Z.Node3:9700>host1/X.Y.Z.Node1:9700-SMALL_MESSAGES[no-channel] 
> dropping message of type BATCH_STORE_REQ whose timeout expired before 
> reaching the network{code}
> In the client logs we get 
> {code:java}
>  Caused by: com.datastax.driver.core.exceptions.WriteTimeoutException: 
> Cassandra timeout during BATCH_LOG write query at consistency TWO (2 replica 
> were required but only 1 acknowledged the write){code}
>   If we restart or stop the node (X.Y.Z.Node1) where private interface was 
> down, Nodetool status of other nodes is updated and they register it as down. 
> Now if we fire the batch query, only one node gets selected for batchlog 
> which is available and hint is saved for node with downed private interface. 
> Evident from the logs below     
> {code:java}
>  TRACE [Native-Transport-Requests-1] 2025-02-05 07:38:23,863 
> StorageProxy.java:1280 - Sending batchlog store request 
> cc424870-e38b-11ef-9bab-e9a6cd1b4f50 to 
> Full(/X.Y.Z.Node2:9700,(-9223372036854775808,-9223372036854775808]) for 3 
> mutations
> TRACE [MutationStage-2] 2025-02-05 07:38:23,867 StorageProxy.java:2730 - 
> Adding hints for [/X.Y.Z.Node1:9700]{code}
>         
> I think the issue is that in multi DC cluster with both public and private 
> interface enabled,  if private interface of node goes down and batch queries 
> are fired. Failure detector mechanism does not update the status of the node 
> as unreachable and that node appears live. During batch query execution when 
> replicas are filtered for batchlog, the node with failed private interface 
> gets selected but request is never sent to it. This request results in 
> WriteTimeoutExeption with writetype "BATCH_LOG"
>  
>             



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Commented] (CASSANDRA-20291) Batch queries timeout when private IP of a node fails in multi-dc Cassandra setup

Reply via email to