[jira] [Comment Edited] (CASSANDRA-20514) Paxos mixed mode infinite loop with ttl'd state

Michael Semb Wever (Jira) Fri, 04 Apr 2025 07:28:06 -0700


    [ https://issues.apache.org/jira/browse/CASSANDRA-20514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17940929#comment-17940929 ]


Michael Semb Wever edited comment on CASSANDRA-20514 at 4/4/25 2:25 PM:
------------------------------------------------------------------------

this broke CI (`ant check` fails, see the lint stage in the pipeline).

5.0: 
https://ci-cassandra.apache.org/job/Cassandra-5.0/430/cloudbees-pipeline-explorer/?filter=1154
 

4.1: 
https://ci-cassandra.apache.org/view/Cassandra%204.1/job/Cassandra-4.1-dtest/lastFailedBuild/jdk=jdk_1.8_latest,label=cassandra-dtest,split=1/console


was (Author: michaelsembwever):
this broke CI (`ant check` fails, see the lint stage in the pipeline).
https://ci-cassandra.apache.org/job/Cassandra-5.0/430/cloudbees-pipeline-explorer/?filter=1154
 

> Paxos mixed mode infinite loop with ttl'd state
> -----------------------------------------------
>
>                 Key: CASSANDRA-20514
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-20514
>             Project: Apache Cassandra
>          Issue Type: Bug
>          Components: Feature/Lightweight Transactions
>            Reporter: Blake Eggleston
>            Assignee: Blake Eggleston
>            Priority: Normal
>             Fix For: 4.1.x, 5.0.x, 5.x
>
>
> This is similar to the bug fixed in CASSANDRA-20493.
> CEP-14 changed the ttl behavior of legacy paxos state to expire based on the 
> ballot time of the operation being persisted, not the time a commit is 
> persisted. This eliminated the race addressed by CASSANDRA-12043, and so the 
> check it added to the most recent commit prepare logic was removed.
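A minimal sketch of the TTL change described above, assuming (hypothetically) that two replicas persist the same commit at different local times, for example because one of them received it late. The class `PaxosTtlSketch`, its `expiresAtMicros` method and all numbers are illustrative assumptions, not Cassandra code. In this model, counting the TTL from the persist time makes the two copies expire at different instants, leaving a window where one replica still reports the commit and the other does not; counting from the ballot time makes them expire together.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical model of when a persisted paxos commit expires (not Cassandra code). */
final class PaxosTtlSketch {
    private PaxosTtlSketch() {}

    /**
     * Expiry instant (micros) of a commit written with the given TTL.
     * fromBallot = true counts the TTL from the ballot time (the CEP-14 behaviour
     * described in the issue); false counts it from the local persist time.
     */
    static long expiresAtMicros(long ballotMicros, long persistedAtMicros,
                                long ttlSeconds, boolean fromBallot) {
        long base = fromBallot ? ballotMicros : persistedAtMicros;
        return base + TimeUnit.SECONDS.toMicros(ttlSeconds);
    }

    public static void main(String[] args) {
        long ballot = 1_000_000L;                // ballot timestamp in micros (illustrative)
        long ttl = 10;                           // TTL in seconds (illustrative)
        long persistedOnA = ballot;              // replica A persisted the commit immediately
        long persistedOnB = ballot + 5_000_000L; // replica B persisted it 5s later

        // Persist-time TTL: the replicas disagree about when the commit expires.
        System.out.println(expiresAtMicros(ballot, persistedOnA, ttl, false)); // 11000000
        System.out.println(expiresAtMicros(ballot, persistedOnB, ttl, false)); // 16000000
        // Ballot-time TTL: both copies expire at the same instant.
        System.out.println(expiresAtMicros(ballot, persistedOnA, ttl, true));  // 11000000
        System.out.println(expiresAtMicros(ballot, persistedOnB, ttl, true));  // 11000000
    }
}
```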
> When operating in mixed mode, though, this race can still be a problem. If a 
> 4.1 or higher node is coordinating a paxos operation with 2 or more replicas 
> on 4.0 or lower, the race comes back. You need 3 things to make this an 
> infinite loop:
> 1. a 4.1 node coordinating a paxos operation with two 4.0 replicas
> 2. replica A) a 4.0 node returns a most recent commit (mrc) for a ballot that 
> could have been ttl'd
> 3. replica B) a 4.0 node has ttl'd that mrc AND converted the ttl'd cells into 
> tombstones
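Condition 3 matters because of how cells are reconciled. A minimal sketch, assuming Cassandra's usual last-write-wins rule (the higher timestamp wins; on a timestamp tie a tombstone beats a live cell) and assuming that replica B's tombstone keeps the timestamp of the expired mrc while the re-sent commit carries the same ballot timestamp. `ReconcileSketch`, `CellSketch` and `reconcile` are hypothetical names, not Cassandra's API.

```java
/** Hypothetical last-write-wins cell model (not Cassandra's Cell/Conflicts API). */
final class ReconcileSketch {
    static final class CellSketch {
        final long timestampMicros;
        final boolean tombstone;

        CellSketch(long timestampMicros, boolean tombstone) {
            this.timestampMicros = timestampMicros;
            this.tombstone = tombstone;
        }
    }

    /** Higher timestamp wins; on a tie a tombstone beats a live cell. */
    static CellSketch reconcile(CellSketch left, CellSketch right) {
        if (left.timestampMicros != right.timestampMicros)
            return left.timestampMicros > right.timestampMicros ? left : right;
        if (left.tombstone != right.tombstone)
            return left.tombstone ? left : right;
        return left; // simplification: same timestamp and liveness, keep the existing cell
    }

    public static void main(String[] args) {
        long ballotMicros = 1_000_000L;
        // Replica B: the expired mrc has become a tombstone at the ballot timestamp.
        CellSketch onReplicaB = new CellSketch(ballotMicros, true);
        // The coordinator re-sends the "missing" commit with the same ballot timestamp.
        CellSketch repair = new CellSketch(ballotMicros, false);

        CellSketch merged = reconcile(onReplicaB, repair);
        // prints "still a tombstone: no mrc reported"
        System.out.println(merged.tombstone ? "still a tombstone: no mrc reported" : "mrc restored");
    }
}
```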
> The 4.1 coordinator receives the mrc from replica A, but since it no longer 
> disregards missing most recent commits past the ttl window, it sends the 
> "missing" commit to replica B. Since replica B now has a tombstone for that 
> mrc, and tombstones win when reconciled with live cells, even ones with ttls, 
> the commit is a no-op, and replica B continues to report nothing for its mrc 
> value when the coordinator restarts the prepare phase. This loops until the 
> query times out.
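The loop can be reproduced in a small model. This is a hypothetical sketch, not Cassandra's prepare code: `MixedModeLoopSketch`, `Replica`, `prepare`, the round limit (standing in for the query timeout) and the `ignoreExpiredMrc` flag (standing in for the CASSANDRA-12043-style check that disregarded missing most recent commits past the ttl window) are all assumptions. Replica A still reports the mrc; replica B holds a tombstone with the same timestamp, so every repair commit is shadowed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical model of the mixed-mode prepare loop (not Cassandra code). */
final class MixedModeLoopSketch {
    static final class Replica {
        Long liveMrc;      // ballot (micros) of a live most recent commit, or null
        Long mrcTombstone; // timestamp of a tombstone left by an expired mrc, or null

        Replica(Long liveMrc, Long mrcTombstone) {
            this.liveMrc = liveMrc;
            this.mrcTombstone = mrcTombstone;
        }

        /** Applies a repair commit; a tombstone with an equal or newer timestamp shadows it. */
        void commit(long ballot) {
            if (mrcTombstone != null && mrcTombstone >= ballot)
                return; // no-op: the tombstone wins reconciliation
            if (liveMrc == null || ballot > liveMrc)
                liveMrc = ballot;
        }
    }

    /**
     * Runs prepare rounds until every replica reports the newest mrc (or, when
     * ignoreExpiredMrc is set, until a missing mrc is past the ttl window).
     * Returns the number of rounds used, or -1 if maxRounds is exhausted ("timeout").
     */
    static int prepare(List<Replica> replicas, long nowMicros, long ttlMicros,
                       boolean ignoreExpiredMrc, int maxRounds) {
        for (int round = 1; round <= maxRounds; round++) {
            Long newest = null;
            for (Replica r : replicas)
                if (r.liveMrc != null && (newest == null || r.liveMrc > newest))
                    newest = r.liveMrc;
            if (newest == null)
                return round; // nobody reports an mrc: nothing to repair

            List<Replica> missing = new ArrayList<>();
            for (Replica r : replicas)
                if (r.liveMrc == null || r.liveMrc < newest)
                    missing.add(r);

            boolean pastTtlWindow = newest + ttlMicros <= nowMicros;
            if (missing.isEmpty() || (ignoreExpiredMrc && pastTtlWindow))
                return round; // consistent, or the missing mrc is disregarded

            for (Replica r : missing)
                r.commit(newest); // send the "missing" commit, then restart prepare
        }
        return -1;
    }

    public static void main(String[] args) {
        long ballot = 1_000L, ttl = 500L, now = 2_000L;
        // Replica A still returns the mrc; replica B has a tombstone for it.
        System.out.println(prepare(Arrays.asList(new Replica(ballot, null), new Replica(null, ballot)),
                now, ttl, false, 10)); // -1: loops until the round limit
        System.out.println(prepare(Arrays.asList(new Replica(ballot, null), new Replica(null, ballot)),
                now, ttl, true, 10));  // 1: the expired mrc is disregarded
    }
}
```

Replacing replica B's tombstone with plain absence (`new Replica(null, null)`) lets the first repair stick, and prepare settles in the second round.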



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
