[jira] [Commented] (CASSANDRA-20205) Failed lightweight transaction leaves Paxos in apparently unresolvable state

Peter Machon (Jira) Wed, 15 Jan 2025 14:23:12 -0800


    [ 
https://issues.apache.org/jira/browse/CASSANDRA-20205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17913471#comment-17913471
 ]


Peter Machon commented on CASSANDRA-20205:
------------------------------------------

I was still editing the message, sorry.

Yes, exactly, that is the expected behavior. However, it is not what actually happens.

Given that the database only behaves like this after a lightweight transaction 
has failed to reach quorum, and given the entries in the system.paxos table and 
the logs mentioned in the description, I can only assume that this behavior is 
related to the apparently stalled Paxos state.
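
For reference, the quorum arithmetic behind "failed to reach quorum" can be sketched as follows. This is only an illustration of the rule quorum = floor(RF / 2) + 1; the function names are mine and are not part of any Cassandra API.

```python
def quorum(replication_factor: int) -> int:
    """Replicas needed for a quorum: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def reaches_quorum(acks: int, replication_factor: int) -> bool:
    """True if the number of acknowledging replicas meets the quorum."""
    return acks >= quorum(replication_factor)


# Replication factor 3, as in this cluster: a quorum is 2 replicas.
print(quorum(3))             # 2
print(reaches_quorum(1, 3))  # False: one acknowledgement out of two required
print(reaches_quorum(2, 3))  # True
```

With replication factor 3, a single acknowledgement is not enough, which is how I read the "(1/2)" in the description.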

> Failed lightweight transaction leaves Paxos in apparently unresolvable state
> ----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-20205
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-20205
>             Project: Apache Cassandra
>          Issue Type: Bug
>            Reporter: Peter Machon
>            Priority: Normal
>         Attachments: paxos_1.csv, paxos_2.csv, paxos_3.csv
>
>
> In a three-node Cassandra cluster, I am consistently facing the same kind of 
> fatal situation on tables that are written solely using Cassandra's 
> lightweight transactions (CAS).
> Whenever a lightweight transaction fails to reach quorum (1/2), e.g. due to 
> high load, any subsequent attempt to write data within a transaction fails, 
> i.e. does not return {{"[applied]"=true}}.
> Using {{select * from system.paxos where cf_id=<id of table>}}, I see 
> that there are entries, which I assume to be pending transactions.
> Further, in {{/var/log/Cassandra/system.log}} I see logs like:
> {quote}INFO [ScheduledTasks:1] 2025-01-12 21:46:53,005 
> UncommittedTableData.java:567 - Scheduling uncommitted paxos data merge task 
> for {{<any other table>}}
> {quote}
> {quote}INFO [OptionalTasks:1] 2025-01-12 21:46:53,006 
> PaxosCleanupLocalCoordinator.java:89 - Completing uncommitted paxos instances 
> for {{<table in stalled state>}} on ranges
> {quote}
> However, I can't figure out how to resolve this state. {{nodetool repair -full 
> <keyspace>}} (and variations), as well as restarting all nodes, did not 
> resolve the issue.
> _Further information:_
>  * Cassandra version: 4.1.5
>  * OS: Ubuntu 22.04
>  * replication strategy: SimpleStrategy
>  * replication factor: 3
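
To pull the Paxos-related lines quoted above out of system.log, a small filter along these lines could be used. It is a sketch that assumes the log layout visible in the quotes (level, thread in brackets, timestamp, source file and line number, then the message). The function name, the table names "ks.other" and "ks.stalled", and the third sample line are hypothetical placeholders, not real output.

```python
import re

# Layout assumed from the quoted lines:
# LEVEL [Thread] YYYY-MM-DD HH:MM:SS,mmm File.java:LINE - message
LOG_LINE = re.compile(
    r"^(?P<level>[A-Z]+)\s+\[(?P<thread>[^\]]+)\]\s+"
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"(?P<source>\S+\.java):(?P<line>\d+)\s+-\s+(?P<message>.*)$"
)


def paxos_log_entries(lines):
    """Return parsed entries whose message mentions uncommitted paxos state."""
    entries = []
    for raw in lines:
        match = LOG_LINE.match(raw.strip())
        if match and "uncommitted paxos" in match.group("message").lower():
            entries.append(match.groupdict())
    return entries


# Sample lines modeled on the quotes; table names are placeholders and the
# last line is a made-up unrelated entry used only to show the filtering.
sample = [
    "INFO [ScheduledTasks:1] 2025-01-12 21:46:53,005 UncommittedTableData.java:567 - "
    "Scheduling uncommitted paxos data merge task for ks.other",
    "INFO [OptionalTasks:1] 2025-01-12 21:46:53,006 PaxosCleanupLocalCoordinator.java:89 - "
    "Completing uncommitted paxos instances for ks.stalled on ranges",
    "INFO [main] 2025-01-12 21:46:54,000 Example.java:1 - unrelated message",
]

for entry in paxos_log_entries(sample):
    print(entry["timestamp"], entry["source"], entry["message"])
```

In practice the list of lines would come from reading the log file itself rather than from the inline sample.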



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
