[jira] [Commented] (CASSANDRA-20205) Failed lightweight transaction leaves Paxos in apparently unresolvable state

Peter Machon (Jira) Fri, 17 Jan 2025 23:19:46 -0800


    [ 
https://issues.apache.org/jira/browse/CASSANDRA-20205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17914299#comment-17914299
 ]


Peter Machon commented on CASSANDRA-20205:
------------------------------------------

No need to be sorry, I appreciate your help.

For now, however, we decided to dump and reimport the entire table, since it is 
the production environment. I will try to make some time to reproduce the effect 
in a test environment and circle back with more information if that works. 

Thanks anyway for the Paxos version hint, I didn't know there was a choice.

> Failed lightweight transaction leaves Paxos in apparently unresolvable state
> ----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-20205
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-20205
>             Project: Apache Cassandra
>          Issue Type: Bug
>            Reporter: Peter Machon
>            Priority: Normal
>         Attachments: paxos_1.csv, paxos_2.csv, paxos_3.csv
>
>
> In a three-node Cassandra cluster I am consistently facing the same kind of 
> fatal situation on tables that are solely written using Cassandra's 
> lightweight transactions (CAS).
> Whenever a lightweight transaction fails to reach quorum (1/2), e.g. due to 
> high load, any following attempt to write data within a transaction fails, 
> i.e. does not return {{"[applied]"=true}}.
> Using {{select * from system.paxos where cf_id=<id of table>}}, I see 
> that there are entries, which I assume to be pending transactions.
> Further, in {{/var/log/cassandra/system.log}} I see logs like:
> {quote}INFO [ScheduledTasks:1] 2025-01-12 21:46:53,005 
> UncommittedTableData.java:567 - Scheduling uncommitted paxos data merge task 
> for {{<any other table>}}
> {quote}
> {quote}INFO [OptionalTasks:1] 2025-01-12 21:46:53,006 
> PaxosCleanupLocalCoordinator.java:89 - Completing uncommitted paxos instances 
> for {{<table in stalled state>}} on ranges
> {quote}
> However, I can't figure out how to resolve the state. Neither {{nodetool 
> repair -full <keyspace>}} (and variations) nor restarting all nodes 
> resolved the issue.
> _Further information:_
>  * Cassandra version: 4.1.5
>  * OS: Ubuntu 22.04
>  * replication strategy: SimpleStrategy
>  * replication factor: 3



