[ https://issues.apache.org/jira/browse/IGNITE-27288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexey Scherbakov updated IGNITE-27288:
---------------------------------------
    Description: 
I've observed this assertion in a scenario where a partition leader is temporarily segmented from the remaining nodes:
A, B, C - RAFT nodes
B is the leader
B is segmented
A and C elect a new leader, A
B returns to the topology as a stale leader and steps down
B and C elect a new leader, C, disrupting A's leadership
{noformat}
2025-09-05 21:24:35:393 +0000 [ERROR][%poc-tester-SERVER-192.168.211.108-id-0%JRaft-FSMCaller-Disruptor_stripe_6-0][StateMachineAdapter] Encountered an error=Status[ESTATEMACHINE<10002>: StateMachine meet critical error when applying one or more tasks since index=1142, Status[ESTATEMACHINE<10002>: Reordering detected: [old=HybridTimestamp [physical=2025-09-05 21:24:34:122 +0000, logical=2, composite=115153795424059394], new=HybridTimestamp [physical=2025-09-05 21:24:34:085 +0000, logical=17, composite=115153795421634577]]]] on StateMachine org.apache.ignite.internal.raft.server.impl.JraftServerImpl$DelegatingStateMachine, it's highly recommended to implement this method as raft stops working since some error occurs, you should figure out the cause and repair or remove this node.
Error [type=ERROR_TYPE_STATE_MACHINE, status=Status[ESTATEMACHINE<10002>: StateMachine meet critical error when applying one or more tasks since index=1142, Status[ESTATEMACHINE<10002>: Reordering detected: [old=HybridTimestamp [physical=2025-09-05 21:24:34:122 +0000, logical=2, composite=115153795424059394], new=HybridTimestamp [physical=2025-09-05 21:24:34:085 +0000, logical=17, composite=115153795421634577]]]]]
        at org.apache.ignite.raft.jraft.core.IteratorImpl.getOrCreateError(IteratorImpl.java:168)
        at org.apache.ignite.raft.jraft.core.IteratorImpl.setErrorAndRollback(IteratorImpl.java:159)
        at org.apache.ignite.raft.jraft.core.IteratorWrapper.setErrorAndRollback(IteratorWrapper.java:74)
        at org.apache.ignite.internal.raft.server.impl.JraftServerImpl$DelegatingStateMachine.onApply(JraftServerImpl.java:921)
        at org.apache.ignite.raft.jraft.core.FSMCallerImpl.doApplyTasks(FSMCallerImpl.java:570)
        at org.apache.ignite.raft.jraft.core.FSMCallerImpl.doCommitted(FSMCallerImpl.java:536)
        at org.apache.ignite.raft.jraft.core.FSMCallerImpl.runApplyTask(FSMCallerImpl.java:454)
        at org.apache.ignite.raft.jraft.core.FSMCallerImpl$ApplyTaskHandler.onEvent(FSMCallerImpl.java:123)
        at org.apache.ignite.raft.jraft.core.FSMCallerImpl$ApplyTaskHandler.onEvent(FSMCallerImpl.java:117)
        at org.apache.ignite.raft.jraft.disruptor.StripedDisruptor$StripeEntryHandler.onEvent(StripedDisruptor.java:322)
        at org.apache.ignite.raft.jraft.disruptor.StripedDisruptor$StripeEntryHandler.onEvent(StripedDisruptor.java:279)
        at com.lmax.disruptor.BatchEventProcessor.processEvents(BatchEventProcessor.java:167)
        at com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:122)
        at java.base/java.lang.Thread.run(Thread.java:829)
{noformat}
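The two timestamps in the log can be decoded by hand. Below is a minimal sketch, assuming the composite value is the physical Unix time in milliseconds shifted left by 16 bits, with the logical counter in the low 16 bits (this layout reproduces the physical/logical pairs printed above). The class and method names are hypothetical and are not Ignite's HybridTimestamp API. It shows that the new timestamp is 37 ms older than the previously applied one, which is exactly what a check requiring strictly increasing timestamps rejects:

```java
import java.time.Instant;

// Hypothetical helper, NOT Ignite's HybridTimestamp: assumes composite = (physicalMillis << 16) | logical.
public final class ReorderingCheckSketch {
    static final int LOGICAL_BITS = 16;

    // Upper 48 bits: physical time in milliseconds since the Unix epoch.
    static long physical(long composite) {
        return composite >>> LOGICAL_BITS;
    }

    // Lower 16 bits: logical counter that orders events within the same millisecond.
    static int logical(long composite) {
        return (int) (composite & 0xFFFF);
    }

    // Assumed invariant: each applied command carries a timestamp strictly greater than the previous one.
    static void checkNoReordering(long old, long next) {
        if (next <= old) {
            throw new AssertionError("Reordering detected: [old=" + old + ", new=" + next + "]");
        }
    }

    public static void main(String[] args) {
        long old = 115153795424059394L;  // from the log: 21:24:34:122, logical=2
        long next = 115153795421634577L; // from the log: 21:24:34:085, logical=17

        System.out.println(Instant.ofEpochMilli(physical(old)) + " logical=" + logical(old));
        // prints 2025-09-05T21:24:34.122Z logical=2
        System.out.println(Instant.ofEpochMilli(physical(next)) + " logical=" + logical(next));
        // prints 2025-09-05T21:24:34.085Z logical=17
        System.out.println(physical(old) - physical(next) + " ms backwards"); // prints 37 ms backwards

        try {
            checkNoReordering(old, next);
        } catch (AssertionError e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The higher logical counter (17 vs 2) does not help: the physical part dominates the comparison, so the entry is ordered before the one already applied.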
The root cause still needs to be found.
Most likely it's related to a broken RAFT leader lease invariant (no two leaders may exist with intersecting RAFT leader leases).
See [1] for details.
[1] https://github.com/apache/ignite-3/pull/4821/files#r1884194321
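To make the suspected invariant concrete, here is a minimal sketch that models each leader's lease as a half-open time interval and checks that leases held by different leaders never intersect. The Lease class and all the times below are hypothetical illustrations of the scenario above, not values from the logs or Ignite code. One plausible way to reach the reordering is that C starts assigning timestamps while A's lease is still valid, so C's timestamps can be lower than ones A has already handed out:

```java
import java.util.List;

// Hypothetical model of RAFT leader leases, not Ignite's implementation.
public final class LeaderLeaseSketch {
    static final class Lease {
        final String leader;
        final long startMs; // inclusive
        final long endMs;   // exclusive

        Lease(String leader, long startMs, long endMs) {
            this.leader = leader;
            this.startMs = startMs;
            this.endMs = endMs;
        }

        // Half-open intervals [s1, e1) and [s2, e2) intersect iff s1 < e2 && s2 < e1.
        boolean intersects(Lease other) {
            return startMs < other.endMs && other.startMs < endMs;
        }
    }

    // Invariant: no two leases held by different leaders intersect in time.
    static boolean invariantHolds(List<Lease> leases) {
        for (int i = 0; i < leases.size(); i++) {
            for (int j = i + 1; j < leases.size(); j++) {
                Lease a = leases.get(i);
                Lease b = leases.get(j);
                if (!a.leader.equals(b.leader) && a.intersects(b)) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // B leads, then A, then C takes over while A's lease has not expired yet.
        List<Lease> broken = List.of(
                new Lease("B", 0, 1_000),
                new Lease("A", 1_500, 3_000),
                new Lease("C", 2_000, 4_000));
        System.out.println(invariantHolds(broken)); // prints false

        // Same sequence, but C waits for A's lease to expire before serving.
        List<Lease> safe = List.of(
                new Lease("B", 0, 1_000),
                new Lease("A", 1_500, 3_000),
                new Lease("C", 3_000, 4_500));
        System.out.println(invariantHolds(safe)); // prints true
    }
}
```

Under this model, a newly elected leader would have to wait until every earlier lease held by another node has expired before assigning timestamps. The sketch only checks the invariant; it says nothing about how Ignite enforces it.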

  was:
I've observed this assertion in a scenario where a partition leader is temporarily segmented from the remaining nodes:
A, B, C - RAFT nodes
B is the leader
B is segmented
A and C elect a new leader, A
B returns to the topology as a stale leader and steps down
B and C elect a new leader, C, disrupting A's leadership
{noformat}
2025-09-05 21:24:35:393 +0000 [ERROR][%poc-tester-SERVER-192.168.211.108-id-0%JRaft-FSMCaller-Disruptor_stripe_6-0][StateMachineAdapter] Encountered an error=Status[ESTATEMACHINE<10002>: StateMachine meet critical error when applying one or more tasks since index=1142, Status[ESTATEMACHINE<10002>: Reordering detected: [old=HybridTimestamp [physical=2025-09-05 21:24:34:122 +0000, logical=2, composite=115153795424059394], new=HybridTimestamp [physical=2025-09-05 21:24:34:085 +0000, logical=17, composite=115153795421634577]]]] on StateMachine org.apache.ignite.internal.raft.server.impl.JraftServerImpl$DelegatingStateMachine, it's highly recommended to implement this method as raft stops working since some error occurs, you should figure out the cause and repair or remove this node.
Error [type=ERROR_TYPE_STATE_MACHINE, status=Status[ESTATEMACHINE<10002>: StateMachine meet critical error when applying one or more tasks since index=1142, Status[ESTATEMACHINE<10002>: Reordering detected: [old=HybridTimestamp [physical=2025-09-05 21:24:34:122 +0000, logical=2, composite=115153795424059394], new=HybridTimestamp [physical=2025-09-05 21:24:34:085 +0000, logical=17, composite=115153795421634577]]]]]
        at org.apache.ignite.raft.jraft.core.IteratorImpl.getOrCreateError(IteratorImpl.java:168)
        at org.apache.ignite.raft.jraft.core.IteratorImpl.setErrorAndRollback(IteratorImpl.java:159)
        at org.apache.ignite.raft.jraft.core.IteratorWrapper.setErrorAndRollback(IteratorWrapper.java:74)
        at org.apache.ignite.internal.raft.server.impl.JraftServerImpl$DelegatingStateMachine.onApply(JraftServerImpl.java:921)
        at org.apache.ignite.raft.jraft.core.FSMCallerImpl.doApplyTasks(FSMCallerImpl.java:570)
        at org.apache.ignite.raft.jraft.core.FSMCallerImpl.doCommitted(FSMCallerImpl.java:536)
        at org.apache.ignite.raft.jraft.core.FSMCallerImpl.runApplyTask(FSMCallerImpl.java:454)
        at org.apache.ignite.raft.jraft.core.FSMCallerImpl$ApplyTaskHandler.onEvent(FSMCallerImpl.java:123)
        at org.apache.ignite.raft.jraft.core.FSMCallerImpl$ApplyTaskHandler.onEvent(FSMCallerImpl.java:117)
        at org.apache.ignite.raft.jraft.disruptor.StripedDisruptor$StripeEntryHandler.onEvent(StripedDisruptor.java:322)
        at org.apache.ignite.raft.jraft.disruptor.StripedDisruptor$StripeEntryHandler.onEvent(StripedDisruptor.java:279)
        at com.lmax.disruptor.BatchEventProcessor.processEvents(BatchEventProcessor.java:167)
        at com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:122)
        at java.base/java.lang.Thread.run(Thread.java:829)
{noformat}
The root cause still needs to be found.
Most likely it's related to a broken RAFT leader lease invariant (no two leaders may exist with intersecting RAFT leader leases).


> Fix java.lang.AssertionError: Reordering detected on unstable raft topology
> ---------------------------------------------------------------------------
>
>                 Key: IGNITE-27288
>                 URL: https://issues.apache.org/jira/browse/IGNITE-27288
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Alexey Scherbakov
>            Assignee: Alexey Scherbakov
>            Priority: Major
>              Labels: ignite-3
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
