[ 
https://issues.apache.org/jira/browse/IGNITE-25146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denis Chudov updated IGNITE-25146:
----------------------------------
    Description: 
The root cause is the optimization that prevents a double storage update. We 
keep the primary replica as the most up-to-date one, so its storage is updated 
before the changes are replicated to raft. In the raft state machine those 
changes are not applied unless this condition is met:
{code:java}
if (cmd.full() || !localNodeId.equals(storageLeaseInfo.primaryReplicaNodeId()))
{code}
In the case of a non-full transaction, we write to the storage only if the 
local node is not the same as the previously chosen primary replica 
(storageLeaseInfo is the value saved to the storage). So, *the actual scenario* 
was the following:
 * node0 is saved as the primary replica leaseholder for partition N;

 * value X is written to the storage of partition N on node0 before replication 
(because node0 is primary);

 * value X is replicated and is not written to the storage once more;

 * filter is changed, partition N is moved to node1, evicted from node0;

 * filter is changed again and partition N is moved back;

 * during the rebalance, the raft log is applied (no snapshot), so these 
commands are applied: PrimaryReplicaChangeCommand (writing node0 as the primary 
to the storage), UpdateAllCommand, WriteIntentSwitchCommand;

 * on UpdateAllCommand, 
{{localNodeId.equals(storageLeaseInfo.primaryReplicaNodeId())}} is true, so no 
changes are written to the storage and value X is lost.
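
The scenario above can be reproduced with a minimal, self-contained sketch. It is 
not the Ignite code: the class ReplaySketch, the methods shouldApplyToStorage and 
replayOnNode, the map-based storage and the key name are all hypothetical and 
only mirror the quoted condition. It assumes the storage on the node is empty 
after eviction and that the log is replayed from the start.

```java
import java.util.HashMap;
import java.util.Map;

public class ReplaySketch {
    /**
     * Mirrors the quoted condition with simplified, hypothetical parameters:
     * write if the command is full, or if the local node is not the primary
     * recorded in the storage.
     */
    public static boolean shouldApplyToStorage(boolean full, String localNodeId, String storedPrimaryNodeId) {
        return full || !localNodeId.equals(storedPrimaryNodeId);
    }

    /** Replays the (simplified) raft log on a node whose storage is empty. */
    public static Map<String, String> replayOnNode(String localNodeId) {
        // Assumption: the partition was evicted, so nothing is left in the storage.
        Map<String, String> storage = new HashMap<>();

        // PrimaryReplicaChangeCommand: node0 is saved to the storage as the primary.
        String storedPrimaryNodeId = "node0";

        // UpdateAllCommand of a non-full transaction carrying value X.
        if (shouldApplyToStorage(false, localNodeId, storedPrimaryNodeId)) {
            storage.put("key", "X");
        }

        return storage;
    }

    public static void main(String[] args) {
        // On node0 the update is skipped even though the storage is empty.
        System.out.println("node0 after replay: " + replayOnNode("node0"));
        // On any other node the same command is written as expected.
        System.out.println("node1 after replay: " + replayOnNode("node1"));
    }
}
```

In this sketch the replay on node0 leaves the storage empty, while the replay on 
node1 stores X. The check cannot tell "node0 already wrote this value as the 
primary" apart from "node0 is replaying the log onto an empty storage".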

> Data loss after rebalance
> -------------------------
>
>                 Key: IGNITE-25146
>                 URL: https://issues.apache.org/jira/browse/IGNITE-25146
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Denis Chudov
>            Assignee: Denis Chudov
>            Priority: Major
>              Labels: ignite-3
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
