[ 
https://issues.apache.org/jira/browse/IGNITE-24811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksandr Polovtsev updated IGNITE-24811:
-----------------------------------------
    Epic Link: IGNITE-22115

> Handle the case when a table storage is empty on Zone Raft group recovery
> -------------------------------------------------------------------------
>
>                 Key: IGNITE-24811
>                 URL: https://issues.apache.org/jira/browse/IGNITE-24811
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Aleksandr Polovtsev
>            Assignee: Aleksandr Polovtsev
>            Priority: Major
>              Labels: ignite-3
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Inside the colocation track, the following problem exists:
> When a new table processor is added to the ZonePartitionRaftListener, its
> storage gets initialized with some information, such as the last applied
> index and the Raft group configuration. However, a node can die or be
> restarted before this information is flushed to persistent storage, which
> means that upon the subsequent startup this storage will return 0 as its
> last applied index. Since on startup we use the minimum last applied index
> across all storages during Raft recovery, this value will also be 0, and
> JRaft will think that it needs to replay the log from the very beginning,
> even though the 0 actually came from a storage for an empty table, whose
> applied index shouldn't even be taken into account. An even bigger problem
> is that the log might have been truncated and cannot be replayed from
> index 0, so the node won't even be able to start.
> As a solution, the following algorithm is proposed:
> # When a Raft snapshot is taken, save the current set of table IDs inside the 
> TX state storage. This means that we have a set of table IDs that 
> participated in the most recent snapshot of this partition;
> # During recovery, for every table partition storage, check the following:
> ## If this storage contains an applied index (i.e. is not empty), use the 
> current recovery mechanism of choosing the minimum applied index across all 
> storages;
> ## If this storage is empty and *is* present in the set of table IDs from the
> TX state storage, then this storage must have participated in the snapshot
> but somehow lost all of its persistent data. In this case, tell JRaft to
> start recovery from the very beginning of the log, which either succeeds if
> the Raft log is still present starting from index 0, or throws an error if
> the log has been truncated;
> ## If this storage is empty and *is not* present in the set of table IDs from
> the TX state storage, then this storage is guaranteed to have received no
> writes before the most recent snapshot, and we can start the recovery from
> the position saved in that snapshot.
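
The three recovery cases above can be sketched roughly as follows. This is only an illustration, not Ignite code: the class name, the method name, and all parameters are hypothetical, an applied index of 0 is assumed to mean "storage is empty", and combining the per-storage results by taking their minimum is an assumption carried over from the existing recovery mechanism.

```java
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of the proposed recovery rule; not an Ignite API. */
final class RecoveryIndexSketch {
    /** Assumption: a last applied index of 0 means the storage is empty. */
    static final long EMPTY = 0L;

    /**
     * Picks the index from which Raft log replay would start.
     *
     * @param lastAppliedIndexByTableId last applied index reported by each table partition storage
     * @param tableIdsInLastSnapshot table IDs saved in the TX state storage by the most recent snapshot
     * @param lastSnapshotIndex log position saved in the most recent snapshot
     */
    public static long recoveryStartIndex(
            Map<Integer, Long> lastAppliedIndexByTableId,
            Set<Integer> tableIdsInLastSnapshot,
            long lastSnapshotIndex
    ) {
        long min = Long.MAX_VALUE;

        for (Map.Entry<Integer, Long> entry : lastAppliedIndexByTableId.entrySet()) {
            long applied = entry.getValue();
            long candidate;

            if (applied != EMPTY) {
                // Case 1: non-empty storage, its own applied index takes part in the minimum.
                candidate = applied;
            } else if (tableIdsInLastSnapshot.contains(entry.getKey())) {
                // Case 2: the storage took part in the snapshot but lost its data,
                // so the log has to be replayed from the very beginning.
                candidate = 0L;
            } else {
                // Case 3: the storage had no writes before the snapshot,
                // so the snapshot position is a safe starting point for it.
                candidate = lastSnapshotIndex;
            }

            min = Math.min(min, candidate);
        }

        // With no table storages at all, fall back to the snapshot position (an assumption).
        return min == Long.MAX_VALUE ? lastSnapshotIndex : min;
    }

    public static void main(String[] args) {
        // Table 1 has data up to index 50; table 2 is empty and was added after the snapshot at index 40.
        System.out.println(recoveryStartIndex(Map.of(1, 50L, 2, 0L), Set.of(1), 40L)); // prints 40
    }
}
```

In this sketch, case 2 only returns 0; detecting that the log has been truncated and failing the startup is left to the Raft layer, as the description suggests.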



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
