[ https://issues.apache.org/jira/browse/IGNITE-24811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Aleksandr Polovtsev updated IGNITE-24811:
-----------------------------------------
    Epic Link: IGNITE-22115

> Handle the case when a table storage is empty on Zone Raft group recovery
> -------------------------------------------------------------------------
>
>                 Key: IGNITE-24811
>                 URL: https://issues.apache.org/jira/browse/IGNITE-24811
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Aleksandr Polovtsev
>            Assignee: Aleksandr Polovtsev
>            Priority: Major
>              Labels: ignite-3
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Inside the colocation track, the following problem exists:
> When a new table processor is added to the ZonePartitionRaftListener, its
> storage gets initialized with some information, like the last applied index
> and the Raft group configuration. However, a node can die or be restarted
> before this information gets flushed to persistent storage, which means that
> upon the subsequent startup this storage will return 0 as its last applied
> index. Since on startup we use the minimum last applied index across all
> storages during Raft recovery, this value will also be 0, and JRaft will
> think that it needs to replay the log from the beginning of time, while this
> value actually came from a storage for an empty table, and its applied index
> shouldn't even be taken into account. An even bigger problem is that the log
> might have been truncated and cannot be replayed from the 0 index, so the
> node won't even be able to start.
> As a solution, the following algorithm is proposed:
> # When a Raft snapshot is taken, save the current set of table IDs inside the
> TX state storage. This means that we have a set of table IDs that
> participated in the most recent snapshot of this partition;
> # During recovery, for every table partition storage, check the following:
> ## If this storage contains an applied index (i.e. is not empty), use the
> current recovery mechanism of choosing the minimum applied index across all
> storages;
> ## If this storage is empty and *is* present in the set of table IDs from the
> TX storage, then this storage must have participated in the snapshot, but
> lost all of its persistent data somehow. In this case, tell JRaft to start
> recovery from the very beginning of time, either succeeding if we have the
> Raft log present starting from the 0 index, or throwing an error in case the
> log has been truncated;
> ## If this storage is empty and *is not* present in the set of table IDs from
> the TX storage, then this storage is guaranteed to have no writes to it
> before the most recent snapshot, and we can start the recovery from the
> position saved in that snapshot.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
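
The three recovery cases from the issue description can be sketched in Java roughly as follows. This is only an illustration of the proposed decision logic, not code from Ignite: the class `RecoveryIndexSketch`, the method `recoveryStartIndex`, and all parameter names are hypothetical, and a last applied index of 0 is assumed to mean "empty storage". Combining the per-storage results by taking their minimum is also an assumption, extended from the existing "minimum applied index" rule.

```java
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch; none of these names come from the Ignite code base. */
public final class RecoveryIndexSketch {
    private RecoveryIndexSketch() {
    }

    /**
     * Chooses the index from which the Raft log replay should start.
     *
     * @param lastAppliedByTableId last applied index of each table partition storage (0 = empty, by assumption)
     * @param snapshotTableIds table IDs saved in the TX state storage when the last snapshot was taken
     * @param snapshotIndex position saved in the most recent snapshot
     * @param logTruncated whether the Raft log no longer starts at the very beginning
     */
    public static long recoveryStartIndex(
            Map<Integer, Long> lastAppliedByTableId,
            Set<Integer> snapshotTableIds,
            long snapshotIndex,
            boolean logTruncated
    ) {
        // With no table storages at all, fall back to the snapshot position (assumption).
        long result = snapshotIndex;
        boolean first = true;

        for (Map.Entry<Integer, Long> entry : lastAppliedByTableId.entrySet()) {
            long applied = entry.getValue();
            long candidate;

            if (applied > 0) {
                // Case 1: non-empty storage, its applied index takes part in the minimum.
                candidate = applied;
            } else if (snapshotTableIds.contains(entry.getKey())) {
                // Case 2: the storage took part in the snapshot but lost its data.
                if (logTruncated) {
                    throw new IllegalStateException(
                            "Table " + entry.getKey() + " lost its data and the Raft log is truncated");
                }
                candidate = 0;
            } else {
                // Case 3: table added after the snapshot, no writes before it.
                candidate = snapshotIndex;
            }

            result = first ? candidate : Math.min(result, candidate);
            first = false;
        }

        return result;
    }
}
```

For example, with storages `{1: 10, 2: 0}`, a snapshot at index 5, and only table 1 recorded in the snapshot, the sketch returns 5 instead of 0, so an empty storage for a freshly added table no longer forces a replay from the start of the log.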