[ https://issues.apache.org/jira/browse/IGNITE-18694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sergey Uttsel updated IGNITE-18694:
-----------------------------------
    Description:
h3. *Motivation*
DistributionZoneRebalanceEngine#dataNodesListener processes events that update the zones' data nodes and invokes RebalanceUtil#updatePendingAssignmentsKeys with the new data nodes value. updatePendingAssignmentsKeys performs an asynchronous metaStorageMgr#invoke. As a result, dataNodesListener may process a data nodes event and the node may then crash before the assignments are updated in the metastorage.
h3. *Implementation Notes*
To fix this, we can redo all the logic from `createDistributionZonesDataNodesListener()`. On DistributionZoneManager#start, we need to read the data nodes of all zones from the `vault` and invoke `updatePendingAssignmentsKeys` for all tables with `metaStorageManager.appliedRevision()`. If the last data nodes event has not updated the pending assignments, the assignments will be updated now. If the last data nodes event has already updated the pending assignments, the update will be invoked a second time but will not change the assignments, because there is a check for the case when the new assignments are equal to the old ones.
h3. *Definition of Done*
A recovery is created for this case.

  was:
h3. *Motivation*
DistributionZoneRebalanceEngine#dataNodesListener processes events that update the zones' data nodes and invokes RebalanceUtil#updatePendingAssignmentsKeys with the new data nodes value. updatePendingAssignmentsKeys performs an asynchronous metaStorageMgr#invoke. As a result, dataNodesListener may process a data nodes event and the node may then crash before the assignments are updated in the metastorage.
h3. *Implementation Notes*
To fix this, we can redo all the logic from `createDistributionZonesDataNodesListener()`. On DistributionZoneManager#start, we need to read the data nodes of all zones from the `vault` and invoke `updatePendingAssignmentsKeys` for all tables with `metaStorageManager.appliedRevision()`. If the last data nodes event has not updated the pending assignments, the assignments will be updated now.
If the last data nodes event has already updated the pending assignments, the update will be invoked a second time but will not change the assignments, because there is a check for the case when the new assignments are equal to the old ones.
h3. *Definition of Done*
A recovery is created for this case, or it is guaranteed that once the event is processed, the assignments have been written to the metastorage.

> Recovery for DistributionZoneRebalanceEngine#metaStorageManager on DistributionZoneManager#start()
> --------------------------------------------------------------------------------------------------
>
>                 Key: IGNITE-18694
>                 URL: https://issues.apache.org/jira/browse/IGNITE-18694
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Sergey Uttsel
>            Priority: Major
>              Labels: ignite-3
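The recovery pass from the Implementation Notes can be illustrated with a small model. The Java sketch below is a hypothetical, in-memory model, not Ignite code. `ZoneRecoverySketch`, `updatePendingAssignments` and `recoverOnStart` are invented names. Plain maps stand in for the vault, the table-to-zone mapping and the metastorage. Assignments are simplified to the zone's full data-node set, and the revision conditions of the real metastorage invoke are omitted. The sketch shows why re-running the update for every table on start should be safe: the equality check turns the call into a no-op for tables whose pending assignments were already written.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical in-memory model of the startup recovery; none of these names are Ignite APIs.
class ZoneRecoverySketch {
    // Stand-in for the vault: zone id -> last known data nodes of the zone.
    final Map<Integer, Set<String>> vaultDataNodes = new HashMap<>();
    // Stand-in for table metadata: table id -> id of the zone the table belongs to.
    final Map<Integer, Integer> tableZones = new HashMap<>();
    // Stand-in for the metastorage: table id -> pending assignments.
    final Map<Integer, Set<String>> pendingAssignments = new HashMap<>();
    // Counts the metastorage writes that were actually performed.
    int writes = 0;

    // Simplified stand-in for updatePendingAssignmentsKeys. Assumption: the assignments
    // are just the zone's data nodes. The write is skipped when nothing would change.
    boolean updatePendingAssignments(int tableId, Set<String> dataNodes) {
        Set<String> newAssignments = new TreeSet<>(dataNodes);
        if (newAssignments.equals(pendingAssignments.get(tableId))) {
            return false; // New assignments equal the old ones: no write.
        }
        pendingAssignments.put(tableId, newAssignments);
        writes++;
        return true;
    }

    // Recovery on start: re-run the update for every table, using the data nodes
    // of the table's zone as read from the vault.
    void recoverOnStart() {
        for (Map.Entry<Integer, Integer> table : tableZones.entrySet()) {
            Set<String> dataNodes = vaultDataNodes.get(table.getValue());
            if (dataNodes != null) {
                updatePendingAssignments(table.getKey(), dataNodes);
            }
        }
    }
}
```

For example, suppose zone 10 has data nodes {A, B} and holds tables 1 and 2, and the node crashed after table 1's pending assignments were written but before table 2's. In that case, one `recoverOnStart()` call writes only table 2, and a second call writes nothing.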