[ https://issues.apache.org/jira/browse/IGNITE-20603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mirza Aliev updated IGNITE-20603:
---------------------------------
    Summary: Restore logical topology change event on a node restart  (was: Restore topologyAugmentationMap on a node restart)

> Restore logical topology change event on a node restart
> -------------------------------------------------------
>
>                 Key: IGNITE-20603
>                 URL: https://issues.apache.org/jira/browse/IGNITE-20603
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Mirza Aliev
>            Assignee: Mirza Aliev
>            Priority: Major
>              Labels: ignite-3
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> h3. *Motivation*
> It is possible that some events were propagated to {{ms.logicalTopology}},
> but a restart happened while we were updating topologyAugmentationMap in
> {{DistributionZoneManager#createMetastorageTopologyListener}}. In that case,
> the augmentation that should have been added to
> {{zone.topologyAugmentationMap}} was not added, and we need to recover this
> information.
> h3. *Definition of done*
> On a node restart, topologyAugmentationMap must be correctly restored
> according to the {{ms.logicalTopology}} state.
> h3. *Implementation notes*
> (outdated, see UPD)
> For every zone, compare {{MS.local.logicalTopology.revision}} with
> max(maxScUpFromMap, maxScDownFromMap). If {{logicalTopology.revision}} is
> greater than max(maxScUpFromMap, maxScDownFromMap), some topology changes
> were not propagated to topologyAugmentationMap before the restart and the
> appropriate timers were not scheduled. To fill the gap in
> topologyAugmentationMap, compare {{MS.local.logicalTopology}} with
> {{lastSeenLogicalTopology}} and extend topologyAugmentationMap with the
> nodes that were not propagated to it before the restart.
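The check described in the (outdated) notes above can be sketched roughly as follows. This is only an illustration, not code from Ignite: the class `TopologyGapSketch`, its methods, and the use of plain node-name strings are assumptions made for the example.

```java
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper, not part of Ignite: finds topology changes missed before a restart. */
final class TopologyGapSketch {
    /** Nodes that joined or left compared with the last topology the zone had seen. */
    static final class Diff {
        final Set<String> added;
        final Set<String> removed;

        Diff(Set<String> added, Set<String> removed) {
            this.added = added;
            this.removed = removed;
        }
    }

    /** True when the logical topology revision is ahead of both values taken from the map. */
    static boolean hasGap(long topologyRevision, long maxScUpFromMap, long maxScDownFromMap) {
        return topologyRevision > Math.max(maxScUpFromMap, maxScDownFromMap);
    }

    /** Plain set difference in both directions between the current and the last seen topology. */
    static Diff missedChanges(Set<String> logicalTopology, Set<String> lastSeenTopology) {
        Set<String> added = new TreeSet<>(logicalTopology);
        added.removeAll(lastSeenTopology);

        Set<String> removed = new TreeSet<>(lastSeenTopology);
        removed.removeAll(logicalTopology);

        return new Diff(added, removed);
    }
}
```

In this sketch, `missedChanges` would only be consulted when `hasGap` returns true; the resulting additions and removals are what would be written to topologyAugmentationMap.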
> {{lastSeenTopology}} is calculated as follows: read {{MS.local.dataNodes}},
> take max(scaleUpTriggerKey, scaleDownTriggerKey), and retrieve all node
> additions and removals from topologyAugmentationMap using that value as the
> left bound. Then apply these changes to the map of node counters from
> {{MS.local.dataNodes}} and keep only the nodes with positive counters. The
> result is lastSeenTopology. Comparing it with {{MS.local.logicalTopology}}
> shows which node additions and removals were not propagated to
> topologyAugmentationMap before the restart. We take these differences and
> add them to topologyAugmentationMap, using
> {{MS.local.logicalTopology.revision}} as the revision (the key for
> topologyAugmentationMap). It is safe to take this revision, because if some
> node was added to {{ms.topology}} after an immediate data nodes
> recalculation, this added node must restore the intent of that immediate
> data nodes recalculation.
> UPD: The implementation notes above are outdated; we implemented a slightly
> different approach. We now save the last handled topology to MS, and on
> restart we check whether the current ms.logicalTopology differs from the one
> that was handled in
> DistributionZoneManager#createMetastorageTopologyListener (we compare the
> revisions of these events). If it does, we repeat the logic from
> DistributionZoneManager#createMetastorageTopologyListener with the new
> logical topology from ms.logicalTopology.
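A minimal sketch of the restart check described in the UPD, under stated assumptions: `TopologyRestartCheckSketch` and `restoreIfNeeded` are hypothetical names, the topology is modelled as a set of node names, and `topologyHandler` stands in for the logic of DistributionZoneManager#createMetastorageTopologyListener, which is not reproduced here.

```java
import java.util.Set;
import java.util.function.BiConsumer;

/** Hypothetical illustration of the restart check; not the actual Ignite implementation. */
final class TopologyRestartCheckSketch {
    /**
     * Replays the topology handler when the stored "last handled" revision differs from
     * the revision of the current logical topology.
     *
     * @return true if the handler was replayed, false if nothing had been missed.
     */
    static boolean restoreIfNeeded(
            long lastHandledRevision,
            long currentRevision,
            Set<String> currentTopology,
            BiConsumer<Set<String>, Long> topologyHandler) {
        if (currentRevision == lastHandledRevision) {
            // The last topology event was fully handled before the restart.
            return false;
        }

        // Repeat the listener logic with the topology currently stored in the metastorage.
        topologyHandler.accept(currentTopology, currentRevision);
        return true;
    }
}
```

Comparing revisions rather than the topology contents follows the wording of the UPD; how the last handled topology is persisted to MS is left out of the sketch.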
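For reference, the lastSeenTopology calculation from the outdated notes might look roughly like the sketch below. Everything here is an assumption made for illustration: the `Augmentation` type, the representation of topologyAugmentationMap as a `NavigableMap` keyed by revision, and in particular the choice to treat the left bound as exclusive, which the notes do not specify.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical illustration of the lastSeenTopology calculation; not Ignite code. */
final class LastSeenTopologySketch {
    /** One entry of the augmentation map: a set of nodes that were added or removed. */
    static final class Augmentation {
        final Set<String> nodes;
        final boolean addition;

        Augmentation(Set<String> nodes, boolean addition) {
            this.nodes = nodes;
            this.addition = addition;
        }
    }

    static Set<String> lastSeenTopology(
            Map<String, Integer> dataNodes,
            NavigableMap<Long, Augmentation> topologyAugmentationMap,
            long scaleUpTriggerKey,
            long scaleDownTriggerKey) {
        long leftBound = Math.max(scaleUpTriggerKey, scaleDownTriggerKey);

        // Start from the node counters stored with the data nodes.
        Map<String, Integer> counters = new HashMap<>(dataNodes);

        // Assumption: the left bound is exclusive, so only later revisions are applied.
        for (Augmentation aug : topologyAugmentationMap.tailMap(leftBound, false).values()) {
            int delta = aug.addition ? 1 : -1;
            for (String node : aug.nodes) {
                counters.merge(node, delta, Integer::sum);
            }
        }

        // Keep only the nodes whose counter is positive.
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, Integer> e : counters.entrySet()) {
            if (e.getValue() > 0) {
                result.add(e.getKey());
            }
        }
        return result;
    }
}
```

The returned set is what the notes compare against the logical topology to find the changes that were lost.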