[ https://issues.apache.org/jira/browse/IGNITE-19288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mirza Aliev updated IGNITE-19288:
---------------------------------
    Description:
h3. Motivation

If the new logical topology contains both new nodes and nodes that have left the cluster, DistributionZoneManager#scheduleTimers() schedules both saveDataNodesOnScaleUp and saveDataNodesOnScaleDown. These tasks are invoked asynchronously but use the same entry in topologyAugmentationMap: scale up puts an entry with some revision as the key, and then scale down puts an entry with the same revision as the key, so one entry can overwrite the other.

The issue is reproduced by DistributionZoneAwaitDataNodesTest#testSeveralScaleUpAndSeveralScaleDownThenScaleUpAndScaleDown

h3. Definition of Done
* Concurrency bug is fixed.
* Test is enabled.

UPD:

In general, the problem can be reproduced only in a very rare case: when we have received {{LogicalTopologyEventListener#onTopologyLeap}} and the new topology contains both added and removed nodes compared with the topology from the metastorage.

The solution is to change the representation of {{DistributionZoneManager.ZoneState#topologyAugmentationMap}}.

We have
{code:java}
private static class Augmentation {
    /** Names of the node. */
    Set<NodeWithAttributes> nodes;

    /** Flag that indicates whether {@code nodeNames} should be added or removed. */
    boolean addition;

    Augmentation(Set<NodeWithAttributes> nodes, boolean addition) {
        this.nodes = nodes;
        this.addition = addition;
    }
}
{code}
I suggest storing the {{addition}} flag in {{NodeWithAttributes}}, so that a single revision's entry in {{DistributionZoneManager.ZoneState#topologyAugmentationMap}} can hold both added and removed nodes.

  was:
h3. Motivation

If the new logical topology contains both new nodes and nodes that have left the cluster, DistributionZoneManager#scheduleTimers() schedules both saveDataNodesOnScaleUp and saveDataNodesOnScaleDown. These tasks are invoked asynchronously but use the same entry in topologyAugmentationMap: scale up puts an entry with some revision as the key, and then scale down puts an entry with the same revision as the key, so one entry can overwrite the other.

The issue is reproduced by DistributionZoneAwaitDataNodesTest#testSeveralScaleUpAndSeveralScaleDownThenScaleUpAndScaleDown

h3. Definition of Done
* Concurrency bug is fixed.
* Test is enabled.


> A race on scheduling data nodes updates if there new nodes and stopped nodes in logical topology
> ------------------------------------------------------------------------------------------------
>
>                 Key: IGNITE-19288
>                 URL: https://issues.apache.org/jira/browse/IGNITE-19288
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Sergey Uttsel
>            Assignee: Mirza Aliev
>            Priority: Major
>              Labels: ignite-3
>
> h3. Motivation
> If the new logical topology contains both new nodes and nodes that have left
> the cluster, DistributionZoneManager#scheduleTimers() schedules both
> saveDataNodesOnScaleUp and saveDataNodesOnScaleDown. These tasks are invoked
> asynchronously but use the same entry in topologyAugmentationMap: scale up
> puts an entry with some revision as the key, and then scale down puts an
> entry with the same revision as the key, so one entry can overwrite the other.
> The issue is reproduced by
> DistributionZoneAwaitDataNodesTest#testSeveralScaleUpAndSeveralScaleDownThenScaleUpAndScaleDown
> h3. Definition of Done
> * Concurrency bug is fixed.
> * Test is enabled.
> UPD:
> In general, the problem can be reproduced only in a very rare case: when we
> have received {{LogicalTopologyEventListener#onTopologyLeap}} and the new
> topology contains both added and removed nodes compared with the topology
> from the metastorage.
> The solution is to change the representation of
> {{DistributionZoneManager.ZoneState#topologyAugmentationMap}}.
> We have
> {code:java}
> private static class Augmentation {
>     /** Names of the node. */
>     Set<NodeWithAttributes> nodes;
>
>     /** Flag that indicates whether {@code nodeNames} should be added or removed.
>      */
>     boolean addition;
>
>     Augmentation(Set<NodeWithAttributes> nodes, boolean addition) {
>         this.nodes = nodes;
>         this.addition = addition;
>     }
> }
> {code}
> I suggest storing the {{addition}} flag in {{NodeWithAttributes}}, so that a
> single revision's entry in
> {{DistributionZoneManager.ZoneState#topologyAugmentationMap}} can hold both
> added and removed nodes.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
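
For illustration only, here is a minimal sketch of the representation suggested in the description, assuming Java 16+ (for records). The class AugmentationSketch, the Node record (a stand-in for NodeWithAttributes), and the record/nodesAt methods are hypothetical names, not taken from the Ignite code base. The sketch only shows that, once the addition flag lives on the node, scale up and scale down can write under the same revision without one replacing the other.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentSkipListMap;

public class AugmentationSketch {
    /** Hypothetical stand-in for NodeWithAttributes that carries its own addition flag. */
    public record Node(String name, boolean addition) {}

    /** Revision -> nodes that were added or removed at that revision (assumed layout). */
    private final ConcurrentSkipListMap<Long, Set<Node>> topologyAugmentationMap =
            new ConcurrentSkipListMap<>();

    /** Merges nodes into the entry for the revision instead of replacing the entry. */
    public void record(long revision, Set<String> nodeNames, boolean addition) {
        Set<Node> entry = topologyAugmentationMap.computeIfAbsent(
                revision, r -> ConcurrentHashMap.newKeySet());

        for (String name : nodeNames) {
            entry.add(new Node(name, addition));
        }
    }

    /** Returns the nodes recorded for the revision, or an empty set if there are none. */
    public Set<Node> nodesAt(long revision) {
        return topologyAugmentationMap.getOrDefault(revision, Set.of());
    }

    public static void main(String[] args) {
        AugmentationSketch state = new AugmentationSketch();

        // Scale up and scale down triggered by the same topology leap, same revision.
        state.record(5L, Set.of("C"), true);
        state.record(5L, Set.of("A"), false);

        // Both the added and the removed node are kept under revision 5.
        System.out.println(state.nodesAt(5L).size()); // prints 2
    }
}
```

The sketch does not cover cleanup of old revisions or races with concurrent entry removal; those would need to be handled by the real fix.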