[ https://issues.apache.org/jira/browse/IGNITE-25250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vyacheslav Koptilin updated IGNITE-25250:
-----------------------------------------
    Description:
Let's consider the following setup: 3-node cluster, 1 CMG node, 1 default distribution zone.
When the cluster is initialized, cluster nodes are included in the logical topology one by one:
{noformat}
>>>>> idsst_qwas_3344 - cmg node
[DataNodesManager] Topology change detected [
    zoneId=0, revision=7,
    timestamp=HybridTimestamp [physical=2025-04-25 12:34:48:143 +0000, logical=2, composite=114398625014939650],
    newTopology=[idsst_qwas_3344],
    oldTopology=[idsst_qwas_3344]].
...
[DataNodesManager] Updated data nodes on topology change, history entry not added [
    zoneId=0, currentTimestamp=HybridTimestamp [physical=2025-04-25 12:34:48:143 +0000, logical=2, composite=114398625014939650],
    latestNodesWritten=[]],
    scaleUpTimer=[timestamp=HybridTimestamp [physical=2025-04-25 12:34:48:143 +0000, logical=2, composite=114398625014939650],
    timeToWaitInSeconds=5,
    nodes=[idsst_qwas_3344]], scaleDownTimer=[empty]].
{noformat}
The important thing is that the history entry was not added.
The rest of the nodes try to handle the same event:
{noformat}
[%idsst_qwas_3346][DataNodesManager] >>>>> Topology change detected [zoneId=0, revision=7, timestamp=HybridTimestamp [physical=2025-04-25 12:34:48:143 +0000, logical=2, composite=114398625014939650], newTopology=[idsst_qwas_3344], oldTopology=[idsst_qwas_3344]].
...
[%idsst_qwas_3345][DataNodesManager] >>>>> Topology change detected [zoneId=0, revision=7, timestamp=HybridTimestamp [physical=2025-04-25 12:34:48:143 +0000, logical=2, composite=114398625014939650], newTopology=[idsst_qwas_3344], oldTopology=[idsst_qwas_3344]].
{noformat}
In general, these retries should be rejected by the following check:
{code:java}
private DataNodesHistoryMetaStorageOperation onTopologyChangeInternal(...) {
    ...
    DataNodesHistory dataNodesHistory = dataNodesHistoryContext.dataNodesHistory();

    if (dataNodesHistory.entryIsPresentAtExactTimestamp(timestamp)) {
        // This event was already processed by another node.
        return null;
    }
    ...
}
{code}
However, this check does not work when the history entry was not added: there is no entry at the event timestamp to find, so the nodes proceed with handling the topology event:
{noformat}
[%idsst_qwas_3346][DataNodesManager] >>>>> Topology change detected [zoneId=0, revision=7, timestamp=HybridTimestamp [physical=2025-04-25 12:34:48:143 +0000, logical=2, composite=114398625014939650], newTopology=[idsst_qwas_3344], oldTopology=[idsst_qwas_3344]].
[%idsst_qwas_3345][DataNodesManager] >>>>> Topology change detected [zoneId=0, revision=7, timestamp=HybridTimestamp [physical=2025-04-25 12:34:48:143 +0000, logical=2, composite=114398625014939650], newTopology=[idsst_qwas_3344], oldTopology=[idsst_qwas_3344]].
[%idsst_qwas_3345] Updated data nodes on topology change, history entry not added [zoneId=0, currentTimestamp=HybridTimestamp [physical=2025-04-25 12:34:48:143 +0000, logical=2, composite=114398625014939650], latestNodesWritten=[]], scaleUpTimer=[timestamp=HybridTimestamp [physical=2025-04-25 12:34:48:143 +0000, logical=2, composite=114398625014939650], timeToWaitInSeconds=5, nodes=[idsst_qwas_3344, idsst_qwas_3346, idsst_qwas_3345]], scaleDownTimer=[empty]].
[%idsst_qwas_3346][DataNodesManager] Updated data nodes on topology change, history entry not added [zoneId=0, currentTimestamp=HybridTimestamp [physical=2025-04-25 12:34:48:143 +0000, logical=2, composite=114398625014939650], latestNodesWritten=[]], scaleUpTimer=[timestamp=HybridTimestamp [physical=2025-04-25 12:34:48:143 +0000, logical=2, composite=114398625014939650], timeToWaitInSeconds=5, nodes=[idsst_qwas_3344, idsst_qwas_3346, idsst_qwas_3345]], scaleDownTimer=[empty]].
{noformat}

> DataNodesManager can erroneously retry and successfully accept topology changes that were already accepted.
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: IGNITE-25250
>                 URL: https://issues.apache.org/jira/browse/IGNITE-25250
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Vyacheslav Koptilin
>            Priority: Major
>              Labels: ignite-3
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
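
The failure mode described in this issue can be illustrated with a small toy model. This is only a sketch, not the actual DataNodesManager code: the class DataNodesRetrySketch, its fields, and the methods handle and countAccepted are hypothetical names invented for illustration. The alsoCheckTimer flag models one possible extra guard (comparing the event timestamp with the scale-up timer timestamp, which the logs show to be equal); it is an assumption and not a fix proposed in the ticket.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy model of the duplicate-event guard; all names here are hypothetical. */
public class DataNodesRetrySketch {
    /** Timestamps that have a history entry (stand-in for DataNodesHistory). */
    private final Set<Long> historyTimestamps = new HashSet<>();

    /** Timestamp stored in the scale-up timer, or -1 if no timer is set. */
    private long scaleUpTimerTimestamp = -1;

    /**
     * Models one node handling a topology event.
     * Returns true if the event was accepted, false if it was rejected as a retry.
     */
    boolean handle(long timestamp, boolean addsHistoryEntry, boolean alsoCheckTimer) {
        // Guard from the ticket: reject only if a history entry exists at this timestamp.
        if (historyTimestamps.contains(timestamp)) {
            return false;
        }

        // Assumed extra guard: the timer already carries this event's timestamp.
        if (alsoCheckTimer && scaleUpTimerTimestamp == timestamp) {
            return false;
        }

        // The event is handled: the timer is always updated ...
        scaleUpTimerTimestamp = timestamp;

        // ... but a history entry is written only in some cases.
        if (addsHistoryEntry) {
            historyTimestamps.add(timestamp);
        }

        return true;
    }

    /** Counts how many of the given nodes accept the same event against shared state. */
    static int countAccepted(int nodes, boolean addsHistoryEntry, boolean alsoCheckTimer) {
        DataNodesRetrySketch sharedState = new DataNodesRetrySketch();
        long timestamp = 114398625014939650L; // composite timestamp from the logs

        int accepted = 0;
        for (int i = 0; i < nodes; i++) {
            if (sharedState.handle(timestamp, addsHistoryEntry, alsoCheckTimer)) {
                accepted++;
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        System.out.println(countAccepted(3, true, false));  // history entry added: 1
        System.out.println(countAccepted(3, false, false)); // history entry not added: 3
        System.out.println(countAccepted(3, false, true));  // with the assumed timer guard: 1
    }
}
```

In this model, when the first handler writes a history entry, the two retries are rejected. When no entry is written, all three nodes accept the same event, which matches the behavior in the logs above.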