[ https://issues.apache.org/jira/browse/IGNITE-25250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vyacheslav Koptilin updated IGNITE-25250:
-----------------------------------------
    Description: 
Consider the following setup: a 3-node cluster with 1 CMG node and 1 default distribution zone.
When the cluster is initialized, the nodes join the logical topology one by one:
{noformat}
>>>>> idsst_qwas_3344 - cmg node

[DataNodesManager] Topology change detected [
    zoneId=0, revision=7,
    timestamp=HybridTimestamp [physical=2025-04-25 12:34:48:143 +0000, logical=2, composite=114398625014939650],
    newTopology=[idsst_qwas_3344],
    oldTopology=[idsst_qwas_3344]].
...
[DataNodesManager] Updated data nodes on topology change, history entry not added [
    zoneId=0, currentTimestamp=HybridTimestamp [physical=2025-04-25 12:34:48:143 +0000, logical=2, composite=114398625014939650], latestNodesWritten=[]],
    scaleUpTimer=[timestamp=HybridTimestamp [physical=2025-04-25 12:34:48:143 +0000, logical=2, composite=114398625014939650],
    timeToWaitInSeconds=5,
    nodes=[idsst_qwas_3344]], scaleDownTimer=[empty]].
{noformat}
The key point is that no history entry was added for this timestamp.
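
The first node's behaviour can be approximated by the minimal model below. It is a sketch under an assumption drawn only from the log above: with a non-zero scale-up delay (timeToWaitInSeconds=5), the topology change updates the scale-up timer and the history write is postponed, so nothing is added to the history at the event timestamp. The names DeferredScaleUpModel, ScaleUpTimer and onTopologyChange are hypothetical and do not reflect the actual DataNodesManager code.

{code:java}
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical, simplified model; not the actual Ignite DataNodesManager. */
public class DeferredScaleUpModel {
    /** Pending scale-up: creation timestamp, delay, and the nodes it will add. */
    record ScaleUpTimer(long timestamp, int timeToWaitInSeconds, Set<String> nodes) {}

    /** Data nodes history: event timestamp -> data nodes written at that timestamp. */
    final TreeMap<Long, Set<String>> history = new TreeMap<>();

    /** Currently scheduled scale-up, or null if none. */
    ScaleUpTimer scaleUpTimer;

    /**
     * Handles a topology change and returns true if a history entry was written.
     * Assumption: with a non-zero scale-up delay only the timer is updated, and the
     * history write is postponed until the timer fires (timer firing is not modelled).
     */
    boolean onTopologyChange(long timestamp, Set<String> topology, int scaleUpDelaySeconds) {
        if (scaleUpDelaySeconds == 0) {
            history.put(timestamp, Set.copyOf(topology));
            return true;
        }
        scaleUpTimer = new ScaleUpTimer(timestamp, scaleUpDelaySeconds, Set.copyOf(topology));
        return false; // corresponds to "history entry not added" in the log
    }

    public static void main(String[] args) {
        DeferredScaleUpModel model = new DeferredScaleUpModel();
        // Values taken from the log above: composite timestamp and timeToWaitInSeconds=5.
        boolean added = model.onTopologyChange(114398625014939650L, Set.of("idsst_qwas_3344"), 5);
        System.out.println("historyEntryAdded=" + added);
        System.out.println("history=" + model.history);
        System.out.println("scaleUpTimer=" + model.scaleUpTimer);
    }
}
{code}

Running main prints historyEntryAdded=false and history={}, while the timer holds idsst_qwas_3344. In this model, as in the log, the event leaves no trace in the history that a later duplicate check could find.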

The remaining nodes then try to handle the same event (same revision and timestamp):
{noformat}
[%idsst_qwas_3346][DataNodesManager] >>>>> Topology change detected [
    zoneId=0, revision=7,
    timestamp=HybridTimestamp [physical=2025-04-25 12:34:48:143 +0000, logical=2, composite=114398625014939650],
    newTopology=[idsst_qwas_3344],
    oldTopology=[idsst_qwas_3344]].
...
[%idsst_qwas_3345][DataNodesManager] >>>>> Topology change detected [
    zoneId=0, revision=7,
    timestamp=HybridTimestamp [physical=2025-04-25 12:34:48:143 +0000, logical=2, composite=114398625014939650],
    newTopology=[idsst_qwas_3344],
    oldTopology=[idsst_qwas_3344]].
{noformat}
Normally, such retries should be rejected by the following check:
{code:java}
    private DataNodesHistoryMetaStorageOperation onTopologyChangeInternal(...) {
        ...
        DataNodesHistory dataNodesHistory = dataNodesHistoryContext.dataNodesHistory();

        if (dataNodesHistory.entryIsPresentAtExactTimestamp(timestamp)) {
            // This event was already processed by another node.
            return null;
        }
        ...
    }
{code}
However, this check does not help when the history entry was not added: entryIsPresentAtExactTimestamp(timestamp) returns false, so the other nodes proceed with handling the same topology event:
{noformat}
[%idsst_qwas_3346][DataNodesManager] >>>>> Topology change detected [
    zoneId=0, revision=7,
    timestamp=HybridTimestamp [physical=2025-04-25 12:34:48:143 +0000, logical=2, composite=114398625014939650],
    newTopology=[idsst_qwas_3344],
    oldTopology=[idsst_qwas_3344]].

[%idsst_qwas_3345][DataNodesManager] >>>>> Topology change detected [
    zoneId=0, revision=7,
    timestamp=HybridTimestamp [physical=2025-04-25 12:34:48:143 +0000, logical=2, composite=114398625014939650],
    newTopology=[idsst_qwas_3344],
    oldTopology=[idsst_qwas_3344]].

[%idsst_qwas_3345] Updated data nodes on topology change, history entry not added [
    zoneId=0, currentTimestamp=HybridTimestamp [physical=2025-04-25 12:34:48:143 +0000, logical=2, composite=114398625014939650], latestNodesWritten=[]],
    scaleUpTimer=[timestamp=HybridTimestamp [physical=2025-04-25 12:34:48:143 +0000, logical=2, composite=114398625014939650],
    timeToWaitInSeconds=5,
    nodes=[idsst_qwas_3344, idsst_qwas_3346, idsst_qwas_3345]], scaleDownTimer=[empty]].

[%idsst_qwas_3346][DataNodesManager] Updated data nodes on topology change, history entry not added [
    zoneId=0, currentTimestamp=HybridTimestamp [physical=2025-04-25 12:34:48:143 +0000, logical=2, composite=114398625014939650], latestNodesWritten=[]],
    scaleUpTimer=[timestamp=HybridTimestamp [physical=2025-04-25 12:34:48:143 +0000, logical=2, composite=114398625014939650],
    timeToWaitInSeconds=5,
    nodes=[idsst_qwas_3344, idsst_qwas_3346, idsst_qwas_3345]], scaleDownTimer=[empty]].
{noformat}
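
The effect on the retry guard can be reproduced with a small sequential model. It assumes that entryIsPresentAtExactTimestamp is an exact-key lookup in the data nodes history (as its name suggests) and that every node runs the guard before handling the event. DuplicateCheckModel, handle and the writeHistoryEntry flag are hypothetical names used only for illustration.

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical model of the retry guard in onTopologyChangeInternal(...). */
public class DuplicateCheckModel {
    /** Shared data nodes history: timestamp -> data nodes (stands in for the meta storage). */
    final TreeMap<Long, Set<String>> history = new TreeMap<>();

    /** Nodes that actually handled the event, i.e. were not rejected by the guard. */
    final List<String> handledBy = new ArrayList<>();

    /** Assumed semantics: exact-key lookup, no range or "latest entry" comparison. */
    boolean entryIsPresentAtExactTimestamp(long timestamp) {
        return history.containsKey(timestamp);
    }

    /**
     * Mirrors the guard from the issue: skip the event if an entry exists at this exact
     * timestamp. When writeHistoryEntry is false (only the scale-up timer is updated),
     * nothing marks the event as processed, so later retries pass the guard.
     */
    void handle(String node, long timestamp, Set<String> topology, boolean writeHistoryEntry) {
        if (entryIsPresentAtExactTimestamp(timestamp)) {
            return; // This event was already processed by another node.
        }
        handledBy.add(node);
        if (writeHistoryEntry) {
            history.put(timestamp, Set.copyOf(topology));
        }
    }

    public static void main(String[] args) {
        long ts = 114398625014939650L;
        Set<String> topology = Set.of("idsst_qwas_3344");
        List<String> nodes = List.of("idsst_qwas_3344", "idsst_qwas_3346", "idsst_qwas_3345");

        DuplicateCheckModel deferred = new DuplicateCheckModel();
        for (String node : nodes) {
            deferred.handle(node, ts, topology, false);
        }
        System.out.println("no history entry: handled by " + deferred.handledBy);

        DuplicateCheckModel immediate = new DuplicateCheckModel();
        for (String node : nodes) {
            immediate.handle(node, ts, topology, true);
        }
        System.out.println("history entry written: handled by " + immediate.handledBy);
    }
}
{code}

In the model, all three nodes handle the event when no history entry is written, whereas only idsst_qwas_3344 handles it when the entry is written, which is the behaviour the guard is meant to enforce. The model is deliberately sequential and ignores concurrency between nodes and how the real operation is applied to the meta storage.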
 



> DataNodesManager can erroneously retry and successfully accept topology changes that were already accepted.
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: IGNITE-25250
>                 URL: https://issues.apache.org/jira/browse/IGNITE-25250
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Vyacheslav Koptilin
>            Priority: Major
>              Labels: ignite-3
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)