[
https://issues.apache.org/jira/browse/IGNITE-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16442919#comment-16442919
]
Ivan Rakov commented on IGNITE-8241:
------------------------------------
I propose the following version of BaselineWatcher:
{noformat}
package org.apache.ignite.examples.events;

import java.util.Set;
import java.util.stream.Collectors;
import org.apache.ignite.Ignite;
import org.apache.ignite.cluster.BaselineNode;
import org.apache.ignite.cluster.ClusterNode;
import org.apache.ignite.events.DiscoveryEvent;
import org.apache.ignite.events.EventType;
import org.apache.ignite.internal.IgniteEx;
import org.apache.ignite.internal.processors.timeout.GridTimeoutObjectAdapter;

/**
 * Task that mimics old behavior without baseline topology. Only one task
 * should be started for the whole cluster.
 * In case of server node leave/join, BLT will be automatically reset after
 * {@link #bltChangeDelayMillis} delay.
 */
public class BaselineWatcher {
    /** Ignite. */
    private final IgniteEx ignite;

    /** BLT change delay millis. */
    private final long bltChangeDelayMillis;

    /**
     * @param ignite Ignite.
     * @param bltChangeDelayMillis Delay in milliseconds before the BLT is reset.
     */
    public BaselineWatcher(Ignite ignite, long bltChangeDelayMillis) {
        this.ignite = (IgniteEx)ignite;
        this.bltChangeDelayMillis = bltChangeDelayMillis;
    }

    /**
     * Registers a local listener that schedules a delayed BLT reset on node join/leave/failure.
     */
    public void start() {
        ignite.events().localListen(event -> {
            DiscoveryEvent e = (DiscoveryEvent)event;

            // Consistent IDs of the server nodes alive as of this event.
            Set<Object> aliveSrvNodes = e.topologyNodes().stream()
                .filter(n -> !n.isClient())
                .map(ClusterNode::consistentId)
                .collect(Collectors.toSet());

            // Consistent IDs of the nodes in the current baseline.
            Set<Object> baseline = ignite.cluster().currentBaselineTopology().stream()
                .map(BaselineNode::consistentId)
                .collect(Collectors.toSet());

            final long topVer = e.topologyVersion();

            if (!aliveSrvNodes.equals(baseline))
                ignite.context().timeout().addTimeoutObject(new GridTimeoutObjectAdapter(bltChangeDelayMillis) {
                    @Override public void onTimeout() {
                        // Reset the BLT only if the topology has not changed during the delay.
                        if (ignite.cluster().topologyVersion() == topVer)
                            ignite.cluster().setBaselineTopology(topVer);
                    }
                });

            // Keep the listener subscribed.
            return true;
        }, EventType.EVT_NODE_FAILED, EventType.EVT_NODE_LEFT, EventType.EVT_NODE_JOINED);
    }
}
{noformat}
Pros:
1) The baseline will be changed only once if several topology changes occur
in sequence within a short period
2) The baseline will be changed back if the missing node eventually returns
Simply put, the cluster will behave just as it did in 2.3.
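
The two points above can be checked with a toy model that does not depend on Ignite. This is only a sketch under stated assumptions: the class BaselineDebounceModel and all of its methods are hypothetical, node IDs are plain strings, and a single elapseDelay() call stands in for every scheduled timeout expiring with no further topology changes in between. The model keeps the same rule as BaselineWatcher: a timeout resets the baseline only if the topology version it captured is still the current one.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy, Ignite-free model of the delayed baseline reset. All names are hypothetical. */
final class BaselineDebounceModel {
    /** Current topology version, bumped on every join/leave. */
    private long topVer = 1;

    /** IDs of the server nodes that are currently alive. */
    private final Set<String> alive;

    /** IDs of the nodes in the baseline. */
    private Set<String> baseline;

    /** Topology versions captured by the timeouts that are still pending. */
    private final List<Long> pendingTimers = new ArrayList<>();

    /** Number of baseline changes actually performed. */
    private int baselineChanges;

    BaselineDebounceModel(Set<String> initialNodes) {
        alive = new HashSet<>(initialNodes);
        baseline = new HashSet<>(initialNodes);
    }

    void nodeLeft(String id) {
        alive.remove(id);
        onTopologyChange();
    }

    void nodeJoined(String id) {
        alive.add(id);
        onTopologyChange();
    }

    /** Mirrors the listener: schedule a reset only if alive nodes differ from the baseline. */
    private void onTopologyChange() {
        topVer++;

        if (!alive.equals(baseline))
            pendingTimers.add(topVer);
    }

    /** Expires all pending timeouts; only the one holding the current version resets the baseline. */
    void elapseDelay() {
        for (long capturedVer : pendingTimers) {
            if (capturedVer == topVer) {
                baseline = new HashSet<>(alive);
                baselineChanges++;
            }
        }

        pendingTimers.clear();
    }

    Set<String> baseline() {
        return baseline;
    }

    int baselineChanges() {
        return baselineChanges;
    }
}
```

In this model, two nodes leaving one after another before the delay elapses produce a single baseline change, and a node that leaves and rejoins before the delay elapses produces none.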
> Docs: Triggering automatic rebalancing if the whole baseline topology is not
> recovered
> --------------------------------------------------------------------------------------
>
> Key: IGNITE-8241
> URL: https://issues.apache.org/jira/browse/IGNITE-8241
> Project: Ignite
> Issue Type: Task
> Components: documentation
> Affects Versions: 2.4
> Reporter: Denis Magda
> Assignee: Denis Magda
> Priority: Critical
> Fix For: 2.5
>
> Attachments: BaselineWatcher.java
>
>
> The ticket is created as a result of the following discussion:
> http://apache-ignite-developers.2346864.n4.nabble.com/Triggering-rebalancing-on-timeout-or-manually-if-the-baseline-topology-is-not-reassembled-td29299.html
> Rebalancing doesn't happen if one of the nodes goes down,
> thus shrinking the baseline topology. It complies with our assumption that
> the node should be recovered soon and there is no need to waste
> CPU/memory/networking resources of the cluster shifting the data around.
> However, there are always edge cases. I was reasonably asked how to trigger
> the rebalancing within the baseline topology manually or on timeout if:
> * It's not expected that the failed node will be brought back any time
> soon, and
> * It's unlikely that the node will be replaced by another one.
> Until we embed special facilities in the baseline topology that would
> consider such situations, we can document the following workaround. A user
> application/tool/script has to subscribe to node_left events and remove the
> failed node from the baseline topology after some time. Once the node is
> removed, the baseline topology will be changed, and the rebalancing will be
> kicked off.
>
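
BaselineWatcher above relies on classes from internal packages (IgniteEx, GridTimeoutObjectAdapter). The workaround from the description, reacting to a topology event and resetting the baseline after a delay, could also be sketched with a plain ScheduledExecutorService. In the sketch below, DelayedBaselineReset is a hypothetical helper, not part of Ignite. Its two callbacks are placeholders; the assumption is that an application would wire them to ignite.cluster().topologyVersion() and ignite.cluster().setBaselineTopology(long), and call onTopologyChange() from its event listener.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.LongConsumer;
import java.util.function.LongSupplier;

/** Hypothetical helper: delayed baseline reset driven by a JDK scheduler. */
final class DelayedBaselineReset implements AutoCloseable {
    /** Single-threaded timer that runs the delayed checks. */
    private final ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();

    /** Placeholder for reading the current topology version. */
    private final LongSupplier currentTopVer;

    /** Placeholder for resetting the baseline to the given topology version. */
    private final LongConsumer resetBaseline;

    /** Delay before the reset, in milliseconds. */
    private final long delayMillis;

    DelayedBaselineReset(LongSupplier currentTopVer, LongConsumer resetBaseline, long delayMillis) {
        this.currentTopVer = currentTopVer;
        this.resetBaseline = resetBaseline;
        this.delayMillis = delayMillis;
    }

    /**
     * Schedules a reset for the topology version carried by an event.
     * The reset is skipped if the topology has changed again by the time the delay elapses.
     */
    ScheduledFuture<?> onTopologyChange(long eventTopVer) {
        return timer.schedule(() -> {
            if (currentTopVer.getAsLong() == eventTopVer)
                resetBaseline.accept(eventTopVer);
        }, delayMillis, TimeUnit.MILLISECONDS);
    }

    /** Stops the timer and drops any pending resets. */
    @Override public void close() {
        timer.shutdownNow();
    }
}
```

Passing the callbacks in keeps the timing logic testable without a running cluster; whether a JDK scheduler is preferable to Ignite's own timeout processor is a design choice this sketch does not settle.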