>>
Vyacheslav, please elaborate on how we can determine whether we are on the
same rack. I am not sure this is possible in the general case. Please see my
suggestions below.
>>

I thought of latency values: latency between nodes on the same host <
latency between nodes in the same rack < latency between nodes in the same
subnet < etc.

2016-12-26 12:20 GMT+03:00 Yakov Zhdanov <yzhda...@apache.org>:

> >>
> For example, ordering on latency:
> - nodes on one host = 1
> - nodes in one rack-blade = 2
> - nodes in one server-rack = 3
> - nodes in one physical cluster = 4
> - nodes in one subnet = 5
> - etc.
>
> Maybe it'll be better to use some metrics from ClusterMetrics interface.
>
> The algorithm of ordering can be implemented in a class such as Comparator
> and use it when we build a cluster or we select a place for a new node.
> >>
>
> Vyacheslav, please elaborate on how we can determine whether we are on the
> same rack. I am not sure this is possible in the general case. Please see my
> suggestions below.
>
> >>
> However, here is the concern I have. Currently when a new node joins,
> coordinator assigns order number to this node (e.g. if we already have
> nodes 1, 2 and 3, new node will have order 4). This node will then be the
> last one on the ring, i.e. nodes are always ordered in the ring by this
> order number (1->2->3->4->1). If we change this, we will basically allow a
> node to be placed anywhere else (something like 1->2->4->3->1). I'm not
> 100% sure if this is going to cause issues, but sounds dangerous.
>
> Yakov, can you please chime in and share your thoughts on this?
> >>
>
> I don't think this will cause issues. Node ordering and placement is
> implemented in TcpDiscoveryNodesRing, and I think we will just need to
> alter the logic of
> org.apache.ignite.spi.discovery.tcp.internal.TcpDiscoveryNodesRing#nextNode(java.util.Collection<org.apache.ignite.spi.discovery.tcp.internal.TcpDiscoveryNode>).
>
> As far as design of this, I would suggest the following.
>
> 1. User should have the ability to define ARC_ID for the node. I suggest
> "arc" for this since we are using the "ring" concept. This will be the
> most honored characteristic for node placement. By default arc_id is 0;
> it can be set with the system property IGNITE_DISCO_ARC_ID, an env
> variable, or via TcpDiscoverySpi.setArcId() (a new method).
> So, if I have nodes A, D, G with arc_id set to 1 and B, Z with arc_id set
> to 5, then the ring should be built as follows: A->D->G->B->Z->A. Here
> arcs can represent different racks or data centers.
>
> I am strongly against giving the user an opportunity to point to an exact
> place in the ring with something like this interface [int getIndex(Node
> newNode, List<Node> currentRing)]. This is very error prone and may
> require tricky consistency checks just to make sure that the
> implementation of this interface is consistent across the topology.
> With the "arcs" approach the user can automatically assign proper ids
> based on physical network topology and network routes.
>
> 2. Subnet - 2nd honored parameter. Nodes on the same subnet should be
> placed side by side in the same arc.
>
> 3. Physical host - 3rd honored parameter. Nodes on the same physical host
> should be placed together automatically in the same arc.
>
> 4. The new mode involving points 1-3 should become the default, and we
> should also provide the ability to switch to the current mode, which
> should become legacy.
>
> --Yakov
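
To make points 1-3 concrete, here is a minimal sketch of the proposed ordering as a plain Comparator. It is not Ignite code: the class ArcRingSketch, the Node descriptor and its fields are hypothetical names used only for illustration, and nothing here touches TcpDiscoveryNodesRing. Falling back to the coordinator-assigned join order as the last tie-breaker is also my assumption, not something stated in the thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch of arc/subnet/host ring ordering; not part of Ignite. */
final class ArcRingSketch {
    /** Hypothetical node descriptor; all field names are assumptions. */
    static final class Node {
        final String name;    // label used in the examples (A, D, G, ...)
        final int arcId;      // user-defined arc id, 0 by default
        final String subnet;  // subnet identifier, e.g. "10.0.1"
        final String host;    // physical host identifier
        final long order;     // join order assigned by the coordinator

        Node(String name, int arcId, String subnet, String host, long order) {
            this.name = name;
            this.arcId = arcId;
            this.subnet = subnet;
            this.host = host;
            this.order = order;
        }
    }

    /**
     * Arc id is honored first, then subnet, then physical host.
     * Subnet and host are compared as strings only to group equal values
     * side by side; join order breaks the remaining ties (an assumption).
     */
    static final Comparator<Node> RING_ORDER = Comparator
        .comparingInt((Node n) -> n.arcId)
        .thenComparing(n -> n.subnet)
        .thenComparing(n -> n.host)
        .thenComparingLong(n -> n.order);

    /** Returns node names in ring order; the last node links back to the first. */
    static List<String> buildRing(List<Node> nodes) {
        List<Node> sorted = new ArrayList<>(nodes);
        sorted.sort(RING_ORDER);
        List<String> names = new ArrayList<>();
        for (Node n : sorted)
            names.add(n.name);
        return names;
    }

    /** The example from point 1: A, D, G in arc 1 and B, Z in arc 5, listed in join order. */
    static List<Node> sampleNodes() {
        return Arrays.asList(
            new Node("B", 5, "10.0.2", "host-3", 1),
            new Node("A", 1, "10.0.1", "host-1", 2),
            new Node("Z", 5, "10.0.2", "host-3", 3),
            new Node("D", 1, "10.0.1", "host-1", 4),
            new Node("G", 1, "10.0.1", "host-2", 5));
    }

    public static void main(String[] args) {
        List<String> ring = buildRing(sampleNodes());
        // Prints: A->D->G->B->Z->A
        System.out.println(String.join("->", ring) + "->" + ring.get(0));
    }
}
```

Because the position is derived purely from node attributes, every node that sees the same topology computes the same ring, which is the consistency property the getIndex-style interface would have had to enforce by hand.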