Pavel, thanks for the suggestions. They would definitely work out. I would document the one with the event subscription: https://issues.apache.org/jira/browse/IGNITE-8241
Could you help prepare a sample code snippet with such a listener to add to the doc? I know there are some caveats about how such an event has to be processed.

Ivan, I truly like your idea. Alex G., what are your thoughts on this?

--
Denis

On Thu, Apr 12, 2018 at 2:22 PM, Ivan Rakov <ivan.glu...@gmail.com> wrote:

> Guys,
>
> I also heard complaints about absence of option to automatically change
> baseline topology. They absolutely make sense.
> What Pavel suggested will work as a workaround. I think, in future
> releases we should give user an option to enable a similar behavior via
> Ignite Configuration.
> It may be called "Baseline Topology change policy". I see it as rule-based
> language, which allows to specify conditions of BLT change using several
> parameters - timeout and minimum allowed number of partition copies left
> (maybe this option should be provided also on per-cache-group level).
> Policy can also specify conditions for including new nodes in BLT if they
> are present - including node attributes filters and so on.
>
> What do you think?
>
> Best Regards,
> Ivan Rakov
>
> On 12.04.2018 19:41, Pavel Kovalenko wrote:
>
>> Denis,
>>
>> It's just one of the ways to implement it. We also can subscribe on node
>> join / fail events to properly track downtime of a node.
>>
>> 2018-04-12 19:38 GMT+03:00 Pavel Kovalenko <jokse...@gmail.com>:
>>
>>> Denis,
>>>
>>> Using our API we can implement this task as follows:
>>> Do each minute:
>>> 1) Get all alive server nodes consistent ids =>
>>> ignite().context().discovery().aliveServerNodes() =>
>>> mapToConsistentIds().
>>> 2) Get current baseline topology =>
>>> ignite().cluster().currentBaselineTopology()
>>> 3) For each node in baseline and not in alive server nodes check timeout
>>> for this node.
>>> 4) If timeout is reached remove node from baseline
>>> 5) If baseline is changed set new baseline =>
>>> ignite().cluster().setNewBaseline()
>>>
>>> 2018-04-12 2:18 GMT+03:00 Denis Magda <dma...@apache.org>:
>>>
>>>> Pavel, Val,
>>>>
>>>> So, it means that the rebalancing will be initiated only after an
>>>> administrator remove the failed node from the topology, right?
>>>>
>>>> Next, imagine that you are that IT administrator who has to automate the
>>>> rebalancing activation if the node failed and not recovered within 1
>>>> minute. What would you do and what Ignite provides to fulfill the task?
>>>>
>>>> --
>>>> Denis
>>>>
>>>> On Wed, Apr 11, 2018 at 1:01 PM, Pavel Kovalenko <jokse...@gmail.com>
>>>> wrote:
>>>>
>>>>> Denis,
>>>>>
>>>>> In case of incomplete baseline topology IgniteCache.rebalance() will do
>>>>> nothing, because this event doesn't trigger partitions exchange or
>>>>> affinity change, so states of existing partitions are hold.
>>>>>
>>>>> 2018-04-11 22:27 GMT+03:00 Valentin Kulichenko <
>>>>> valentin.kuliche...@gmail.com>:
>>>>>
>>>>>> Denis,
>>>>>>
>>>>>> In my understanding, in this case you should remove node from BLT and
>>>>>> that will trigger the rebalancing, no?
>>>>>>
>>>>>> -Val
>>>>>>
>>>>>> On Wed, Apr 11, 2018 at 12:23 PM, Denis Magda <dma...@gridgain.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Igniters,
>>>>>>>
>>>>>>> As we know the rebalancing doesn't happen if one of the nodes goes
>>>>>>> down, thus, shrinking the baseline topology. It complies with our
>>>>>>> assumption that the node should be recovered soon and there is no
>>>>>>> need to waste CPU/memory/networking resources of the cluster
>>>>>>> shifting the data around.
>>>>>>>
>>>>>>> However, there are always edge cases. I was reasonably asked how to
>>>>>>> trigger the rebalancing within the baseline topology manually or on
>>>>>>> timeout if:
>>>>>>>
>>>>>>> - It's not expected that the failed node would be resurrected in
>>>>>>> the nearest time and
>>>>>>> - It's not likely that that node will be replaced by the other
>>>>>>> one.
>>>>>>>
>>>>>>> The question. If I call IgniteCache.rebalance() or configure
>>>>>>> CacheConfiguration.rebalanceTimeout will the rebalancing be fired
>>>>>>> within the baseline topology?
>>>>>>>
>>>>>>> --
>>>>>>> Denis
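
For reference, the timeout bookkeeping in steps 3 and 4 of Pavel's quoted list could look roughly like the sketch below. It is plain Java with no Ignite dependency, so it only illustrates the logic. The class name BaselineTimeoutTracker, the check() method and the use of plain Objects as consistent IDs are all made up for this example and are not part of any Ignite API.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical helper: tracks how long baseline nodes have been offline.
 * Consistent IDs are modelled as plain Objects (an assumption of this sketch).
 */
class BaselineTimeoutTracker {
    private final long timeoutMillis;

    /** Consistent ID -> time at which the node was first seen offline. */
    private final Map<Object, Long> offlineSince = new HashMap<>();

    BaselineTimeoutTracker(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /**
     * One periodic check. Returns the baseline with timed-out nodes removed;
     * the result equals baselineIds when nothing has to change.
     */
    Set<Object> check(Set<Object> baselineIds, Set<Object> aliveIds, long nowMillis) {
        // Forget nodes that came back or that are no longer part of the baseline.
        offlineSince.keySet().removeIf(id -> aliveIds.contains(id) || !baselineIds.contains(id));

        Set<Object> result = new HashSet<>(baselineIds);
        for (Object id : baselineIds) {
            if (aliveIds.contains(id))
                continue;

            // Record the first time this baseline node was seen offline.
            long since = offlineSince.computeIfAbsent(id, k -> nowMillis);

            if (nowMillis - since >= timeoutMillis) {
                result.remove(id);       // step 4: drop the node from the baseline
                offlineSince.remove(id);
            }
        }
        return result;
    }
}
```

In a real deployment the caller would run check() on a schedule, feed it the consistent IDs of the current baseline and of the alive server nodes, and apply the result only when it differs from the current baseline (step 5). To my understanding the public call for that is IgniteCluster.setBaselineTopology(), but please treat that wiring as an assumption to be verified against the release being documented.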