Vladimir,

Automatic cluster membership changes may be implemented to grow the topology, but automatically shrinking the topology is usually not possible, because a process cannot distinguish between a node shutdown and a network partition. If we want to deal with split-brain scenarios as a grown-up system, we should change the replication strategy within partitions to a consensus algorithm (I really hope we will). None of the consensus algorithms (at least those known to me: Paxos, Raft, ZAB) adjust cluster membership automatically based on an internally detected process failure. I consider baseline topology a step towards this model.
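To illustrate the point about shrinking, here is a minimal sketch of a majority check. It is not Ignite code: the class `QuorumSketch`, the method `hasQuorum`, and the node names are hypothetical. It assumes a five-node cluster split by a network partition into a three-node side and a two-node side.

```java
import java.util.Set;

public class QuorumSketch {
    /**
     * Returns true if the reachable nodes form a strict majority
     * of the configured membership. Hypothetical helper, not an Ignite API.
     */
    static boolean hasQuorum(Set<String> membership, Set<String> reachable) {
        long visible = reachable.stream().filter(membership::contains).count();
        return visible > membership.size() / 2;
    }

    public static void main(String[] args) {
        Set<String> membership = Set.of("n1", "n2", "n3", "n4", "n5");
        Set<String> sideA = Set.of("n1", "n2", "n3"); // one side of the partition
        Set<String> sideB = Set.of("n4", "n5");       // the other side

        // Fixed membership: at most one side can hold a majority.
        System.out.println("fixed, side A: " + hasQuorum(membership, sideA));
        System.out.println("fixed, side B: " + hasQuorum(membership, sideB));

        // Auto-shrink: each side redefines membership as "the nodes I can see",
        // so both sides believe they hold a majority (split brain).
        System.out.println("shrunk, side A: " + hasQuorum(sideA, sideA));
        System.out.println("shrunk, side B: " + hasQuorum(sideB, sideB));
    }
}
```

With a fixed membership only one side can pass the check. Once each side shrinks the membership to the nodes it can still reach, both sides pass, which is exactly the split-brain case.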
Addressing your second concern: if a node was down for a short period of time, we should (and we do) rebalance only deltas, which is faster than erasing the whole node and moving all data from scratch.

2018-04-24 19:42 GMT+03:00 Vladimir Ozerov <voze...@gridgain.com>:

> Ivan,
>
> This reasoning sounds questionable to me. First, separate logic for
> in-memory and persistent regions means that we lose collocation between
> persistent and non-persistent caches. Second, the “data is still on disk”
> assumption might not be valid if the node has left due to a disk crash, or
> when data is updated on the remaining nodes.
>
> Tue, 24 Apr 2018 at 19:21, Ivan Rakov <ivan.glu...@gmail.com>:
>
> > Stan,
> >
> > I believe it was discussed in the design proposal thread:
> >
> > http://apache-ignite-developers.2346864.n4.nabble.com/Cluster-auto-activation-design-proposal-td20295.html
> >
> > The short answer: the backup factor decreases if a node leaves. In
> > non-persistent mode we have to rebalance data ASAP, otherwise the last
> > node that owns a partition may fail and the data will be lost forever.
> > This is not necessary if data is persisted to disk storage; that's the
> > reason for the Baseline Topology concept.
> >
> > Best Regards,
> > Ivan Rakov
> >
> > On 24.04.2018 18:48, Stanislav Lukyanov wrote:
> > > + for Vladimir's point - adding more complexity may (and likely will)
> > > be even more misleading.
> > >
> > > Can we take a step back and discuss why we need to have different
> > > behavior for persistent and in-memory caches? Can we make in-memory
> > > caches honor baseline instead of special-casing them?
> > >
> > > Thanks,
> > > Stan
> > >
> > >
> > > Tue, 24 Apr 2018, 18:28 Vladimir Ozerov <voze...@gridgain.com>:
> > >
> > >> Guys,
> > >>
> > >> As a user I definitely do not want to think about BLATs, SATs, DATs,
> > >> whatsoever. I want to query data, iterate over data, send compute
> > >> tasks to data.
> > >> If a certain node is outside of BLAT and does not have data, then
> > >> it is not an affinity node. Can we just fix the affinity logic to
> > >> take BLAT into account appropriately?
> > >>
> > >> On Tue, Apr 24, 2018 at 6:12 PM, Ivan Rakov <ivan.glu...@gmail.com> wrote:
> > >>
> > >>> Eduard,
> > >>>
> > >>> Can you please summarize the code changes that you are proposing?
> > >>> I agree that BLT is a somewhat misleading term and DAT/SAT make more
> > >>> sense. However, establishing a consensus on the v2.4 Baseline Topology
> > >>> terminology took a long time, and it seems like you are going to cause
> > >>> a bit more perturbation.
> > >>> I still don't understand what should be changed and how. Please
> > >>> provide a summary of the upcoming class renamings and the changes to
> > >>> existing system parts.
> > >>>
> > >>> Best Regards,
> > >>> Ivan Rakov
> > >>>
> > >>>
> > >>> On 24.04.2018 17:46, Eduard Shangareev wrote:
> > >>>
> > >>>> Hi, Igniters,
> > >>>>
> > >>>> I want to raise a topic about our affinity node definition.
> > >>>>
> > >>>> After adding baseline (affinity) topology (BL(A)T), things started
> > >>>> getting complicated.
> > >>>>
> > >>>> Plenty of bugs have appeared:
> > >>>>
> > >>>> IGNITE-8173
> > >>>> ignite.getOrCreateCache(cacheConfig).iterator() method works incorrect
> > >>>> for replicated cache in case if some data node isn't in baseline
> > >>>>
> > >>>> IGNITE-7628
> > >>>> SqlQuery hangs indefinitely with additional not registered in baseline
> > >>>> node.
> > >>>>
> > >>>> It's because everything relies on the concept of an "affinity node".
> > >>>> Until now it was as simple as a server node which passes the node
> > >>>> filter; in other words, any server node which is not filtered out by
> > >>>> the node filter.
> > >>>>
> > >>>> But a node which is not in BL(A)T and which passes the node filter
> > >>>> would be treated as an affinity node. And it's definitely wrong.
> > >>>> At least, it is a source of many bugs (I believe there are many more
> > >>>> than the 2 which I have already mentioned).
> > >>>>
> > >>>> It's clear that this definition should be changed.
> > >>>> Let's start with a new definition of "Affinity topology". Affinity
> > >>>> topology is a set of nodes which potentially could keep data.
> > >>>>
> > >>>> If we use knowledge about the current implementation, we can say that:
> > >>>> 1. for in-memory cache groups it would be all server nodes;
> > >>>> 2. for persistent cache groups it would be BL(A)T.
> > >>>>
> > >>>> I will further use Dynamic Affinity Topology or DAT for 1 (in-memory
> > >>>> cache groups) and Static Affinity Topology or SAT instead of BL(A)T,
> > >>>> or the 2nd point.
> > >>>> Denote the node filter as f(X), where X is an affinity topology.
> > >>>>
> > >>>> Then we can say that node A is an affinity node if
> > >>>> A ∈ AT', where AT' = f(AT), where AT is DAT or SAT.
> > >>>>
> > >>>> It is worth mentioning that AT' is what should be passed to the
> > >>>> affinity function of cache groups.
> > >>>> Also, AT and AT' could change over time (BL(A)T changes or node
> > >>>> joins/disconnections).
> > >>>>
> > >>>> And I don't like the fact that usage of DAT or SAT relies on
> > >>>> persistence settings (should we make it configurable per cache group?).
> > >>>>
> > >>>> OK, I have created a ticket to implement these changes and will start
> > >>>> working on it.
> > >>>> https://issues.apache.org/jira/browse/IGNITE-8380 (Affinity node
> > >>>> calculation doesn't take into account BLT).
> > >>>>
> > >>>> Also, I want to use these definitions (Affinity Topology, Affinity
> > >>>> Node, DAT, SAT) in documentation and Javadocs.
> > >>>>
> > >>>> Maybe we should also consider replacing BL(A)T with SAT.
> > >>>>
> > >>>> Thank you for your attention.
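
For reference, the definition from Eduard's original message (node A is an affinity node if A ∈ AT', where AT' = f(AT) and AT is DAT or SAT) can be sketched roughly as follows. This is only an illustration under stated assumptions, not Ignite code: the class `AffinityTopologySketch`, its methods, and the use of plain strings as node IDs are all hypothetical, and the node filter is modelled as a simple `Predicate`.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.function.Predicate;

public class AffinityTopologySketch {
    /**
     * AT: the baseline (SAT) for persistent cache groups,
     * all server nodes (DAT) for in-memory cache groups.
     */
    static Set<String> affinityTopology(boolean persistent,
                                        Set<String> serverNodes,
                                        Set<String> baseline) {
        return persistent ? baseline : serverNodes;
    }

    /** AT' = f(AT): keep only the nodes accepted by the node filter. */
    static Set<String> applyFilter(Set<String> at, Predicate<String> nodeFilter) {
        Set<String> filtered = new LinkedHashSet<>();
        for (String node : at) {
            if (nodeFilter.test(node)) {
                filtered.add(node);
            }
        }
        return filtered;
    }

    /** Node A is an affinity node if A is in AT'. */
    static boolean isAffinityNode(String node,
                                  boolean persistent,
                                  Set<String> serverNodes,
                                  Set<String> baseline,
                                  Predicate<String> nodeFilter) {
        Set<String> at = affinityTopology(persistent, serverNodes, baseline);
        return applyFilter(at, nodeFilter).contains(node);
    }

    public static void main(String[] args) {
        Set<String> servers = Set.of("a", "b", "c"); // "c" joined after the baseline was set
        Set<String> baseline = Set.of("a", "b");
        Predicate<String> acceptAll = n -> true;

        // Persistent group: "c" is outside the baseline, so it is not an affinity node.
        System.out.println(isAffinityNode("c", true, servers, baseline, acceptAll));
        // In-memory group: every server node that passes the filter is an affinity node.
        System.out.println(isAffinityNode("c", false, servers, baseline, acceptAll));
    }
}
```

Under the older definition quoted above (any server node that passes the node filter), node "c" would count as an affinity node for the persistent group as well, which is the mismatch behind the bugs listed in the thread.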