Vladimir,

Automatic cluster membership changes may be implemented to grow the
topology, but auto-shrinking the topology is usually not possible because
a process cannot distinguish between a node shutdown and a network
partition. If we want to deal with split-brain scenarios as a grown-up
system, we should change the replication strategy within partitions to a
consensus algorithm (I really hope we will). None of the consensus
algorithms (at least those known to me: Paxos, Raft, ZAB) adjusts cluster
membership automatically based on an internally detected process failure.
I consider baseline topology a step towards this model.
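
To illustrate the point about fixed membership, here is a minimal sketch.
It is not Ignite code; the class and method names (QuorumSketch, hasQuorum,
hasQuorumAutoShrink) are hypothetical and node IDs are plain strings. It
compares a majority check against a fixed, configured membership with a
check against whatever nodes happen to be reachable.

```java
import java.util.Set;

/** Hypothetical illustration only; not part of Ignite. */
final class QuorumSketch {
    /**
     * Majority check against a FIXED membership: unreachable nodes still
     * count as members, so at most one side of a partition can pass.
     */
    static boolean hasQuorum(Set<String> configuredMembers, Set<String> reachable) {
        long visible = configuredMembers.stream().filter(reachable::contains).count();
        return visible > configuredMembers.size() / 2;
    }

    /**
     * Unsafe variant: membership shrinks to the reachable nodes, so any
     * non-empty side of a partition considers itself a full cluster.
     */
    static boolean hasQuorumAutoShrink(Set<String> reachable) {
        return !reachable.isEmpty();
    }
}
```

With five configured members split 3/2 by a network partition, hasQuorum
passes only on the three-node side, while hasQuorumAutoShrink passes on
both sides, which is exactly the split-brain case.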

Addressing your second concern: if a node was down for a short period of
time, we should (and we do) rebalance only the deltas, which is faster than
erasing the whole node and moving all the data from scratch.
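
A rough sketch of the delta idea, under assumptions of my own: each
partition keeps a monotonically increasing update counter, and the
supplying node keeps a log of updates keyed by that counter. The names
(DeltaRebalanceSketch, put, deltaSince) are hypothetical, and this is not
how the Ignite internals are actually structured.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Hypothetical illustration only; not part of Ignite. */
final class DeltaRebalanceSketch {
    /** Update log of one partition on the supplier: counter -> "key=value". */
    private final NavigableMap<Long, String> log = new TreeMap<>();

    /** Counter of the last update applied to this partition. */
    private long counter;

    /** Records an update under the next counter value. */
    void put(String key, String val) {
        log.put(++counter, key + "=" + val);
    }

    /**
     * Returns only the updates the returning node missed, given the last
     * counter it had applied before it went down.
     */
    List<String> deltaSince(long lastApplied) {
        return new ArrayList<>(log.tailMap(lastApplied, false).values());
    }
}
```

A node that went down after applying update 1 and returns when the
partition is at update 3 receives two entries instead of the whole
partition. A real system would also need a fallback to full rebalance
when the log no longer reaches back to the requested counter.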

2018-04-24 19:42 GMT+03:00 Vladimir Ozerov <voze...@gridgain.com>:

> Ivan,
>
> This reasoning sounds questionable to me. First, separate logic for
> in-memory and persistent regions means that we lose collocation between
> persistent and non-persistent caches. Second, the “data is still on disk”
> assumption might not be valid if a node has left due to a disk crash, or
> when data has been updated on the remaining nodes.
>
> Tue, Apr 24, 2018 at 19:21, Ivan Rakov <ivan.glu...@gmail.com>:
>
> > Stan,
> >
> > I believe it was discussed at the design proposal thread:
> >
> > http://apache-ignite-developers.2346864.n4.nabble.com/Cluster-auto-activation-design-proposal-td20295.html
> >
> > The short answer: the backup factor decreases if a node leaves. In
> > non-persistent mode we have to rebalance data ASAP; otherwise the last
> > node that owns a partition may fail and the data will be lost forever.
> > This is not necessary if data is persisted to disk storage, and that is
> > the reason for the Baseline Topology concept.
> >
> > Best Regards,
> > Ivan Rakov
> >
> > On 24.04.2018 18:48, Stanislav Lukyanov wrote:
> > > +1 for Vladimir's point - adding more complexity may (and likely
> > > will) be even more misleading.
> > >
> > > Can we take a step back and discuss why we need different behavior
> > > for persistent and in-memory caches? Can we make in-memory caches
> > > honor baseline instead of special-casing them?
> > >
> > > Thanks,
> > > Stan
> > >
> > >
> > > Tue, Apr 24, 2018, 18:28 Vladimir Ozerov <voze...@gridgain.com>:
> > >
> > >> Guys,
> > >>
> > >> As a user I definitely do not want to think about BLATs, SATs, DATs
> > >> or anything of the kind. I want to query data, iterate over data,
> > >> and send compute tasks to data. If a certain node is outside of BLAT
> > >> and does not have data, then it is not an affinity node. Can we just
> > >> fix the affinity logic to take BLAT into account appropriately?
> > >>
> > >> On Tue, Apr 24, 2018 at 6:12 PM, Ivan Rakov <ivan.glu...@gmail.com> wrote:
> > >>
> > >>> Eduard,
> > >>>
> > >>> Can you please summarize the code changes that you are proposing?
> > >>> I agree that BLT is a somewhat misleading term and that DAT/SAT
> > >>> make more sense. However, establishing a consensus on the v2.4
> > >>> Baseline Topology terminology took a long time, and it seems you
> > >>> are going to cause some more perturbation.
> > >>> I still don't understand what should be changed and how. Please
> > >>> provide a summary of the upcoming class renamings and the changes
> > >>> to existing parts of the system.
> > >>>
> > >>> Best Regards,
> > >>> Ivan Rakov
> > >>>
> > >>>
> > >>> On 24.04.2018 17:46, Eduard Shangareev wrote:
> > >>>
> > >>>> Hi, Igniters,
> > >>>>
> > >>>> I want to raise a topic about our affinity node definition.
> > >>>>
> > >>>> After baseline (affinity) topology (BL(A)T) was added, things
> > >>>> started getting complicated.
> > >>>>
> > >>>> Plenty of bugs have appeared:
> > >>>>
> > >>>> IGNITE-8173
> > >>>> ignite.getOrCreateCache(cacheConfig).iterator() method works
> > >>>> incorrect for replicated cache in case if some data node isn't in
> > >>>> baseline
> > >>>>
> > >>>> IGNITE-7628
> > >>>> SqlQuery hangs indefinitely with additional not registered in
> > >>>> baseline node.
> > >>>>
> > >>>> That is because everything relies on the concept of an "affinity
> > >>>> node". Until now the definition was simple: a server node which
> > >>>> passes the node filter, in other words any server node which is
> > >>>> not filtered out by the node filter.
> > >>>>
> > >>>> But a node which is not in BL(A)T and which passes the node filter
> > >>>> is still treated as an affinity node, and that is definitely
> > >>>> wrong. At the very least it is a source of many bugs (I believe
> > >>>> there are many more than the 2 I have already mentioned).
> > >>>>
> > >>>> It's clear that this definition should be changed.
> > >>>> Let's start with a new definition of "affinity topology": an
> > >>>> affinity topology is the set of nodes which could potentially
> > >>>> keep data.
> > >>>>
> > >>>> Using our knowledge of the current implementation, we can say that
> > >>>> 1. for in-memory cache groups it would be all server nodes;
> > >>>> 2. for persistent cache groups it would be BL(A)T.
> > >>>>
> > >>>> From here on I will use Dynamic Affinity Topology (DAT) for case 1
> > >>>> (in-memory cache groups) and Static Affinity Topology (SAT) instead
> > >>>> of BL(A)T for case 2.
> > >>>> Denote the node filter as f(X), where X is an affinity topology.
> > >>>>
> > >>>> Then we can say that node A is affinity node if
> > >>>> A ∈ AT', where AT' = f(AT), where AT is DAT or SAT.
> > >>>>
> > >>>> It is worth mentioning that AT' is what should be passed to the
> > >>>> affinity function of cache groups.
> > >>>> Also, AT and AT' can change over time (BL(A)T changes or node
> > >>>> joins/disconnections).
> > >>>>
> > >>>> And I don't like the fact that the choice between DAT and SAT
> > >>>> relies on persistence settings (should we make it configurable per
> > >>>> cache group?).
> > >>>>
> > >>>> OK, I have created a ticket to implement these changes and will
> > >>>> start working on it:
> > >>>> https://issues.apache.org/jira/browse/IGNITE-8380 (Affinity node
> > >>>> calculation doesn't take into account BLT).
> > >>>>
> > >>>> Also, I want to use these definitions (Affinity Topology, Affinity
> > >>>> Node, DAT, SAT) in the documentation and Javadocs.
> > >>>>
> > >>>> Maybe, we also should consider replacing BL(A)T with SAT.
> > >>>>
> > >>>> Thank you for your attention.
> > >>>>
> > >>>>
> >
> >
>
