I think this article [1] covers most of the concepts quite well (see its
key concepts section).
I am not aware of any article that explains the whole process, though.

Briefly, there are several processes/concepts related to this subject:
token ownership, replicas, coordinators and gossip.
Ensuring consistency in a small cluster (number of replicas <= number of
nodes) is more or less straightforward. In this case, when a node
bootstraps, it notifies all the replicas and information about it is added
to the `pending nodes`, so all nodes know about the bootstrapping node
(otherwise streaming would not even start).
Having a coordinator that is not one of the replicas for the
partition/token you're querying is a bit more complex, as it relies on
knowledge about the joining node that is distributed via gossip.
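
To make the `pending nodes` idea (and the extra writes Jeff describes in
the quoted message) a bit more concrete, here is a toy Python sketch. It is
not Cassandra's code: the ring is a plain dict of token -> node, placement
is SimpleStrategy-like, gossip is not modelled, and all function names are
hypothetical. It tries to show why the new node does not fall behind: writes
arriving during streaming go to both the current replica and the pending
node, and the streamed snapshot is merged by timestamp (last write wins), so
an older streamed cell cannot overwrite a newer one.

```python
from bisect import bisect_left

def natural_replicas(ring, token, rf):
    """Nodes owning `token`: the first node whose token is >= it, then the
    next rf-1 nodes clockwise (a SimpleStrategy-like placement)."""
    tokens = sorted(ring)
    start = bisect_left(tokens, token) % len(tokens)
    return [ring[tokens[(start + i) % len(tokens)]]
            for i in range(min(rf, len(tokens)))]

def write_endpoints(ring, pending, token, rf):
    """Natural replicas plus any pending (bootstrapping) node that will own
    `token` once it has joined: the 'extra write' during range movement."""
    targets = natural_replicas(ring, token, rf)
    for node in natural_replicas({**ring, **pending}, token, rf):
        if node in pending.values() and node not in targets:
            targets.append(node)
    return targets

def merge(store, cells):
    """Last-write-wins merge: keep the cell with the highest timestamp."""
    for token, cell in cells.items():
        if token not in store or cell[0] > store[token][0]:
            store[token] = cell

ring = {0: "A", 100: "B", 200: "C"}      # token -> node (hypothetical)
rf = 1
stores = {name: {} for name in "ABCD"}   # node -> {token: (timestamp, value)}

def write(pending, token, ts, value):
    for node in write_endpoints(ring, pending, token, rf):
        merge(stores[node], {token: (ts, value)})

write({}, 120, 1, "v1")                  # D not known yet: only C gets it
pending = {150: "D"}                     # D bootstraps to take over (100, 150]
snapshot = {t: c for t, c in stores["C"].items() if 100 < t <= 150}
write(pending, 120, 2, "v2")             # concurrent update: goes to C and D
merge(stores["D"], snapshot)             # stream completes; older cell loses
ring.update(pending)                     # D leaves pending and joins the ring
pending = {}
```

After the stream completes, reads for token 120 are served by D, which
already holds the newer value written while it was still pending.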

There are two properties that can improve the situation with range
movements: cassandra.consistent.rangemovement and
cassandra.consistent.simultaneousmoves.allow. The first one disallows ring
changes if any node in the replica set is offline. In addition, it makes
sure that no other moves are in progress within the ring. In that case, if
you're connected to a coordinator that is one of the replicas, the data has
to be placed correctly. The data will be moved, and any inconsistencies will
eventually be fixed by a repair (so, to answer your question, no data is
lost during this process).
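
As a rough illustration of what these two properties guard against, here is
a hypothetical Python check run before a node is allowed to join. It follows
the description above, not Cassandra's actual code: the function, the
exception and the inputs are invented for the example, and the default
arguments are assumed to mirror the properties' defaults (consistent range
movement on, simultaneous moves not allowed). In a real cluster both are JVM
system properties, e.g. -Dcassandra.consistent.rangemovement=false.

```python
class RangeMovementError(Exception):
    """Hypothetical error raised when a ring change is refused."""

def check_can_bootstrap(replicas_alive, bootstrapping, leaving, moving,
                        consistent_rangemovement=True,
                        allow_simultaneous_moves=False):
    """Sketch of a pre-join check modelled on the two properties above.

    replicas_alive: {node: is_up} for the replicas involved in the move.
    bootstrapping/leaving/moving: other nodes currently changing the ring.
    """
    if not consistent_rangemovement:
        return  # checks skipped entirely: faster, but not safe
    down = sorted(node for node, up in replicas_alive.items() if not up)
    if down:
        raise RangeMovementError(f"replica(s) down: {down}")
    if not allow_simultaneous_moves and (bootstrapping or leaving or moving):
        raise RangeMovementError("other nodes are bootstrapping/leaving/moving")

try:
    check_can_bootstrap({"A": True, "B": False}, [], [], [])
except RangeMovementError as e:
    print("refusing to bootstrap:", e)
```

Refusing the change outright is the conservative choice: it trades
availability of the operation for the guarantee that the data is streamed
from the node that actually owns it.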

(I have tried to provide this information to the best of my knowledge; if
anyone sees something wrong, please point it out.)

[1] https://dzone.com/articles/introduction-apache-cassandra

On Thu, May 19, 2016 at 5:58 AM Renjie Liu <liurenjie2...@gmail.com> wrote:

> BTW, is there any article explaining the process? I think this will help us
> understand it better.
>
> On Thu, May 19, 2016 at 11:28 AM Renjie Liu <liurenjie2...@gmail.com>
> wrote:
>
> > Thanks, I'll read the code.
> >
> > On Thu, May 19, 2016 at 11:02 AM Jeff Jirsa <jeff.ji...@crowdstrike.com>
> > wrote:
> >
> >>
> >> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/locator/TokenMetadata.java#L731-L754
> >>
> >> And
> >>
> >> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/locator/TokenMetadata.java#L60-L88
> >>
> >> Cassandra keeps a map of joining and leaving nodes, and does extra writes
> >> to the appropriate nodes for mutations created after the streaming is
> >> calculated.
> >>
> >> On 5/18/16, 7:33 PM, "Renjie Liu" <liurenjie2...@gmail.com> wrote:
> >>
> >> >Hi, cassandra devs:
> >> >I'm learning Cassandra and I can understand most of the techniques used.
> >> >But I can't understand how Cassandra ensures consistency when
> >> >adding/removing a node. It seems that when a node joins the DHT ring,
> >> >some nodes need to transfer data to the new node using streaming. But the
> >> >data may still get updated while it is being transferred, so the new node
> >> >can never catch up. How does Cassandra handle this? Will Cassandra lose
> >> >data during this process?
> >> >--
> >> >Liu, Renjie
> >> >Software Engineer, MVAD
> >
> > --
> > Liu, Renjie
> > Software Engineer, MVAD
> >
> --
> Liu, Renjie
> Software Engineer, MVAD
>
-- 
Alex Petrov
