RE: Questions about the replicas selection and remote coordinator

Jun Wu Fri, 29 Jan 2016 10:14:40 -0800

Hi Steve,
   Thank you so much for your reply.
   Yes, you're right, I'm using version 2.1, so I think my information is
outdated.
   However, this raises another interesting question: why was this part
changed between version 1.2 and version 2.1? In the version 1.2 figure, node 10
in DC 1 connects to node 10 in DC 2, and node 10 in DC 2 then sends 3 copies to
3 nodes in DC 2. That should save more time than in version 2.1, where node 10
in DC 1 sends the data directly to 3 nodes in DC 2.
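
To make the comparison concrete, here is a toy Python count of the messages
each approach sends. It is only a sketch of the two figures as described
above, not Cassandra's actual write path; the names WriteFanout and
remote_write_fanout are made up for illustration, and it counts messages
only, not latency.

```python
from typing import NamedTuple


class WriteFanout(NamedTuple):
    cross_dc: int       # messages that cross the link between data centers
    inside_remote: int  # messages sent between nodes inside the remote DC


def remote_write_fanout(remote_replicas: int, via_remote_coordinator: bool) -> WriteFanout:
    """Count messages needed to reach the remote replicas in a toy model."""
    if remote_replicas <= 0:
        return WriteFanout(0, 0)
    if via_remote_coordinator:
        # One message crosses to a node in the remote DC, which then
        # sends one copy to each replica locally (the 1.2 figure).
        return WriteFanout(1, remote_replicas)
    # The local coordinator sends one copy per remote replica (the 2.1 figure).
    return WriteFanout(remote_replicas, 0)


print(remote_write_fanout(3, via_remote_coordinator=True))
print(remote_write_fanout(3, via_remote_coordinator=False))
```

In this model, with 3 remote replicas, forwarding sends 1 message across
data centers and direct sending sends 3.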
   Also, is there any information on how the replicas are chosen? For example,
in
https://docs.datastax.com/en/cassandra/2.1/cassandra/dml/architectureClientRequestsMultiDCWrites_c.html
why are nodes 1, 3 and 6 chosen as replicas, and nodes 4, 8 and 11 as the other
3 replicas?
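
For context, Cassandra places replicas by position on the token ring rather
than by node number. Below is a minimal sketch of that idea in Python. It
assumes a made-up ring with invented tokens, uses MD5 as a stand-in hash
(the default partitioner is Murmur3), and ignores racks and virtual nodes.
RING, token_for and pick_replicas are hypothetical names, and the sketch
will not reproduce the node numbers in the figure.

```python
import bisect
import hashlib

# Hypothetical ring of (token, node, data center); tokens are invented.
RING = sorted(
    [(i * 10, f"dc1-node{i}", "DC1") for i in range(1, 11)]
    + [(i * 10 + 5, f"dc2-node{i}", "DC2") for i in range(1, 11)]
)
RING_SIZE = 110  # size of this toy token space


def token_for(key: str) -> int:
    """Map a partition key to a token (MD5 used only as a stand-in)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % RING_SIZE


def pick_replicas(token: int, rf_per_dc: dict) -> dict:
    """Walk the ring clockwise from `token` until each DC has its quota."""
    tokens = [t for t, _, _ in RING]
    # First node whose token is >= the key's token, wrapping around the ring.
    start = bisect.bisect_left(tokens, token) % len(RING)
    chosen = {dc: [] for dc in rf_per_dc}
    for step in range(len(RING)):
        _, node, dc = RING[(start + step) % len(RING)]
        if dc in chosen and len(chosen[dc]) < rf_per_dc[dc]:
            chosen[dc].append(node)
    return chosen


print(pick_replicas(token_for("user:42"), {"DC1": 3, "DC2": 3}))
```

The point of the sketch is that the replica set follows from the key's
token and the ring layout, so different keys land on different sets of 3
nodes in each data center.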
   Also, is node 11 acting as the remote coordinator here? Or does the concept
of a remote coordinator really exist? As the figure shows, we don't even need a
remote coordinator.
    Thanks!
Jun
        
Date: Fri, 29 Jan 2016 09:55:58 -0800
Subject: Re: Questions about the replicas selection and remote coordinator
From: sroben...@highwire.org
To: user@cassandra.apache.org

Hi Jun,
The 2 diagrams you are comparing come from versions of Cassandra that are
significantly different - 1.2 in the first case and 2.1 in the second case, so
it's not surprising that there are differences. Since you haven't qualified
your question with the Cassandra version you are asking about, I would assume
that the 2.1 example is more representative of what you would be likely to see.
In any case, it's best to use a consistent version for your documentation
because Cassandra changes quite rapidly with many of the releases.
As far as choosing the coordinator node, I don't think there's a way to force
it, nor would it be a good idea to do so. In order to make a reasonable
selection of coordinators, you would need a lot of internal knowledge about
load on the nodes in the cluster and you'd need to also handle certain classes
of failures and retries, so you would end up duplicating what is already being
done for you internally.
Steve

On Fri, Jan 29, 2016 at 9:11 AM, Jun Wu <wuxiaomi...@hotmail.com> wrote:

Hi there,
I have some questions about replica selection.
Let's say that we have 2 data centers, DC1 and DC2. The figure is taken from
this link:
https://docs.datastax.com/en/cassandra/1.2/cassandra/images/write_access_multidc_12.png
There are 10 nodes in each data center. We set the replication factor to 3 in
each data center, which means there will be 3 replicas in each data center.
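
A setup like this is normally declared per keyspace with
NetworkTopologyStrategy. The Python sketch below only builds the CQL
statement as a string; the keyspace name "demo" and the helper
create_keyspace_cql are hypothetical, and the data center names must match
whatever names the cluster's snitch reports.

```python
def create_keyspace_cql(keyspace: str, replicas_per_dc: dict) -> str:
    """Build a CREATE KEYSPACE statement with a replica count per data center."""
    dc_options = ", ".join(f"'{dc}': {rf}" for dc, rf in replicas_per_dc.items())
    return (
        f"CREATE KEYSPACE {keyspace} WITH replication = "
        f"{{'class': 'NetworkTopologyStrategy', {dc_options}}};"
    )


replicas = {"DC1": 3, "DC2": 3}
print(create_keyspace_cql("demo", replicas))
print("total replicas per write:", sum(replicas.values()))
```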
(1) My first question is how the 3 nodes to write data to are chosen. In the
link above, the 3 replicas are nodes 1, 2 and 7. Is there any mechanism for
selecting these 3?
(2) Another question is about the remote coordinator. The previous figure
shows that node 10 in DC1 writes data to node 10 in DC2, and node 10 in DC2
then writes 3 copies to 3 nodes in DC2.
But another figure from DataStax shows a different method. The figure can be
found here:
https://docs.datastax.com/en/cassandra/2.1/cassandra/dml/architectureClientRequestsMultiDCWrites_c.html
It shows that node 10 in DC1 sends 3 copies directly to 3 nodes in DC2,
without using a remote coordinator.
I'm wondering which case is true, because across multiple data centers the
time taken by these two methods differs a lot.
Also, is there any mechanism for selecting which node acts as the remote
coordinator?
Thanks!
Jun

--
Steve Robenalt
Software Architect
sroben...@highwire.org
(office/cell): 916-505-1785
HighWire Press, Inc.
425 Broadway St, Redwood City, CA 94063
www.highwire.org
Technology for Scholarly Communication
