Hi Steve,

Thank you so much for your kind reply; it makes more sense now. The remote coordinator issue is definitely an interesting topic, though. If you reach any other conclusions on it, I'd be very happy to learn from you.
Thanks again!

Jun

> On Jan 29, 2016, at 13:09, Steve Robenalt <sroben...@highwire.org> wrote:
>
> Hi Jun,
>
> The replicas are chosen according to factors that are generally more easily evaluated internally, as is the case with coordinators. Even if the replicas were initially selected in a completely round-robin fashion, they could end up being redistributed as a result of node failures, additions to or removals from the cluster, etc., particularly when vnodes are used. As such, the diagrams and the nodes they refer to are hypothetical, but accurate in the sense that they are non-contiguous and that different sets of replicas are distributed to various parts of the cluster.
>
> As far as the remote coordinator is concerned, I'm not sure what motivated the change from 1.2 to 2.1 and would be interested in understanding that change myself. I do know that improved performance was a big part of the 2.1 release, but I'm not sure whether the change in coordinators was part of that effort.
>
> Steve
>
> On Fri, Jan 29, 2016 at 10:13 AM, Jun Wu <wuxiaomi...@hotmail.com> wrote:
>
> Hi Steve,
>
> Thank you so much for your reply.
>
> Yes, you're right, I'm using version 2.1. So based on this, I think my information is outdated.
>
> However, this raises another interesting question: why was this part changed from version 1.2 to version 2.1? In version 1.2, node 10 in DC1 connects to node 10 in DC2, which then sends 3 copies to 3 nodes in DC2. I'd expect that to save time compared with version 2.1, where node 10 in DC1 sends the data to the 3 nodes in DC2 directly.
>
> Also, is there any information on how the replicas are chosen?
> For example, in this diagram:
> https://docs.datastax.com/en/cassandra/2.1/cassandra/dml/architectureClientRequestsMultiDCWrites_c.html
> why are nodes 1, 3, and 6 chosen as replicas, and nodes 4, 8, and 11 as the other 3 replicas?
>
> Also, is node 11 acting as the remote coordinator here? Or does the concept of a remote coordinator really exist? As the figure shows, we don't even need a remote coordinator.
>
> Thanks!
>
> Jun
>
> Date: Fri, 29 Jan 2016 09:55:58 -0800
> Subject: Re: Questions about the replicas selection and remote coordinator
> From: sroben...@highwire.org
> To: user@cassandra.apache.org
>
> Hi Jun,
>
> The 2 diagrams you are comparing come from versions of Cassandra that are significantly different (1.2 in the first case and 2.1 in the second), so it's not surprising that there are differences. Since you haven't qualified your question with the Cassandra version you are asking about, I would assume that the 2.1 example is more representative of what you are likely to see. In any case, it's best to stick to documentation for one consistent version, because Cassandra changes quite rapidly with many of its releases.
>
> As far as choosing the coordinator node, I don't think there's a way to force it, nor would it be a good idea to do so. To make a reasonable selection of coordinators, you would need a lot of internal knowledge about the load on the nodes in the cluster, and you'd also need to handle certain classes of failures and retries, so you would end up duplicating what is already being done for you internally.
>
> Steve
>
> On Fri, Jan 29, 2016 at 9:11 AM, Jun Wu <wuxiaomi...@hotmail.com> wrote:
>
> Hi there,
>
> I have some questions about replica selection.
>
> Let's say that we have 2 data centers, DC1 and DC2 (the figure can also be found here:
> https://docs.datastax.com/en/cassandra/1.2/cassandra/images/write_access_multidc_12.png ).
> There are 10 nodes in each data center. We set the replication factor to 3 in each data center, which means there will be 3 replicas in each data center.
>
> (1) My first question is how the 3 nodes to write data to are chosen. In the link above, the 3 replicas are nodes 1, 2, and 7. Is there any mechanism for selecting these 3?
>
> (2) Another question is about the remote coordinator. The previous figure shows that node 10 in DC1 writes the data to node 10 in DC2, and then node 10 in DC2 writes 3 copies to 3 nodes in DC2.
>
> But another figure from DataStax shows a different method; it can be found here:
> https://docs.datastax.com/en/cassandra/2.1/cassandra/dml/architectureClientRequestsMultiDCWrites_c.html
> It shows node 10 in DC1 sending 3 copies directly to 3 nodes in DC2, without using a remote coordinator.
>
> I'm wondering which case is true, because with multiple data centers, the time taken by these two methods can vary a lot.
>
> Also, is there any mechanism for selecting which node acts as the remote coordinator?
>
> Thanks!
>
> Jun
>
> --
> Steve Robenalt
> Software Architect
> sroben...@highwire.org
> (office/cell): 916-505-1785
>
> HighWire Press, Inc.
> 425 Broadway St, Redwood City, CA 94063
> www.highwire.org
>
> Technology for Scholarly Communication
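To make question (1) more concrete, here is a minimal sketch of a per-data-center clockwise ring walk, loosely modeled on what NetworkTopologyStrategy does. It assumes one token per node, no racks, no vnodes, and no down nodes; the `Node` type, the `replicas_for_token` function, the node labels, and the token values are all hypothetical, chosen only for illustration.

```python
from bisect import bisect_left
from typing import Dict, List, NamedTuple


class Node(NamedTuple):
    token: int  # hypothetical ring position; one token per node, no vnodes
    name: str   # hypothetical label such as "dc1-n3"
    dc: str     # data center the node lives in


def replicas_for_token(ring: List[Node], key_token: int,
                       rf_per_dc: Dict[str, int]) -> Dict[str, List[str]]:
    """Pick rf_per_dc[dc] replicas in each DC by walking the ring clockwise.

    Simplified sketch: racks, vnodes, and down nodes are ignored.
    """
    ordered = sorted(ring)                      # sort nodes by token
    tokens = [n.token for n in ordered]
    start = bisect_left(tokens, key_token)      # first node whose token >= key_token
    chosen: Dict[str, List[str]] = {dc: [] for dc in rf_per_dc}
    for i in range(len(ordered)):
        node = ordered[(start + i) % len(ordered)]  # wrap past the highest token
        picked = chosen.get(node.dc)
        # Take the node only if its DC is replicated and still needs replicas.
        if picked is not None and len(picked) < rf_per_dc[node.dc]:
            picked.append(node.name)
    return chosen


# Hypothetical layout: 10 nodes per DC with interleaved tokens.
ring = [Node(i * 100, f"dc1-n{i + 1}", "DC1") for i in range(10)] + \
       [Node(i * 100 + 50, f"dc2-n{i + 1}", "DC2") for i in range(10)]

print(replicas_for_token(ring, 120, {"DC1": 3, "DC2": 3}))
# → {'DC1': ['dc1-n3', 'dc1-n4', 'dc1-n5'], 'DC2': ['dc2-n2', 'dc2-n3', 'dc2-n4']}
```

With an evenly interleaved layout like this one, the replicas in each DC happen to be neighbours. Once vnodes, rack awareness, and node additions or failures come into play, as Steve points out, the same walk tends to produce scattered sets like nodes 1, 2, and 7 in the 1.2 diagram.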
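Jun's timing argument in question (2) boils down to how many messages cross the inter-DC link for each write. The sketch below simply counts them for the two schemes as drawn in the 1.2 and 2.1 diagrams. It makes no claim about which scheme a given Cassandra release actually implements, and the `cross_dc_messages` function and the DC2 node labels are hypothetical.

```python
from typing import Dict, List


def cross_dc_messages(replicas: Dict[str, List[str]], coordinator_dc: str,
                      forward_via_remote_coordinator: bool) -> int:
    """Count write messages that cross a data-center boundary.

    forward_via_remote_coordinator=True models the 1.2 diagram: one message per
    remote DC, which a node there relays to the local replicas.
    False models the 2.1 diagram: every remote replica is contacted directly.
    """
    remote = [dc for dc in replicas if dc != coordinator_dc]
    if forward_via_remote_coordinator:
        # One inter-DC hop per remote DC that holds at least one replica.
        return sum(1 for dc in remote if replicas[dc])
    # One inter-DC hop per remote replica.
    return sum(len(replicas[dc]) for dc in remote)


# Jun's scenario: RF 3 in each DC, coordinator (node 10) in DC1.
replicas = {"DC1": ["1", "2", "7"], "DC2": ["dc2-x", "dc2-y", "dc2-z"]}
print(cross_dc_messages(replicas, "DC1", True))   # 1
print(cross_dc_messages(replicas, "DC1", False))  # 3
```

Only the hops that cross the DC link are counted; in the forwarding scheme the relaying node still sends its copies within DC2, but those stay on the local network.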