Good to see a discussion on this. This also has practical use for business continuity: you can arrange for the clients in a given data center to write replicas to their own data center first, then to the other data center for backup. If I understand correctly, a write takes the token into account first, then the replication strategy decides where the replicas go. I would like to see the first writes be based on "location" instead of token, whether that is accomplished by manipulating the key or by some other mechanism.
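
To make the idea concrete, here is a rough sketch in Python of what "location first" placement could look like. It is not Cassandra code: `Node`, `location_first_endpoints`, and the `preferred_dc` hint are names I made up for illustration, and the ring is just a sorted list of tokens. The token still picks the starting point on the ring, but the walk takes replicas from the client's own data center before taking any from the others.

```python
import bisect
from typing import List, NamedTuple


class Node(NamedTuple):
    token: int   # position on the ring
    name: str
    dc: str      # data center the node lives in


def location_first_endpoints(ring: List[Node], key_token: int, preferred_dc: str,
                             local_replicas: int, remote_replicas: int) -> List[Node]:
    """Hypothetical placement: walk the ring clockwise from the key's token,
    taking replicas in the preferred data center first, then in the others."""
    ordered = sorted(ring)  # NamedTuple sorts by its first field, the token
    tokens = [n.token for n in ordered]
    # First node whose token is >= the key's token, wrapping past the end.
    start = bisect.bisect_left(tokens, key_token) % len(ordered)
    walk = [ordered[(start + i) % len(ordered)] for i in range(len(ordered))]

    local = [n for n in walk if n.dc == preferred_dc][:local_replicas]
    remote = [n for n in walk if n.dc != preferred_dc][:remote_replicas]
    # The order below is a convenience only; treat the result as a set of endpoints.
    return local + remote
```

With two local replicas and one remote replica (replication factor 3, so a quorum of 2), the local data center alone holds enough replicas to answer a quorum request. Those replica counts are an assumption for the example, not a description of our actual setup.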

With location-based placement, if you do suffer the loss of a data center, the clients are guaranteed to meet quorum on the nodes in their own data center (given a mirrored architecture across 2 data centers). We have 2 data centers. If one goes down, we have the problem that quorum cannot be satisfied for half of the reads.

On Apr 6, 2011, at 6:00 AM, Jonathan Ellis wrote:

> On Tue, Apr 5, 2011 at 10:45 PM, Yudong Gao <st...@umich.edu> wrote:
>>> A better solution would be to just push the DecoratedKey into the
>>> ReplicationStrategy so it can make its decision before information is
>>> thrown away.
>>
>> I agree. So in this case, I guess the hashed based token ring is still
>> preserved to avoid hot spot, but we further use the DecoratedKey to
>> guide the replication strategy. For example, replica 2 is placed in
>> the first node along the ring the belongs the desirable data center
>> (based on the location hint embedded DecoratedKey). But we may not be
>> able to control the primary replica. Do you think this will be a
>> reasonable design?
>
> calculateNaturalEndpoints has complete freedom to generate all
> replicas any way it likes. Thinking of an endpoint as "primary"
> because it was generated first by one algorithm is dangerous.
>
> As one of the docstrings explains, replica destinations ("endpoints")
> should be considered a Set even though we use a List for efficiency.
> None of them are special at the ReplicationStrategy level.
>
>> Just curious, are they happy with the current
>> solution with keyspace, and is there some requests for per-row
>> placement control?
>
> Enough people want to try it that we have the ticket open. :)
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com