On Wed, Apr 6, 2011 at 3:55 AM, Sasha Dolgy <sdo...@gmail.com> wrote:
> I had been asked this question from a strategy point of view, and
> referenced how linkedin.com appears to handle this.
>
> <assumption>
> Specific region data is stored on a ring in that region. While based
> in the Middle East, my linkedin.com profile was kept on the Middle
> East part of linkedin.com ... when I moved back to Europe and updated
> my city, my profile shifted from the Middle East to Europe ...
> </assumption>
>
> Would it not be easier to manage multiple rings (one in each required
> geographic region) to suit the location-aware use case? This way you
> can grow out each region as necessary and invest less in the regions
> that aren't as busy ...
>
> It would mean your application needs to be aware of the different
> regions and where data exists ... or make some initial assumptions as
> to where to find data ...
>
> - 1 ring for APAC
> - 1 ring for Europe
> - 1 ring for the Americas
> - 1 global ring (with nodes present in each region)
>
> The global ring maintains reference data on which ring a guid exists ...
>
> I've been playing with this concept on AWS ... the amount of data I
> have isn't significant, so I may not have run into problems that will
> occur when I get to large amounts of data ...
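[A minimal sketch of the two-tier lookup described above. The ring objects, region labels, and guid are hypothetical stand-ins simulated with plain dicts; a real client would issue these as reads and writes against separate Cassandra clusters.]

```python
# Regional rings hold the actual profile data (simulated as dicts).
REGIONAL_RINGS = {
    "apac":     {},
    "europe":   {"guid-123": {"city": "Berlin"}},
    "americas": {},
}

# The global ring stores only reference data: guid -> region.
GLOBAL_RING = {"guid-123": "europe"}


def lookup_profile(guid):
    """Resolve the region from the global ring, then read from that ring."""
    region = GLOBAL_RING.get(guid)
    if region is None:
        return None
    return REGIONAL_RINGS[region].get(guid)


def move_profile(guid, new_region):
    """Relocate a user: copy the row, update the global reference, delete."""
    old_region = GLOBAL_RING[guid]
    row = REGIONAL_RINGS[old_region].pop(guid)
    REGIONAL_RINGS[new_region][guid] = row
    GLOBAL_RING[guid] = new_region
```

[Since a user's region changes rarely, clients could cache the global-ring answer locally and only re-resolve on a miss, which mitigates the cross-region hop on most reads.]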
This is interesting. But how do you design the global ring so that it does
not become a bottleneck? For example, if a client needs to access data in
the US ring but first has to talk to a Europe node to get the reference
data, that will not be efficient.

Another potential problem is that the data is not synchronized among the
rings: if one data center goes down, the data stored there is lost. One way
around this may be to use NetworkTopologyStrategy. For example, with RF=3
for the Europe ring, we can place 2 replicas in Europe and 1 replica in the
Americas.

Thanks!
Yudong

> -sd
>
> On Wed, Apr 6, 2011 at 9:26 AM, Jonathan Colby <jonathan.co...@gmail.com>
> wrote:
>> Good to see a discussion on this.
>>
>> This also has practical use for business continuity, where you can
>> control that the clients in a given data center first write replicas to
>> their own data center, then to the other data center for backup. If I
>> understand correctly, a write takes the token into account first, then
>> the replication strategy decides where the replicas go. I would like to
>> see the first writes be based on "location" instead of token - whether
>> that is accomplished by manipulating the key or some other mechanism.
>>
>> That way, if you do suffer the loss of a data center, the clients are
>> guaranteed to meet quorum on the nodes in their own data center (given a
>> mirrored architecture across 2 data centers).
>>
>> We have 2 data centers. If one goes down, we have the problem that
>> quorum cannot be satisfied for half of the reads.
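[A rough sketch of how NetworkTopologyStrategy-style per-DC replica counts interact with QUORUM when a data center is lost. The placements below are illustrative assumptions, not output from Cassandra itself; they just work through the replica arithmetic behind the two scenarios in this thread.]

```python
def quorum(rf):
    """QUORUM in Cassandra is a majority of the replication factor."""
    return rf // 2 + 1


def quorum_survives_dc_loss(placement, lost_dc):
    """True if enough replicas remain for QUORUM after losing one DC.

    placement maps data-center name -> replica count for a keyspace.
    """
    rf = sum(placement.values())
    remaining = rf - placement.get(lost_dc, 0)
    return remaining >= quorum(rf)


# Yudong's suggestion: RF=3 for the Europe keyspace, 2 replicas in Europe
# and 1 in the Americas. QUORUM needs 2, so losing the Americas DC leaves
# 2 replicas (quorum still met), but losing Europe leaves only 1 (quorum
# fails for those rows).
europe_ks = {"europe": 2, "americas": 1}

# Jonathan's mirrored 2-DC setup: RF=4 (2 per DC). QUORUM needs 3, so
# losing either DC leaves only 2 replicas and quorum operations fail.
mirrored_ks = {"dc1": 2, "dc2": 2}
```

[This is why an asymmetric placement like 2+1 keeps quorum alive through the loss of the minority DC, while a symmetric 2+2 mirror cannot survive losing either one at plain QUORUM.]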