Hi,

I am considering Cassandra for our research project, and we are
interested in one particular feature.

Our setup has multiple datacenters in different geographic
locations, and data is accessed in predictable patterns. Think of
something like Craigslist: data objects corresponding to CA will
mostly be accessed by users on the West Coast. In this case, if all
the replicas were stored on the East Coast, access would be
inefficient. Other applications, such as Facebook, likely have
similar concerns.

I am aware of placement strategies such as
RackAwareStrategy/NetworkTopologyStrategy, but they place objects
based on their hashed token, not on their access pattern. One possible
trick I am considering is to manipulate an object's key based on its
access pattern, so that the key maps to a token that has at least one
replica (ideally the primary replica) stored in the desired
datacenter, with the other replicas stored in other datacenters for
reliability.
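To make the idea concrete, here is a rough Python sketch of the
key-manipulation trick. It is a toy model, not Cassandra code: the
four-node ring, the datacenter names DC_EAST/DC_WEST and the "#n"
suffix scheme are all made up, and token_for() only approximates
RandomPartitioner-style MD5 hashing (an assumption on my part). It
appends a numeric suffix to a key until the key's token falls into a
range whose primary replica lives in the target datacenter.

```python
import bisect
import hashlib

# Hypothetical ring: (token, node, datacenter), in a 0 .. 2**127 token space.
RING = sorted([
    (0,          "east-1", "DC_EAST"),
    (2**125,     "west-1", "DC_WEST"),
    (2**126,     "east-2", "DC_EAST"),
    (3 * 2**125, "west-2", "DC_WEST"),
])
TOKENS = [t for t, _, _ in RING]


def token_for(key: str) -> int:
    """MD5-based token, modeled loosely on RandomPartitioner (assumption)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return abs(int.from_bytes(digest, "big", signed=True))


def primary_dc(key: str) -> str:
    """Datacenter of the first node whose token is >= the key's token."""
    i = bisect.bisect_left(TOKENS, token_for(key))
    if i == len(RING):  # past the last token: wrap around to the first node
        i = 0
    return RING[i][2]


def key_for_dc(base_key: str, target_dc: str, max_tries: int = 10_000) -> str:
    """Try suffixes #0, #1, ... until the primary replica lands in target_dc."""
    for n in range(max_tries):
        candidate = f"{base_key}#{n}"
        if primary_dc(candidate) == target_dc:
            return candidate
    raise ValueError(f"no suffix found for {base_key!r} in {target_dc!r}")


if __name__ == "__main__":
    key = key_for_dc("listing:12345", "DC_WEST")
    print(key, primary_dc(key))
```

One caveat with this design: the chosen suffix is only valid for the
current token assignment, so moving or adding nodes could silently
send a key's primary replica to a different datacenter.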

I found this post discussing a similar problem:

http://www.mail-archive.com/user@cassandra.apache.org/msg00695.html

but Ben suggested just writing a new replication strategy. IMO,
location-aware replication should be a common problem for Cassandra,
especially since it is widely used in large-scale commercial
applications such as Facebook and Twitter. I am interested in how
they handle this problem.

Is there any existing solution that I could refer to and get started with?

Thanks!

Yudong
