Hi, I am considering Cassandra for our research project, and we are interested in one particular feature.
Our setup has multiple datacenters in different geographic locations, and data is accessed in predictable patterns. Think of something like Craigslist: data objects corresponding to CA will mostly be accessed by users on the West Coast. In this case, if all the replicas are stored on the East Coast, access will not be efficient. Other applications, such as Facebook, presumably have a similar concern.

I am aware of placement strategies such as RackAwareStrategy/NetworkTopologyStrategy, but they place objects based on their hashed token, not on their access pattern. I am thinking about one possible trick: manipulate the key of an object based on its access pattern, so that the key maps to a token with at least one replica (ideally the primary replica) stored in the desired datacenter, and the other replicas stored in other datacenters for reliability.

I found this post discussing a similar problem, http://www.mail-archive.com/user@cassandra.apache.org/msg00695.html but Ben suggested simply writing a new replication strategy. IMO, location-aware replication should be a common problem for Cassandra, especially since it is widely used in large-scale commercial applications such as Facebook and Twitter. I am interested in how they handle this problem. Is there any existing solution that I can refer to and get started with?

Thanks!
Yudong
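
P.S. To make the key-manipulation idea concrete, here is a minimal sketch in Python. It is only a toy model, not Cassandra code: the six-node ring, the datacenter names, the `#<salt>` suffix convention, and the helper names (`token_for`, `primary_datacenter`, `key_for_datacenter`) are all assumptions for illustration, and the MD5-based token is only loosely modelled on a random partitioner.

```python
import bisect
import hashlib

# Hypothetical token space and ring layout (assumptions, not real cluster data).
RING_SIZE = 2 ** 127
DATACENTERS = ["east", "west", "east", "west", "east", "west"]

# Sorted list of (token, datacenter) pairs, one per node, evenly spaced.
ring = sorted((i * RING_SIZE // len(DATACENTERS), dc)
              for i, dc in enumerate(DATACENTERS))


def token_for(key: str) -> int:
    """Map a key to a token by hashing it with MD5 (toy partitioner)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % RING_SIZE


def primary_datacenter(key: str, ring) -> str:
    """Datacenter of the first node whose token is >= the key's token."""
    tokens = [token for token, _ in ring]
    # Wrap around to the first node if the key's token is past the last node.
    index = bisect.bisect_left(tokens, token_for(key)) % len(ring)
    return ring[index][1]


def key_for_datacenter(base_key: str, wanted_dc: str, ring,
                       max_tries: int = 1000) -> str:
    """Append a salt to base_key until its primary node is in wanted_dc."""
    for salt in range(max_tries):
        candidate = f"{base_key}#{salt}"
        if primary_datacenter(candidate, ring) == wanted_dc:
            return candidate
    raise ValueError(f"no salt below {max_tries} maps to {wanted_dc!r}")


# Example: a California listing that should have its primary replica in "west".
key = key_for_datacenter("listing:ca:42", "west", ring)
print(key, "->", primary_datacenter(key, ring))
```

The obvious cost is that readers must know the salted key, so the application would have to store the mapping or repeat the same search, and any change to the ring could move a key's primary replica to another datacenter.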