Thanks for the reply, Jonathan! This per-row control is exactly what I need. I will be happy to help tackle it in the long term. Is there any further information or a plan for this issue?
One thing I am worried about is how to maintain the location information for each row. The current partitioner maps a key to its MD5 hash, so it is practically impossible to control the hashed token by manipulating the value of the key. Maintaining an explicit key-to-location mapping would not scale either. My initial thought is to use the key string as the token directly, so that the location information can be bound into the key. This minimizes the changes to the other components.

Another problem is that we have a deadline coming up, so we need to get something up and running soon. It does not need to be perfect or general; some quick tricks will be sufficient. Do you know how existing applications achieve this without the per-row support?

Thanks!

Yudong

On Tue, Apr 5, 2011 at 6:39 PM, Jonathan Ellis <jbel...@gmail.com> wrote:
>
> You'd really want https://issues.apache.org/jira/browse/CASSANDRA-2369
> to control per-row. Let me know if you'd like to help tackle that.
>
> On Tue, Apr 5, 2011 at 5:05 PM, Yudong Gao <st...@umich.edu> wrote:
> >
> > Hi,
> >
> > I am thinking about using Cassandra for our research project, and we
> > are thinking about one interesting feature.
> >
> > Our setup has multiple datacenters located in different geographic
> > locations. Data is accessed with predictable patterns. Think of
> > something like Craigslist: data objects corresponding to CA will
> > mostly be accessed by users from the west coast. In this case, if all
> > the replicas are stored on the east coast, the access would not be
> > efficient. Other applications, such as Facebook, should also have
> > similar concerns.
> >
> > I am aware of the placement strategies such as
> > RackAwareStrategy/NetworkTopologyStrategy. But they place objects
> > based on their hashed token, not their access pattern.
> > I am thinking about one possible trick, which is to manipulate the key of
> > the object based on its access pattern, so that the key is mapped
> > to a token that has at least one replica (ideally the primary
> > replica) stored in the desired data center, with the other replicas
> > stored in other data centers for reliability.
> >
> > I found this post discussing a similar problem,
> >
> > http://www.mail-archive.com/user@cassandra.apache.org/msg00695.html
> >
> > but Ben suggested just writing a new replication strategy. IMO,
> > location-aware replication should be a common problem for Cassandra,
> > especially since it has been widely used in many large-scale
> > commercial applications such as Facebook and Twitter. I am interested
> > in how they handle this problem.
> >
> > Is there any existing solution that I can refer to and get started with?
> >
> > Thanks!
> >
> > Yudong
> >
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
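
P.S. To make the key-as-token idea from the top of this message concrete, here is a rough sketch in plain Python. It is only a toy simulation, not Cassandra code: the ring layout, the node names, the "region:row_id" key format, and the helper names make_key and primary_node are all made up for illustration. It assumes tokens are compared as plain strings (as an order-preserving partitioner would do) and that a node with token T owns the keys in (previous token, T].

```python
import bisect

# Hypothetical two-node ring. Tokens are plain strings compared
# lexicographically; '~' sorts after letters, digits and ':' in ASCII,
# so "east:~" is an upper bound for every "east:<alphanumeric id>" key.
RING = [
    ("east:~", "east-node"),  # owns keys up to and including "east:~"
    ("west:~", "west-node"),  # owns keys in ("east:~", "west:~"]
]
TOKENS = [token for token, _ in RING]


def make_key(region, row_id):
    """Bind the desired location into the row key (assumed key format)."""
    return f"{region}:{row_id}"


def primary_node(key):
    """Return the node whose token is the first one >= key, wrapping around."""
    i = bisect.bisect_left(TOKENS, key)
    if i == len(TOKENS):
        i = 0  # keys past the last token wrap to the first node
    return RING[i][1]


if __name__ == "__main__":
    for region in ("east", "west"):
        key = make_key(region, "listing42")
        print(key, "->", primary_node(key))
```

The obvious downside is that keys are no longer spread uniformly, so load balance across nodes depends entirely on how the prefixes and tokens are chosen.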