Re: Location-aware replication based on objects' access pattern

Yudong Gao Tue, 05 Apr 2011 18:38:09 -0700

Thanks for the reply, Jonathan!

This per-row control is exactly what I need, and I would be happy to help
tackle it in the long term. Is there any further information or a plan
for this issue?


One thing I am worried about is how to maintain the location
information for each row. The current partitioner maps a key to its MD5
hash, and it is almost impossible to control the hashed token by
manipulating the value of the key. Also, maintaining a separate
key-to-location mapping would not scale. My initial thought is to use
the key string as the token directly, so that the location information
can be bound into the key. This minimizes the changes to the other
components.
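A minimal sketch of the "key as token" idea, assuming an order-preserving setup in which the key string itself is compared against node tokens (roughly how Cassandra's order-preserving partitioners order rows). The two-node ring, its tokens, the `location:` key prefix, and the helper names `make_key` and `primary_replica` are all hypothetical.

```python
import bisect

# Hypothetical ring: each entry is (token, node, datacenter). In this sketch the
# row key itself is used as the token (order-preserving), so the location code
# embedded as a key prefix decides which token range the row falls into.
# "~" sorts after letters and digits in ASCII, so "west:~" bounds "west:<id>".
RING = sorted([
    ("east:~", "node-e1", "DC-EAST"),
    ("west:~", "node-w1", "DC-WEST"),
])
TOKENS = [token for token, _, _ in RING]


def make_key(location: str, object_id: str) -> str:
    # Bind the location into the key, e.g. "west:listing-42".
    return f"{location}:{object_id}"


def primary_replica(key: str):
    # A node owns the range (previous token, its token], so the primary replica
    # is the first node whose token is >= the key, wrapping around the ring.
    i = bisect.bisect_left(TOKENS, key)
    if i == len(RING):
        i = 0
    _, node, dc = RING[i]
    return node, dc


print(primary_replica(make_key("west", "listing-42")))  # ('node-w1', 'DC-WEST')
print(primary_replica(make_key("east", "listing-7")))   # ('node-e1', 'DC-EAST')
```

No key-to-location table is needed here: the location travels inside the key. The cost is that rows are no longer spread by a hash, so each datacenter's token ranges must be sized by hand to balance load.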

Another problem for me is that we have a deadline coming up, so we
need to get something up and running quickly. It does not need to be
perfect or general; some quick tricks will be sufficient. Do you know
how existing applications achieve this without per-row support?

Thanks!

Yudong

On Tue, Apr 5, 2011 at 6:39 PM, Jonathan Ellis <jbel...@gmail.com> wrote:
>
> You'd really want https://issues.apache.org/jira/browse/CASSANDRA-2369
> to control per-row. Let me know if you'd like to help tackle that.
>
> On Tue, Apr 5, 2011 at 5:05 PM, Yudong Gao <st...@umich.edu> wrote:
> >
> > Hi,
> >
> > I am thinking about using Cassandra for our research project, and we
> > are interested in one particular feature.
> >
> > Our setup has multiple datacenters in different geographic
> > locations. Data is accessed with predictable patterns. Think of
> > something like Craigslist: data objects corresponding to CA will
> > mostly be accessed by users on the West Coast. In this case, if all
> > the replicas are stored on the East Coast, access would not be
> > efficient. Other applications, such as Facebook, should have
> > similar concerns.
> >
> > I am aware of placement strategies such as
> > RackAwareStrategy/NetworkTopologyStrategy, but they place objects
> > based on their hashed token, not on their access pattern. I am
> > thinking about one possible trick, which is to manipulate the key of
> > the object based on its access pattern, so that the key maps to a
> > token that will have at least one replica (ideally the primary
> > replica) stored in the desired datacenter, and the other replicas
> > stored in other datacenters for reliability.
> >
> > I found this post discussing a similar problem,
> >
> > http://www.mail-archive.com/user@cassandra.apache.org/msg00695.html
> >
> > but Ben suggested just writing a new replication strategy. IMO,
> > location-aware replication is likely a common problem for Cassandra,
> > especially since it is widely used in large-scale commercial
> > applications such as Facebook and Twitter. I am interested in how
> > they handle this problem.
> >
> > Is there any existing solution that I can refer to and start with?
> >
> > Thanks!
> >
> > Yudong
> >
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
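The key-manipulation trick from the original question (choose a key whose hashed token lands in a range whose primary replica sits in the desired datacenter) can be sketched as a deterministic suffix search. This assumes an MD5-based token similar in spirit to an MD5 partitioner; the four-node ring, its tokens, the `#n` suffix scheme, and the helper names are hypothetical.

```python
import bisect
import hashlib

# Hypothetical four-node ring spanning two datacenters. Tokens are integers in
# [0, 2**127], alternating DCs around the ring.
MAX_TOKEN = 2 ** 127
RING = [
    (MAX_TOKEN // 4 * 1, "node-e1", "DC-EAST"),
    (MAX_TOKEN // 4 * 2, "node-w1", "DC-WEST"),
    (MAX_TOKEN // 4 * 3, "node-e2", "DC-EAST"),
    (MAX_TOKEN // 4 * 4, "node-w2", "DC-WEST"),
]
TOKENS = [token for token, _, _ in RING]


def md5_token(key: str) -> int:
    # MD5 digest read as a signed 128-bit big-endian integer, then made
    # non-negative; meant to approximate an MD5-based partitioner.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return abs(int.from_bytes(digest, "big", signed=True))


def primary_dc(token: int) -> str:
    # The first node whose token is >= the key's token owns it (wrapping).
    i = bisect.bisect_left(TOKENS, token)
    return RING[i % len(RING)][2]


def place_key(base_key: str, target_dc: str, max_tries: int = 10_000) -> str:
    # Try "base#0", "base#1", ... until the hashed token falls in a range whose
    # primary replica is in target_dc. The search is deterministic, so a reader
    # who knows base_key and target_dc can recompute the stored key.
    for n in range(max_tries):
        candidate = f"{base_key}#{n}"
        if primary_dc(md5_token(candidate)) == target_dc:
            return candidate
    raise ValueError(f"no suitable suffix for {base_key!r} in {max_tries} tries")


print(place_key("listing-42", "DC-WEST"))
```

On this ring each attempt lands in the target datacenter about half the time, so the loop usually stops within a few tries, and no key-to-location table is stored. Moving an object to another datacenter changes its key, and where the remaining replicas go is still decided by the configured placement strategy, which this sketch does not model.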
