Re: DataCenters each with their own local data source

Jeremiah Jordan Tue, 22 Nov 2011 18:49:40 -0800

Cassandra's multiple data center support is meant for efficiently replicating 
all data across multiple data centers.

You could use the ByteOrderedPartitioner, prefix each row key with a data 
center identifier, and assign those key ranges to nodes in specific data 
centers.  The edge nodes would get tricky, though, since they would want to 
place replicas in other data centers.  You could probably avoid that with 
sentinel values and with some nodes that only hold replicas and aren't the 
primary node for any data.
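
To make the key prefixing idea concrete, here is a minimal sketch in plain 
Python (no Cassandra driver involved).  The data center names, the two-byte 
prefixes, and the helper names are all made up for illustration.  It only shows 
that byte-ordered keys with a data center prefix sort into one contiguous range 
per data center, which is what would let you assign each range to nodes in 
that data center.

```python
# Hypothetical data center prefixes; real values would come from your topology.
DC_PREFIXES = {"dc_east": b"01", "dc_west": b"02", "dc_eu": b"03"}


def make_row_key(dc: str, series_id: str, bucket: int) -> bytes:
    """Build a row key whose leading bytes identify the owning data center."""
    return (
        DC_PREFIXES[dc]
        + b":"
        + series_id.encode("utf-8")
        + b":"
        + f"{bucket:010d}".encode("ascii")  # zero-padded so buckets sort numerically
    )


def owning_dc(row_key: bytes) -> str:
    """Recover the data center name from the key prefix."""
    prefix = row_key.split(b":", 1)[0]
    for dc, dc_prefix in DC_PREFIXES.items():
        if dc_prefix == prefix:
            return dc
    raise KeyError(f"unknown data center prefix: {prefix!r}")


if __name__ == "__main__":
    keys = [
        make_row_key("dc_west", "sensor-1", 5),
        make_row_key("dc_east", "sensor-2", 9),
        make_row_key("dc_west", "sensor-0", 1),
        make_row_key("dc_east", "sensor-1", 2),
    ]
    # Byte-wise sorting mimics the ordering a byte-ordered partitioner relies on.
    for key in sorted(keys):
        print(owning_dc(key), key)
```

This does nothing about the replica placement problem at the range boundaries; 
it only illustrates the key layout.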

It is doable, though this would probably be more trouble than it is worth.  I 
would probably just make each DC its own cluster and have client logic which 
knows which DC to query.
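
A rough sketch of what that client logic could look like, assuming one 
independent cluster per DC.  The contact points, the DataCenterRouter class, 
and the fetch callable are all hypothetical; in practice fetch would wrap 
whatever Cassandra client you use, and nothing here is a real driver API.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# Hypothetical contact points: one independent cluster per data center.
CLUSTERS: Dict[str, List[str]] = {
    "dc_east": ["10.0.1.1", "10.0.1.2"],
    "dc_west": ["10.0.2.1", "10.0.2.2"],
}

# A fetch function takes (contact_points, series_id) and returns a list of rows.
FetchFn = Callable[[List[str], str], list]


class DataCenterRouter:
    """Routes reads to the cluster of the data center that owns the data."""

    def __init__(self, clusters: Dict[str, List[str]], fetch: FetchFn) -> None:
        self._clusters = dict(clusters)
        self._fetch = fetch

    def read(self, dc: str, series_id: str) -> list:
        """Read one series from a single, known data center."""
        return self._fetch(self._clusters[dc], series_id)

    def read_all(self, series_id: str) -> Dict[str, list]:
        """Fan the same read out to every data center in parallel."""
        workers = max(1, len(self._clusters))
        with ThreadPoolExecutor(max_workers=workers) as pool:
            futures = {
                dc: pool.submit(self._fetch, hosts, series_id)
                for dc, hosts in self._clusters.items()
            }
            return {dc: future.result() for dc, future in futures.items()}


def fake_fetch(hosts: List[str], series_id: str) -> list:
    """Stand-in for a real query; returns which host would have been asked."""
    return [(hosts[0], series_id)]


if __name__ == "__main__":
    router = DataCenterRouter(CLUSTERS, fake_fetch)
    print(router.read("dc_east", "cpu_load"))
    print(router.read_all("cpu_load"))
```

Writes would go through the same lookup, with each DC writing only to its own 
cluster.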

-Jeremiah

On Nov 22, 2011, at 6:57 PM, Mathieu Lalonde wrote:

> 
> 
> Hi,
> 
> I am wondering if Cassandra's features and datacenter awareness can help me 
> with my scalability problems.
> 
> Suppose that I have 10-20 data centers, each with its own local (massive) 
> source of time series data.  I would like:
> - to avoid replication across data centers (this seems doable based on: 
> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Different-KeySpaces-for-different-nodes-in-the-same-ring-td5096393.html#a5096568
>  )
> - writes for local data to be done on the local data center (not sure about 
> that one)
> - reads from a master data center to any remote data centers (not sure about 
> that one either)
> 
> It sounds like I am trying to use Cassandra in a very different way than it 
> was intended to be used.
> Should I simply have a middle-tier that takes care of distributing reads to 
> multiple data centers and treat each data center as its own autonomous 
> cluster?
> 
> Thanks!
> Matt
> 
>
