Cassandra's multiple data center support is meant for efficiently replicating all data across multiple data centers.
You could use the ByteOrderedPartitioner to prefix data with a key and assign those keys to nodes in specific data centers. The edge nodes would get tricky, though, since they would want to place replicas in other data centers. You could probably work around that with sentinel values plus some nodes that only hold replicas and aren't the primary node for any data. It is doable, but it would probably be more trouble than it is worth. I would probably just make each DC its own cluster and have client logic that knows which DC to query.

-Jeremiah

On Nov 22, 2011, at 6:57 PM, Mathieu Lalonde wrote:

> Hi,
>
> I am wondering if Cassandra's features and datacenter awareness can help me
> with my scalability problems.
>
> Suppose that I have 10-20 data centers, each with their own local (massive)
> source of time series data. I would like:
> - to avoid replication across data centers (this seems doable based on:
> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Different-KeySpaces-for-different-nodes-in-the-same-ring-td5096393.html#a5096568
> )
> - writes for local data to be done on the local data center (not sure about
> that one)
> - reads from a master data center to any remote data centers (not sure about
> that one either)
>
> It sounds like I am trying to use Cassandra in a very different way than it
> was intended to be used.
> Should I simply have a middle-tier that takes care of distributing reads to
> multiple data centers and treat each data center as its own autonomous
> cluster?
>
> Thanks!
> Matt
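
For what it's worth, the "each DC is its own cluster, with client logic that knows which DC to query" idea could look roughly like the sketch below. This is only an illustration under stated assumptions: `DataCenterRouter`, the `connect` callable, and the DC names and addresses are all hypothetical, and no real Cassandra driver API is used. In practice `connect` would wrap whatever client library you use to open a session against one cluster.

```python
from typing import Callable, Dict, Iterable, List, Optional


class DataCenterRouter:
    """Hypothetical middle tier: one independent cluster per data center."""

    def __init__(self, clusters: Dict[str, List[str]], local_dc: str,
                 connect: Callable[[List[str]], object]):
        # clusters maps a DC name to that cluster's contact points (assumed layout).
        if local_dc not in clusters:
            raise KeyError(f"unknown local data center: {local_dc}")
        self._clusters = clusters
        self._local_dc = local_dc
        self._connect = connect  # assumed factory: contact points -> session
        self._sessions: Dict[str, object] = {}

    def session_for(self, dc: str) -> object:
        """Return a session for the given DC, connecting lazily on first use."""
        if dc not in self._clusters:
            raise KeyError(f"unknown data center: {dc}")
        if dc not in self._sessions:
            self._sessions[dc] = self._connect(self._clusters[dc])
        return self._sessions[dc]

    def local_session(self) -> object:
        """Writes for local data go only to the local DC's cluster."""
        return self.session_for(self._local_dc)

    def fan_out(self, query: Callable[[object], object],
                dcs: Optional[Iterable[str]] = None) -> Dict[str, object]:
        """Run a read against each requested DC (default: all) and collect results."""
        targets = list(dcs) if dcs is not None else list(self._clusters)
        return {dc: query(self.session_for(dc)) for dc in targets}
```

A "master" site would call `fan_out` with a function that runs the read on a single session and then merge the per-DC results itself, while every site writes through `local_session()`, so nothing is replicated across data centers.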