Hi,

We currently run our Cassandra deployment as multiple independent
clusters.  Each cluster is fully self-contained in terms of redundancy and
independent of the others.  We have a "sharding" layer higher in our stack
that dispatches each request to the right application stack, and that stack
connects to its associated Cassandra cluster.  All the Cassandra clusters
are identical in terms of hosted keyspaces, column families, replication
factor, and so on.

At this point I am investigating ways to build a central Cassandra cluster
that would contain all the data from all the other Cassandra clusters, and
I am wondering how best to do it.  The goal is to have a global view of our
data and to be able to do some massive crunching on it.

We could certainly build an ETL-type job that figures out which data was
updated, extracts it, and loads it into the central Cassandra cluster.
On this mailing list I also found this GitHub project, which does something
similar by reading the commit logs:
https://github.com/carloscm/cassandra-commitlog-extract
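
To make the ETL idea concrete, a minimal sketch of one incremental pass
might look like the following.  It is plain Python over in-memory rows and
does not use any Cassandra driver.  The names Row, extract_updated,
load_central and sync are made up for illustration, and it assumes each row
carries a write timestamp that we can filter on, which may not be cheap to
do against a real cluster.

```python
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Row(NamedTuple):
    """Hypothetical stand-in for one row read from a source cluster."""
    key: str
    value: str
    write_ts: int  # assumed write timestamp, e.g. microseconds since epoch


def extract_updated(rows: Iterable[Row], since_ts: int) -> List[Row]:
    """Return the rows written strictly after the last sync timestamp."""
    return [r for r in rows if r.write_ts > since_ts]


def load_central(central: Dict[Tuple[str, str], Row],
                 cluster: str,
                 rows: Iterable[Row]) -> None:
    """Upsert rows into the central store, keyed by (source cluster, key).

    A row replaces an existing one only if its timestamp is newer
    (last write wins), so replaying the same batch is harmless.
    """
    for r in rows:
        k = (cluster, r.key)
        current = central.get(k)
        if current is None or r.write_ts > current.write_ts:
            central[k] = r


def sync(central: Dict[Tuple[str, str], Row],
         cluster: str,
         rows: Iterable[Row],
         since_ts: int) -> int:
    """Run one extract/load pass and return the new high-water mark."""
    batch = extract_updated(rows, since_ts)
    load_central(central, cluster, batch)
    return max((r.write_ts for r in batch), default=since_ts)
```

Prefixing the key with the source cluster name is just one way to keep
shards apart in the central store.  Deletes are not handled at all in this
sketch.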

But are there other options, perhaps built around a custom replication
strategy?  Any other general suggestions?

Thanks,

FR

-- 

_____________________________________________

*Francois Richard *
