Re: Running Cassandra + Spark on AWS - architecture questions

2015-02-23 Thread Clint Kelly
These are both good suggestions, thanks! I thought I remembered reading that different virtual datacenters should always have the same number of nodes, but I think I was mistaken about that. In the past we had major issues running huge analytics jobs on data stored in HBase (it would bring down

Re: Running Cassandra + Spark on AWS - architecture questions

2015-02-22 Thread Eric Stevens
I'm not sure if this is a good use case for you, but you might also consider setting up several keyspaces, one for any data you want available for analytics (such as business object tables), and one for data you don't want to do analytics on (such as custom secondary indices). Maybe a third one fo

Re: Running Cassandra + Spark on AWS - architecture questions

2015-02-20 Thread DuyHai Doan
"Cassandra would take care of keeping the data synced between the two sets of five nodes. Is that correct?" Correct. "But doing so means that we need 2x as many nodes as we need for the real-time cluster alone." Not necessarily. With multi-DC you can configure the replication factor value per DC,
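
To make the per-DC replication point concrete, here is a minimal sketch (not taken from the thread) that builds a CQL `CREATE KEYSPACE` statement using `NetworkTopologyStrategy`, which accepts a separate replication factor for each data center. The keyspace name `app_data`, the data center names `realtime` and `analytics`, and the replication factors 3 and 1 are hypothetical placeholders; real data center names must match what the cluster's snitch reports.

```python
def create_keyspace_cql(keyspace, replicas_per_dc):
    """Build a CREATE KEYSPACE statement with one replication factor per DC.

    `replicas_per_dc` maps a data center name to its replication factor.
    All names passed in are the caller's; nothing here is validated
    against a live cluster.
    """
    if not replicas_per_dc:
        raise ValueError("at least one data center is required")

    # NetworkTopologyStrategy takes the strategy class plus one
    # 'dc_name': replication_factor entry per data center.
    parts = ["'class': 'NetworkTopologyStrategy'"]
    for dc, rf in sorted(replicas_per_dc.items()):
        parts.append("'{}': {}".format(dc, int(rf)))

    return "CREATE KEYSPACE IF NOT EXISTS {} WITH replication = {{{}}};".format(
        keyspace, ", ".join(parts)
    )


if __name__ == "__main__":
    # Hypothetical layout: three replicas in the real-time DC,
    # a single replica in the analytics DC.
    print(create_keyspace_cql("app_data", {"realtime": 3, "analytics": 1}))
```

With a lower replication factor in the analytics DC, that DC holds fewer copies of each row, so it does not have to mirror the real-time DC node for node. The right values depend on the workload and on the availability you need in each DC.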

Running Cassandra + Spark on AWS - architecture questions

2015-02-20 Thread Clint Kelly
Hi all,
I read the DSE 4.6 documentation and I'm still not 100% sure what a mixed workload Cassandra + Spark installation would look like, especially on AWS. What I gather is that you use OpsCenter to set up the following:
- One "virtual data center" for real-time processing (e.g., ingestion