These are both good suggestions, thanks!
I thought I remembered reading that different virtual datacenters
should always have the same number of nodes, but I think I was mistaken.
In the past we had major issues running huge analytics jobs on data
stored in HBase (it would bring down
I'm not sure if this is a good use case for you, but you might also
consider setting up several keyspaces: one for any data you want available
for analytics (such as business object tables), and one for data you don't
want to do analytics on (such as custom secondary indices). Maybe a third
one fo
"Cassandra would take care of keeping the data synced between the two sets
of five nodes. Is that correct?"
Correct
"But doing so means that we need 2x as many nodes as we need for the
real-time cluster alone"
Not necessarily. With a multi-DC setup you can configure the replication
factor per DC,
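
To illustrate the per-DC replication factor, here is a minimal sketch that builds the CQL for a keyspace using NetworkTopologyStrategy. The helper name create_keyspace_cql, the keyspace names, and the data center names "realtime" and "analytics" are all assumptions for illustration; the DC names have to match whatever your snitch actually reports. Listing a DC with a lower factor, or leaving it out entirely, is how a keyspace ends up with fewer (or no) replicas there.

```python
def create_keyspace_cql(keyspace, replication_per_dc):
    """Build a CREATE KEYSPACE statement with one replication factor per DC.

    replication_per_dc maps a data center name to its replication factor.
    A DC that is not listed receives no replicas of this keyspace.
    """
    if not replication_per_dc:
        raise ValueError("at least one data center is required")
    options = ["'class': 'NetworkTopologyStrategy'"]
    for dc, rf in replication_per_dc.items():
        options.append(f"'{dc}': {int(rf)}")
    replication_map = "{" + ", ".join(options) + "}"
    return (
        f"CREATE KEYSPACE IF NOT EXISTS {keyspace} "
        f"WITH replication = {replication_map};"
    )


# Hypothetical keyspace and DC names, not taken from any real cluster.
# Data you want available for analytics is replicated to both DCs:
print(create_keyspace_cql("business_objects", {"realtime": 3, "analytics": 2}))
# Data you do not want to run analytics on stays in the real-time DC only:
print(create_keyspace_cql("custom_indices", {"realtime": 3}))
```

You would run the generated statements through cqlsh or a driver. The factors shown (3 and 2) are placeholders, so pick values that fit your own consistency and capacity needs.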
Hi all,
I read the DSE 4.6 documentation and I'm still not 100% sure what a mixed
workload Cassandra + Spark installation would look like, especially on
AWS. What I gather is that you use OpsCenter to set up the following:
- One "virtual data center" for real-time processing (e.g., ingestion