I second Peter's point: big servers are not always the best.
My experience (using spinning disks) is that 200 to 300 GB of live data load
per node (including replicated data) is a sweet spot. Above this the time taken
for compaction, repair, off-node backups, node moves, etc. starts to be a pain.
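To make the sizing rule above concrete, here is a rough back-of-the-envelope sketch. The function name, the 250 GB default (a midpoint of the 200 to 300 GB range), and the 10 TB example figure are illustrative assumptions, not numbers from this thread or from any Cassandra tool.

```python
import math


def nodes_needed(raw_data_gb: float, replication_factor: int,
                 per_node_gb: float = 250.0) -> int:
    """Estimate how many nodes keep live load near per_node_gb each.

    raw_data_gb is the unreplicated data set size; per_node_gb is the
    target live load per node *including* replicas (assumed 250 GB here).
    """
    if raw_data_gb < 0 or replication_factor < 1 or per_node_gb <= 0:
        raise ValueError("sizes must be positive and RF must be >= 1")
    # Total on-disk live data once every row is stored RF times.
    total_gb = raw_data_gb * replication_factor
    # Never go below RF nodes, since each replica needs its own node.
    return max(replication_factor, math.ceil(total_gb / per_node_gb))


if __name__ == "__main__":
    # Hypothetical example: 10 TB of raw data at RF=3.
    print(nodes_needed(10_000, 3))
```

This ignores compaction headroom, growth, and uneven token distribution, so treat the result as a starting point rather than a capacity plan.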
Good point. One thing I'm wondering about Cassandra is what happens when
there is a massive failure. For example, if 1/3 of the nodes go down or
become unreachable. This could happen in EC2 if an AZ has a failure, or
in a datacenter if a whole rack or UPS goes dark. I'm not so concerned
about the t
> Thanks for the responses! We'll definitely go for powerful servers to
> reduce the total count. Beyond a dozen servers there really doesn't seem
> to be much point in trying to increase count anymore for
Just be aware that if "big" servers imply *lots* of data (especially
in relation to memory s
Thanks for the responses! We'll definitely go for powerful servers to
reduce the total count. Beyond a dozen servers there really doesn't seem
to be much point in trying to increase count anymore for
replication/redundancy. I'm assuming we will use leveled compaction, which
means that we'll most like
I'd also add that one of the biggest complications to arise from having
multiple clusters is that read-biased client applications would need to be
aware of all clusters and either aggregate result sets or involve logic to
choose the right cluster based on a particular query.
And from a more operat
You can also scale not "horizontally" but "diagonally",
i.e. RAID SSDs and have multicore CPUs. This means that
you'll have the same performance with fewer nodes, making
it far easier to manage.
SSDs by themselves will give you an order of magnitude
improvement on I/O.
On 1/19/2012 9:17 PM, Thorsten
> We're embarking on a project where we estimate we will need on the order
> of 100 Cassandra nodes. The data set is perfectly partitionable, meaning
> we have no queries that need to have access to all the data at once. We
> expect to run with RF=2 or =3. Is there some notion of ideal cluster
> si