Re: ideal cluster size

2012-01-23 Thread aaron morton
I second Peter's point: big servers are not always the best. My experience (using spinning disks) is that 200 to 300 GB of live data load per node (including replicated data) is a sweet spot. Above this, the time taken for compaction, repair, off-node backups, node moves, etc. starts to be a pain.
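The per-node target above turns into a node count with simple arithmetic. Because the 200 to 300 GB figure includes replicated data, the total to spread is the unique data size multiplied by the replication factor. Below is a minimal sketch, assuming a hypothetical 10 TB of unique data at RF=3. The `nodes_needed` helper is illustrative, not part of any Cassandra tool.

```python
import math


def nodes_needed(raw_data_gb: float, replication_factor: int, target_gb_per_node: float) -> int:
    """Estimate a node count so each node holds about target_gb_per_node of live data.

    Live load per node includes replicas, so the total to spread is
    raw (unique) data multiplied by the replication factor. At least RF
    nodes are needed so every replica of a range lands on a distinct node.
    """
    if raw_data_gb < 0 or replication_factor < 1 or target_gb_per_node <= 0:
        raise ValueError("invalid sizing inputs")
    total_live_gb = raw_data_gb * replication_factor
    return max(replication_factor, math.ceil(total_live_gb / target_gb_per_node))


# Hypothetical example: 10 TB of unique data at RF=3.
print(nodes_needed(10_000, 3, 300))  # 30,000 GB / 300 GB per node = 100 nodes
print(nodes_needed(10_000, 3, 200))  # 30,000 GB / 200 GB per node = 150 nodes
```

Moving from the top to the bottom of the suggested range raises the node count by half. Sizing against this figure, rather than against raw disk capacity, keeps compaction, repair and node moves manageable.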

Re: ideal cluster size

2012-01-21 Thread Thorsten von Eicken
Good point. One thing I'm wondering about Cassandra is what happens when there is a massive failure. For example, if 1/3 of the nodes go down or become unreachable. This could happen in EC2 if an AZ has a failure, or in a datacenter if a whole rack or UPS goes dark. I'm not so concerned about the t
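Whether a range stays available after losing a third of the nodes depends on how many of its replicas were on the failed nodes. Cassandra's QUORUM consistency level needs floor(RF / 2) + 1 replicas. The sketch below assumes rack-aware placement that puts each of RF=3 replicas in a different AZ or rack, so one AZ failure costs each range at most one replica. The helper names are hypothetical.

```python
def quorum(replication_factor: int) -> int:
    """Replicas required for a QUORUM read or write: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def quorum_survives(replication_factor: int, replicas_lost: int) -> bool:
    """True if enough replicas of a range remain to satisfy QUORUM."""
    return replication_factor - replicas_lost >= quorum(replication_factor)


# Assumption: each replica sits in a different AZ/rack, so losing one AZ
# (roughly 1/3 of the nodes with three AZs) removes at most one replica per range.
print(quorum_survives(3, 1))  # 2 of 3 replicas remain, quorum is 2
print(quorum_survives(2, 1))  # 1 of 2 replicas remains, quorum is 2
```

With RF=2, a single lost replica already breaks QUORUM, so that setting can only tolerate such an outage at consistency level ONE.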

Re: ideal cluster size

2012-01-21 Thread Peter Schuller
> Thanks for the responses! We'll definitely go for powerful servers to
> reduce the total count. Beyond a dozen servers there really doesn't seem
> to be much point in trying to increase count anymore for

Just be aware that if "big" servers imply *lots* of data (especially in relation to memory s

Re: ideal cluster size

2012-01-21 Thread Thorsten von Eicken
Thanks for the responses! We'll definitely go for powerful servers to reduce the total count. Beyond a dozen servers there really doesn't seem to be much point in trying to increase count anymore for replication/redundancy. I'm assuming we will use level compaction, which means that we'll most like

Re: ideal cluster size

2012-01-21 Thread Eric Czech
I'd also add that one of the biggest complications to arise from having multiple clusters is that read-biased client applications would need to be aware of all clusters and either aggregate result sets or involve logic to choose the right cluster based on a particular query. And from a more operat
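The "logic to choose the right cluster" can be as small as a deterministic mapping from a partition key to a cluster. Every client must compute that mapping identically. The sketch below is one simple option, not what the poster proposed. The cluster names and the `cluster_for` helper are hypothetical. It uses MD5 because Python's built-in `hash()` randomizes string hashing per process.

```python
import hashlib

# Hypothetical contact points for several independent clusters.
CLUSTERS = ["cluster-a.example", "cluster-b.example", "cluster-c.example"]


def cluster_for(partition_key: str, clusters: list[str] = CLUSTERS) -> str:
    """Pick a cluster deterministically from the partition key.

    MD5 gives the same result in every process and on every client, unlike
    the built-in hash(), so all clients agree on where a key lives.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return clusters[int.from_bytes(digest[:8], "big") % len(clusters)]


print(cluster_for("customer-42"))
```

A plain modulo mapping is the simplest choice, but adding a cluster changes the result for most keys. Keeping clusters fixed, or storing an explicit key-to-cluster directory, avoids mass data moves. Queries that span keys still have to fan out to every cluster and merge results in the client.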

Re: ideal cluster size

2012-01-20 Thread Maxim Potekhin
You can also scale not "horizontally" but "diagonally", i.e. RAID SSDs and have multicore CPUs. This means that you'll have the same performance with fewer nodes, making it far easier to manage. SSDs by themselves will give you an order of magnitude improvement on I/O.

On 1/19/2012 9:17 PM, Thorsten

Re: ideal cluster size

2012-01-19 Thread Peter Schuller
> We're embarking on a project where we estimate we will need on the order
> of 100 cassandra nodes. The data set is perfectly partitionable, meaning
> we have no queries that need to have access to all the data at once. We
> expect to run with RF=2 or =3. Is there some notion of ideal cluster
> si