On Tue, Sep 6, 2011 at 3:53 PM, Hefeng Yuan <hfy...@rhapsody.com> wrote:
> Hi,
>
> Is there any suggested way of calculating the number of nodes needed
> based on data?
> We currently have 6 nodes (each with 8G of memory) at RF=5 (because we
> want to be able to survive the loss of 2 nodes).
> The memtable flush happens around every 30 min (while not doing
> compaction), with ~9M serialized bytes.
>
> The problem is that we see more than 3 nodes doing compaction at the
> same time, which slows down the application.
> (We tried increasing/decreasing compaction_throughput_mb_per_sec, but
> it didn't help much.)
>
> So I'm thinking we should probably add more nodes, but I'm not sure how
> many more to add.
> Based on the data rate, is there any suggested way of calculating the
> number of nodes required?
>
> Thanks,
> Hefeng

What is the total amount of data? What is the total amount in the biggest
column family?

There is no hard limit per node; Cassandra gurus like more nodes :-). One
number for "happy Cassandra users" I have seen mentioned in discussions is
around 250-300 GB per node. You could store more per node by splitting the
data across multiple column families, each holding around 250-300 GB. The
main problem is that repair, compaction, and similar operations take longer
and require much more spare disk space.

As for the application slowing down during compaction: what consistency
level (CL) are you using for reads and writes? Also make sure it is not a
client issue - is your client hitting all nodes in round-robin, or in some
other fashion?

-Adi
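
P.S. A back-of-envelope sketch of the 250-300 GB rule of thumb, in Python.
The 300 GB target and the sample input are assumptions for illustration;
plug in your own totals:

import math

def nodes_needed(raw_data_gb, replication_factor, per_node_gb=300):
    # Every row is stored RF times, so the cluster holds RF x the raw data.
    total_on_disk_gb = raw_data_gb * replication_factor
    # Aim for ~250-300 GB of live data per node; keep additional free disk
    # beyond that for compaction/repair temporary space.
    return math.ceil(total_on_disk_gb / per_node_gb)

# e.g. 500 GB of raw data at RF=5 -> 2500 GB replicated -> 9 nodes
print(nodes_needed(500, 5))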
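
P.P.S. If your client happens to be Python, here is a minimal pycassa
sketch of where the consistency levels and the server list are set (the
keyspace, column family, and host names are placeholders, not your actual
config):

from pycassa.pool import ConnectionPool
from pycassa.columnfamily import ColumnFamily
from pycassa.cassandra.ttypes import ConsistencyLevel

# The pool spreads requests across every listed node instead of pinning
# all traffic to one coordinator.
pool = ConnectionPool('Keyspace1',
                      server_list=['node1:9160', 'node2:9160', 'node3:9160'])

# With RF=5, QUORUM needs 3 replicas, so reads/writes still succeed with
# 2 nodes down; ONE is cheaper but gives weaker guarantees.
cf = ColumnFamily(pool, 'Standard1',
                  read_consistency_level=ConsistencyLevel.QUORUM,
                  write_consistency_level=ConsistencyLevel.QUORUM)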
> Hi, > > Is there any suggested way of calculating number of nodes needed based on > data? > We currently have 6 nodes (each has 8G memory) with RF5 (because we want to > be able to survive loss of 2 nodes). > The flush of memtable happens around every 30 min (while not doing > compaction), with ~9m serialized bytes. > > The problem is that we see more than 3 nodes doing compaction at the same > time, which slows down the application. > (tried to increase/decrease compaction_throughput_mb_per_sec, not helping > much) > > So I'm thinking probably we should add more nodes, but not sure how many > more to add. > Based on the data rate, is there any suggested way of calculating number of > nodes required? > > Thanks, > Hefeng What is the total amount of data? What is the total amount in the biggest column family? There is no hard limit per node. Cassandra gurus like more nodes :-). One number for 'happy cassandra users' I have seen mentioned in discussions is around 250-300 GB per node. But you could store more per node by having multiple column families each storing around 250-300 GB per column family. The main problem being repair/compactions and such operations taking longer and requiring much more spare disk space. As for slow down in application during compaction I was wondering what is the CL you are using for read and writes? Make sure it is not a client issue - Is your client hitting all nodes in round-robin or some other fashion? -Adi