Regarding what Netflix does, the last time I checked:

1) Sure, they use AWS VMs, but they take the whole machine. So is that really using a VM? :)
2) They use SSDs mainly to reduce compaction time: "We don't even notice it with SSD any more."

When sizing nodes and clusters, the main factors I've seen are:

a) What read latency are you trying to achieve? With 400 GB of data per node, 10 ms is easy, but 1 ms is hard. Your whole design will revolve around this if you want low latency.

b) How much data load per node is there? Bootstrapping and backup/restore get time-consuming and hard with more than 400 GB per node.

c) Are you planning to delete data? If so, that's harder to manage.

Other than that, the previous comments on RAM are pretty accurate. I would want more cores with vnodes to do more parallel operations.

Thanks, James Briggs.
-- Cassandra/MySQL DBA. Available in San Jose area or remote.

________________________________
From: Robert Coli <rc...@eventbrite.com>
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Sent: Tuesday, September 9, 2014 2:44 PM
Subject: Re: hardware sizing for cassandra

On Tue, Sep 9, 2014 at 2:16 PM, Russell Bradberry <rbradbe...@gmail.com> wrote:

> Because RAM is expensive and the JVM heap is limited to 8 GB. While you do get benefit out of using extra RAM as page cache, it's often not cost efficient to do so.
>
> Again, this is so use-case dependent. I have met several people that run small nodes with fat RAM to get it all in memory to serve things in as few milliseconds as possible. This is a very common pattern in ad-tech where every millisecond counts. The tunable consistency and cross-datacenter replication make Cassandra very appealing, as it is difficult to set this up with other DBs.

Sure, it's also very common to run an RDBMS in such a mode that hundreds of gigabytes of RAM are available as either page cache or buffer pool. But "things are fast when you don't access slow disks" is not really a commentary on Cassandra specifically; "8 GB is the largest practical heap size with CMS GC" is.
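[The "8 GB practical heap with CMS GC" point is typically applied in cassandra-env.sh. The values below are an illustrative sketch, not settings from this thread:]

```shell
# Illustrative cassandra-env.sh excerpt (assumed values, not from this thread).
# Cap the CMS heap around 8 GB; larger CMS heaps tend to mean longer GC pauses.
MAX_HEAP_SIZE="8G"
# Young generation size; a common rule of thumb is ~100 MB per CPU core.
HEAP_NEWSIZE="800M"
# Use CMS for the old generation and ParNew for the young generation.
JVM_OPTS="$JVM_OPTS -XX:+UseConcMarkSweepGC"
JVM_OPTS="$JVM_OPTS -XX:+UseParNewGC"
```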
> :D The recommended setup is 3 nodes and an RF of 3, to be able to make quorum reads/writes and survive an outage. But again, this is completely use-case dependent.

IMO, the minimum number of nodes you actually want to use in production with RF=3 is >=4, probably closer to 6. But as you say, use-case dependent.

=Rob
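[The quorum arithmetic behind the 3-node / RF=3 recommendation can be sketched as follows; the helper functions are hypothetical, not Cassandra code:]

```python
# Sanity-check of Cassandra's quorum math: QUORUM requires
# floor(RF / 2) + 1 replicas to acknowledge a read or write.

def quorum(rf: int) -> int:
    """Replicas that must respond for a QUORUM read/write at replication factor rf."""
    return rf // 2 + 1

def tolerable_failures(rf: int) -> int:
    """Replicas you can lose while still achieving QUORUM."""
    return rf - quorum(rf)

# With RF=3, quorum is 2, so one replica can be down and
# quorum reads/writes still succeed:
print(quorum(3), tolerable_failures(3))  # 2 1
```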