See the "Edward Capriolo (media6degrees) – Real World Capacity Planning:
Cassandra on Blades and Big Iron" at
http://www.datastax.com/events/cassandrasf2011/presentations
Open-ended questions like this are really hard to answer. It's a lot easier for
people if you provide some specifics.
Can you guys please explain how to do capacity planning in Cassandra?
So, in our experience, the amount of storage overhead is much higher. If you
plan on storing 120 TB of data, you should expect to store roughly 250 TB on
disk after the data overhead. And then, since you have to leave 50% of your
storage space free for compaction, you're looking at needing about 500 TB.
On Wed, Jun 29, 2011 at 5:36 AM, Jacob, Arun wrote:
> if I'm planning to store 20TB of new data per week, and expire all data
> every 2 weeks, with a replication factor of 3, do I only need approximately
> 120 TB of disk? I'm going to use ttl in my column values to automatically
> expire data. Or ...
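A back-of-the-envelope version of that arithmetic, as a rough sketch only: the
roughly 2x on-disk overhead and the 50% free-space rule are the figures reported
from experience in the reply above, and the function and variable names here are
just illustrative.

    def provisioned_capacity_tb(raw_tb, overhead_factor=250 / 120,
                                compaction_headroom=0.5):
        """Very rough sizing: take the raw (already replicated) data volume,
        apply the ~2x on-disk overhead reported above (120 TB -> ~250 TB),
        then keep `compaction_headroom` of every node's disk free."""
        on_disk_tb = raw_tb * overhead_factor
        return on_disk_tb / (1 - compaction_headroom)

    # 20 TB/week of new data, kept 2 weeks, replication factor 3 -> 120 TB raw
    raw_tb = 20 * 2 * 3
    print(round(provisioned_capacity_tb(raw_tb)))  # -> 500 (TB), as in the reply

The 50% headroom reflects the worst case for compaction, where a major
compaction may need to rewrite nearly all of a node's data at once and
temporarily hold both the old and new copies on disk.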
if I'm planning to store 20TB of new data per week, and expire all data every 2
weeks, with a replication factor of 3, do I only need approximately 120 TB of
disk? I'm going to use ttl in my column values to automatically expire data. Or
would I need more capacity to handle sstable merges?
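Since the plan is to expire data with per-column TTLs, here is a minimal sketch
of what that write looks like. It is illustrative only: it uses CQL and the
DataStax Python driver, which are newer than this 2011 thread (the poster would
have set TTLs through the Thrift API of the day), and the contact point,
keyspace, table, and column names are all invented.

    from cassandra.cluster import Cluster

    TWO_WEEKS = 14 * 24 * 3600  # seconds; matches the 2-week retention above

    cluster = Cluster(["127.0.0.1"])      # hypothetical contact point
    session = cluster.connect("metrics")  # hypothetical keyspace

    # Each value is written with a TTL, so it expires automatically after two
    # weeks. Disk space is only reclaimed once compaction rewrites the SSTables
    # holding the expired cells, not at the moment the TTL lapses.
    query = ("INSERT INTO events (id, payload) VALUES (%s, %s) "
             f"USING TTL {TWO_WEEKS}")
    session.execute(query, ("event-123", "raw payload bytes"))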
... get redistributed, and then clean off the decommissioned
node and bootstrap it. Since the disks are too full for an anticompaction,
you can't move the token on that node.
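To make that recovery procedure concrete, here is a hedged sketch of the
nodetool calls involved, wrapped in Python only to keep all of the examples
here in one language. The host name is a placeholder, and the exact steps
(wiping data directories, re-bootstrapping) should be checked against the
documentation for your Cassandra version.

    import subprocess

    def nodetool(host, *args):
        """Run a nodetool command against one node; assumes nodetool is on
        PATH and the node's JMX port is reachable."""
        subprocess.run(["nodetool", "-h", host, *args], check=True)

    full_node = "cass-07.example.com"  # hypothetical node whose disks are full

    # `nodetool move` is out, per the note above: it anticompacts the node's
    # data first, which needs free disk this node no longer has. Instead, hand
    # its ranges to the rest of the ring, wipe it, and bootstrap it afresh.
    nodetool(full_node, "decommission")
    # ...then delete the data directories on full_node and restart Cassandra
    # so auto_bootstrap streams a fresh copy of its ranges back to the node.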
Given this, I wonder about the right approach to capacity planning. If I
want to store, say, 500M rows, and I know bas...