Re: Right sizing Cassandra data nodes

Charulata Sharma (charshar) Mon, 19 Feb 2018 14:55:53 -0800

Thanks for the response Rahul. I did not understand the “node density” point.

Charu

From: Rahul Singh <rahul.xavier.si...@gmail.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Monday, February 19, 2018 at 12:32 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: Right sizing Cassandra data nodes

1. I would keep OpsCenter on a different cluster. Why unnecessarily put traffic 
and computing for OpsCenter data on a real business data cluster?
2. Don’t put more than 1-2 TB per node. Maybe 3 TB. As node density increases, 
it creates more replication, read repairs, etc., and more memory usage for 
compactions.
3. You can keep as many snapshots as you want as long as they are on another 
disk, or even moved to a SAN / NAS. All you may care about is the most recent 
snapshot on the physical machine / disks of a live node.

--
Rahul Singh
rahul.si...@anant.us

Anant Corporation

On Feb 19, 2018, 3:08 PM -0500, Charulata Sharma (charshar) 
<chars...@cisco.com>, wrote:

Hi All,

Looking for some insight into how application data archive and purge is carried 
out for a C* database. Are there standard guidelines for calculating the amount 
of space that can be used for storing data on a specific node?

Some pointers that I got while researching are:

- Allocate 50% space for compaction, e.g. if data size is 50GB then allocate 
  25GB for compaction.

- Snapshot strategy. If old snapshots are present, then they occupy the disk 
  space.

- Allocate some percentage of storage (????) for system tables and OpsCenter 
  tables?
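
The pointers above can be turned into a rough back-of-the-envelope calculation. The following is only a sketch under stated assumptions: the function name and parameters are hypothetical, compaction headroom is taken as a fraction of the live data size (matching the 50GB / 25GB example), and the system/OpsCenter share is left as a caller-supplied placeholder because no figure for it is known here.

```python
def usable_data_gb(disk_gb, compaction_fraction=0.5,
                   snapshot_gb=0.0, system_fraction=0.0):
    """Estimate how much live data fits on one node's disk (hypothetical helper).

    disk_gb             -- total disk capacity of the node
    compaction_fraction -- compaction headroom as a fraction of live data
                           (0.5 follows the "50GB data, 25GB compaction" pointer)
    snapshot_gb         -- space currently held by snapshots on the same disk
    system_fraction     -- share of the disk set aside for system and OpsCenter
                           tables; an assumption, since no guideline value is known
    """
    # Space left after the system share and existing snapshots are set aside.
    available = disk_gb * (1 - system_fraction) - snapshot_gb
    if available <= 0:
        return 0.0
    # data + compaction_fraction * data must fit into the available space.
    return available / (1 + compaction_fraction)


if __name__ == "__main__":
    # A 75GB disk with no snapshots holds 50GB of data plus 25GB of headroom.
    print(usable_data_gb(75))
```

Moving snapshots to another disk corresponds to setting snapshot_gb to 0, which is one way to see how much capacity the snapshots are costing.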

We have a scenario where certain transaction data needs to be archived based on 
business rules and some purged, so before deciding on an A&P strategy, I am 
trying to analyze how much transactional data can be stored given the current 
node capacity. I also found out that the space available metric shown in 
OpsCenter is not very reliable because it doesn’t include the snapshot space. 
In our case, we have a huge snapshot size. For some unexplained reason, we seem 
to be taking snapshots of our data every hour and purging them only after 7 
days.
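
To put that schedule in numbers, here is a minimal sketch that counts how many snapshots are on disk at any one time. The function name is hypothetical, and it only counts snapshots. It does not estimate their size, because snapshots are hard links to SSTable files and only take extra space once those files have been compacted away from the live data set.

```python
def retained_snapshots(interval_hours, retention_days):
    """Number of snapshots kept at steady state (hypothetical helper)."""
    if interval_hours <= 0:
        raise ValueError("interval_hours must be positive")
    # One snapshot per interval, kept for the whole retention window.
    return (retention_days * 24) // interval_hours


if __name__ == "__main__":
    # Hourly snapshots purged after 7 days.
    print(retained_snapshots(1, 7))  # 168
```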

Thanks,
Charu
Cisco Systems.
