Thanks for the reply, Razi.

As I mentioned earlier, we're not currently using snapshots - it's only the
backups that are bothering me right now.

So my next question pertains to this statement of yours:

> As far as I am aware, using *rm* is perfectly safe to delete the
> directories for snapshots/backups as long as you are careful not to delete
> your actively used sstable files and directories.


How do I find out which sstables are the actively used ones?
If by that you mean the main data files, does that mean I can safely remove
all files ONLY under the "backups/" directory?
Or could removing files that are currently hard-linked inside backups
potentially cause issues?
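
In the meantime, here is the sort of check I had in mind, based on
hard-link counts (a sketch only; the function name and paths are mine, and
`stat -c` assumes GNU coreutils):

```shell
# Sketch: classify files in a table's backups/ dir by hard-link count.
# A count > 1 means the same inode is still referenced by a live sstable;
# a count of 1 means the live copy was already compacted away.
check_backups() {
  find "$1" -type f | while read -r f; do
    if [ "$(stat -c '%h' "$f")" -gt 1 ]; then
      echo "linked: $f"    # shares its inode with a live sstable
    else
      echo "orphan: $f"    # only the backup link remains
    fi
  done
}
# e.g.: check_backups /var/lib/cassandra/data/<keyspace>/<table>/backups
```

If I understand the hard-link semantics right, only the orphaned files
actually free space on removal, since the linked ones share their blocks
with live sstables.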

Thanks,
Kunal

On 11 January 2017 at 01:06, Khaja, Raziuddin (NIH/NLM/NCBI) [C] <
raziuddin.kh...@nih.gov> wrote:

> Hello Kunal,
>
>
>
> I would take a look at the following configuration options in the
> cassandra.yaml
>
>
>
> *Common automatic backup settings*
>
> *incremental_backups:*
>
> http://docs.datastax.com/en/archived/cassandra/3.x/cassandra/configuration/configCassandra_yaml.html#configCassandra_yaml__incremental_backups
>
>
>
> (Default: false) Backs up data updated since the last snapshot was taken.
> When enabled, Cassandra creates a hard link to each SSTable flushed or
> streamed locally in a backups subdirectory of the keyspace data. Removing
> these links is the operator's responsibility.
>
>
>
> *snapshot_before_compaction*:
>
> http://docs.datastax.com/en/archived/cassandra/3.x/cassandra/configuration/configCassandra_yaml.html#configCassandra_yaml__snapshot_before_compaction
>
>
>
> (Default: false) Enables or disables taking a snapshot before each
> compaction. A snapshot is useful to back up data when there is a data
> format change. Be careful using this option: Cassandra does not clean up
> older snapshots automatically.
>
>
>
>
>
> *Advanced automatic backup setting*
>
> *auto_snapshot*:
>
> http://docs.datastax.com/en/archived/cassandra/3.x/cassandra/configuration/configCassandra_yaml.html#configCassandra_yaml__auto_snapshot
>
>
>
> (Default: true) Enables or disables whether Cassandra takes a snapshot of
> the data before truncating a keyspace or dropping a table. To prevent data
> loss, DataStax strongly advises using the default setting. If you
> set auto_snapshot to false, you lose data on truncation or drop.
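
Putting those three settings together, the relevant cassandra.yaml fragment
would look something like this (values shown are the defaults quoted above):

```yaml
# Backup/snapshot knobs in cassandra.yaml (defaults shown):
incremental_backups: false         # hard-link each flushed sstable into backups/
snapshot_before_compaction: false  # snapshot before every compaction; never auto-cleaned
auto_snapshot: true                # snapshot on TRUNCATE/DROP; keep true to avoid data loss
```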
>
>
>
>
>
> *nodetool* also provides methods to manage snapshots.
> http://docs.datastax.com/en/archived/cassandra/3.x/cassandra/tools/toolsNodetool.html
>
> See the specific commands:
>
>    - nodetool clearsnapshot
>    
> <http://docs.datastax.com/en/archived/cassandra/3.x/cassandra/tools/toolsClearSnapShot.html>
>    Removes one or more snapshots.
>    - nodetool listsnapshots
>    
> <http://docs.datastax.com/en/archived/cassandra/3.x/cassandra/tools/toolsListSnapShots.html>
>    Lists snapshot names, size on disk, and true size.
>    - nodetool snapshot
>    
> <http://docs.datastax.com/en/archived/cassandra/3.x/cassandra/tools/toolsSnapShot.html>
>    Takes a snapshot of one or more keyspaces, or of a table, to back up
>    data.
>
>
>
> As far as I am aware, using *rm* is perfectly safe to delete the
> directories for snapshots/backups as long as you are careful not to delete
> your actively used sstable files and directories.  I think the *nodetool
> clearsnapshot* command is provided so that you don’t accidentally delete
> actively used files.  The last time I used *clearsnapshot* (a very long
> time ago), I thought it left the directory behind, but this could have been
> fixed in newer versions (so you might want to check that).
>
>
>
> HTH
>
> -Razi
>
>
>
>
>
> *From: *Jonathan Haddad <j...@jonhaddad.com>
> *Reply-To: *"user@cassandra.apache.org" <user@cassandra.apache.org>
> *Date: *Tuesday, January 10, 2017 at 12:26 PM
> *To: *"user@cassandra.apache.org" <user@cassandra.apache.org>
> *Subject: *Re: Backups eating up disk space
>
>
>
> If you remove the files from the backup directory, you would not have data
> loss in the case of a node going down.  They're hard links to the same
> files that are in your data directory, created when an sstable is written
> to disk.  At that point they take up (almost) no space, so they aren't a
> big deal; but when the sstable gets compacted, the links stick around, so
> they end up preventing that space from being freed.
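
For what it's worth, that hard-link behavior is easy to demonstrate outside
Cassandra with plain coreutils (illustrative file names, 1 MiB stand-in for
an sstable):

```shell
tmp=$(mktemp -d)
# Create a 1 MiB "sstable" and hard-link it, as incremental backup does.
dd if=/dev/zero of="$tmp/sstable.db" bs=1024 count=1024 2>/dev/null
ln "$tmp/sstable.db" "$tmp/backup.db"   # the link itself uses no extra space
# Simulate compaction removing the live file: the inode survives via the link.
rm "$tmp/sstable.db"
du -sk "$tmp"   # still ~1024K: the backup link keeps the blocks allocated
```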
>
>
>
> Usually you use incremental backups as a means of moving the sstables off
> the node to a backup location.  If you're not doing anything with them,
> they're just wasting space and you should disable incremental backups.
>
>
>
> Some people take snapshots then rely on incremental backups.  Others use
> the tablesnap utility which does sort of the same thing.
>
>
>
> On Tue, Jan 10, 2017 at 9:18 AM Kunal Gangakhedkar <
> kgangakhed...@gmail.com> wrote:
>
> Thanks for quick reply, Jon.
>
>
>
> But, what about in case of node/cluster going down? Would there be data
> loss if I remove these files manually?
>
>
>
> How is it typically managed in production setups?
>
> What are the best-practices for the same?
>
> Do people take snapshots on each node before removing the backups?
>
>
>
> This is my first production deployment - so, still trying to learn.
>
>
>
> Thanks,
>
> Kunal
>
>
>
> On 10 January 2017 at 21:36, Jonathan Haddad <j...@jonhaddad.com> wrote:
>
> You can just delete them off the filesystem (rm)
>
>
>
> On Tue, Jan 10, 2017 at 8:02 AM Kunal Gangakhedkar <
> kgangakhed...@gmail.com> wrote:
>
> Hi all,
>
>
>
> We have a 3-node Cassandra cluster with incremental backup set to true.
>
> Each node has a 1 TB data volume that stores Cassandra data.
>
>
>
> The load in the output of 'nodetool status' comes to around 260 GB per
> node.
>
> All our keyspaces use replication factor = 3.
>
>
>
> However, the df output shows the data volumes consuming around 850 GB of
> space.
>
> I checked the keyspace directory structures - most of the space goes in
> <CASS_DATA_VOL>/data/<KEYSPACE>/<CF>/backups.
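
To put numbers on that, I have been summing the backups directories like
so (a sketch; the helper name is mine and the data path is specific to our
layout):

```shell
# Sum disk usage of every backups/ directory under a Cassandra data dir.
# Caveat: backup files are hard links, so some of this space is shared
# with live sstables and would not be freed by deleting the backups.
backups_usage() {
  find "$1" -type d -name backups -exec du -sk {} \; 2>/dev/null
}
# e.g.: backups_usage /var/lib/cassandra/data
```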
>
>
>
> We have never manually run snapshots.
>
>
>
> What is the typical procedure to clear the backups?
>
> Can it be done without taking the node offline?
>
>
>
> Thanks,
>
> Kunal
>
>
>
>