Thanks for the reply, Razi. As I mentioned earlier, we're not currently using snapshots - it's only the backups that are bothering me right now.
My next question pertains to this statement of yours:

> As far as I am aware, using *rm* is perfectly safe to delete the
> directories for snapshots/backups as long as you are careful not to delete
> your actively used sstable files and directories.

How do I find out which are the actively used sstables?
If by that you mean the main data files, does that mean I can safely remove all files ONLY under the "backups/" directory?
Or can removing files inside backups/ that are still hard links to current sstables cause any issues?

Thanks,
Kunal

On 11 January 2017 at 01:06, Khaja, Raziuddin (NIH/NLM/NCBI) [C] <raziuddin.kh...@nih.gov> wrote:

> Hello Kunal,
>
> I would take a look at the following configuration options in the
> Cassandra.yaml
>
> *Common automatic backup settings*
>
> *Incremental_backups:*
> http://docs.datastax.com/en/archived/cassandra/3.x/cassandra/configuration/configCassandra_yaml.html#configCassandra_yaml__incremental_backups
>
> (Default: false) Backs up data updated since the last snapshot was taken.
> When enabled, Cassandra creates a hard link to each SSTable flushed or
> streamed locally in a backups subdirectory of the keyspace data. Removing
> these links is the operator's responsibility.
>
> *snapshot_before_compaction*:
> http://docs.datastax.com/en/archived/cassandra/3.x/cassandra/configuration/configCassandra_yaml.html#configCassandra_yaml__snapshot_before_compaction
>
> (Default: false) Enables or disables taking a snapshot before each
> compaction. A snapshot is useful to back up data when there is a data
> format change. Be careful using this option: Cassandra does not clean up
> older snapshots automatically.
>
> *Advanced automatic backup setting*
>
> *auto_snapshot*:
> http://docs.datastax.com/en/archived/cassandra/3.x/cassandra/configuration/configCassandra_yaml.html#configCassandra_yaml__auto_snapshot
>
> (Default: true) Enables or disables whether Cassandra takes a snapshot of
> the data before truncating a keyspace or dropping a table. To prevent data
> loss, Datastax strongly advises using the default setting. If you
> set auto_snapshot to false, you lose data on truncation or drop.
>
> *nodetool* also provides methods to manage snapshots.
> http://docs.datastax.com/en/archived/cassandra/3.x/cassandra/tools/toolsNodetool.html
>
> See the specific commands:
>
> - nodetool clearsnapshot
>   <http://docs.datastax.com/en/archived/cassandra/3.x/cassandra/tools/toolsClearSnapShot.html>
>   Removes one or more snapshots.
> - nodetool listsnapshots
>   <http://docs.datastax.com/en/archived/cassandra/3.x/cassandra/tools/toolsListSnapShots.html>
>   Lists snapshot names, size on disk, and true size.
> - nodetool snapshot
>   <http://docs.datastax.com/en/archived/cassandra/3.x/cassandra/tools/toolsSnapShot.html>
>   Take a snapshot of one or more keyspaces, or of a table, to backup
>   data.
>
> As far as I am aware, using *rm* is perfectly safe to delete the
> directories for snapshots/backups as long as you are careful not to delete
> your actively used sstable files and directories. I think the *nodetool
> clearsnapshot* command is provided so that you don’t accidentally delete
> actively used files. Last I used *clearsnapshot*, (a very long time
> ago), I thought it left behind the directory, but this could have been
> fixed in newer versions (so you might want to check that).
>
> HTH
>
> -Razi
>
> *From: *Jonathan Haddad <j...@jonhaddad.com>
> *Reply-To: *"user@cassandra.apache.org" <user@cassandra.apache.org>
> *Date: *Tuesday, January 10, 2017 at 12:26 PM
> *To: *"user@cassandra.apache.org" <user@cassandra.apache.org>
> *Subject: *Re: Backups eating up disk space
>
> If you remove the files from the backup directory, you would not have data
> loss in the case of a node going down. They're hard links to the same
> files that are in your data directory, and are created when an sstable is
> written to disk. At the time, they take up (almost) no space, so they
> aren't a big deal, but when the sstable gets compacted, they stick around,
> so they end up not freeing space up.
>
> Usually you use incremental backups as a means of moving the sstables off
> the node to a backup location. If you're not doing anything with them,
> they're just wasting space and you should disable incremental backups.
>
> Some people take snapshots then rely on incremental backups. Others use
> the tablesnap utility which does sort of the same thing.
>
> On Tue, Jan 10, 2017 at 9:18 AM Kunal Gangakhedkar <kgangakhed...@gmail.com> wrote:
>
> Thanks for quick reply, Jon.
>
> But, what about in case of node/cluster going down? Would there be data
> loss if I remove these files manually?
>
> How is it typically managed in production setups?
>
> What are the best-practices for the same?
>
> Do people take snapshots on each node before removing the backups?
>
> This is my first production deployment - so, still trying to learn.
>
> Thanks,
>
> Kunal
>
> On 10 January 2017 at 21:36, Jonathan Haddad <j...@jonhaddad.com> wrote:
>
> You can just delete them off the filesystem (rm)
>
> On Tue, Jan 10, 2017 at 8:02 AM Kunal Gangakhedkar <kgangakhed...@gmail.com> wrote:
>
> Hi all,
>
> We have a 3-node cassandra cluster with incremental backup set to true.
>
> Each node has 1TB data volume that stores cassandra data.
>
> The load in the output of 'nodetool status' comes up at around 260GB each
> node.
>
> All our keyspaces use replication factor = 3.
>
> However, the df output shows the data volumes consuming around 850GB of
> space.
>
> I checked the keyspace directory structures - most of the space goes in
> <CASS_DATA_VOL>/data/<KEYSPACE>/<CF>/backups.
>
> We have never manually run snapshots.
>
> What is the typical procedure to clear the backups?
>
> Can it be done without taking the node offline?
>
> Thanks,
>
> Kunal
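
Since the files under backups/ are hard links, one way to see which of them still share data with a live sstable is to look at each file's link count. The Python sketch below is only an illustration, not something posted in this thread: `scan_backups` and `reclaimable_bytes` are hypothetical helper names, and it assumes the <CASS_DATA_VOL>/data/<KEYSPACE>/<CF>/backups layout described above. It only reads file metadata and deletes nothing.

```python
import os
import tempfile
from pathlib import Path


def scan_backups(data_root):
    """Split files under every 'backups' directory by hard-link count.

    Returns (shared, orphaned), each a list of (path, size_in_bytes).
    """
    shared, orphaned = [], []
    for dirpath, _dirnames, filenames in os.walk(data_root):
        if os.path.basename(dirpath) != "backups":
            continue
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            if st.st_nlink > 1:
                # Another name still points at the same inode (normally the
                # live sstable, possibly a snapshot). Removing the backups
                # link frees no space and leaves the other name intact.
                shared.append((path, st.st_size))
            else:
                # The backups entry is the last name for this inode (the
                # live sstable was compacted away). Removing it frees space.
                orphaned.append((path, st.st_size))
    return shared, orphaned


def reclaimable_bytes(orphaned):
    """Bytes that would be released by deleting the orphaned backup files."""
    return sum(size for _path, size in orphaned)


if __name__ == "__main__":
    # Demo on a throwaway tree; point scan_backups at a real data
    # directory instead (the real path is an assumption of the operator).
    with tempfile.TemporaryDirectory() as root:
        cf = Path(root) / "data" / "ks1" / "cf1"
        (cf / "backups").mkdir(parents=True)
        for name, payload in (("a-Data.db", b"x" * 100), ("b-Data.db", b"y" * 50)):
            (cf / name).write_bytes(payload)
            os.link(cf / name, cf / "backups" / name)
        (cf / "a-Data.db").unlink()  # simulate compaction dropping a live file
        shared, orphaned = scan_backups(root)
        print("still shared:", [os.path.basename(p) for p, _ in shared])
        print("only in backups:", [os.path.basename(p) for p, _ in orphaned])
        print("reclaimable bytes:", reclaimable_bytes(orphaned))
```

Either group can be removed without touching the live data files, because deleting one hard link never removes the other names of the same file. Only the second group actually returns disk space.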