On Wed, Nov 5, 2014 at 12:08 PM, KZ Win <kz...@pelotoncycle.com> wrote:

> I have Cassandra nodes with long uptime.  The disk footprint of the
> Cassandra data folder is different when I copy it to a different folder.
>
> I am talking about as much as a 100% difference for 25-40GB of data.  On
> copying, it grows to double that.


1) Cassandra automatically "snapshots" SSTables when one does certain
operations.
2) One can also manually create snapshots.
3) Snapshots are hard links to files.
4) Hard links to files generally become duplicate files when copied to
another partition, unless rsync or cp is configured to maintain the hard
link relationship (for example, rsync -H).
5) Snapshots are kept in a subdirectory of the data directory for the
columnfamily.
6) This all has the pathological-seeming outcome that snapshots become
effectively larger as time passes (because the hard links they contain
become the only copy of a file once the "original" is deleted from the
data directory via compaction) and might grow significantly when copied.
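
To make points 3), 4) and 6) concrete, here is a minimal Python sketch of
the effect. It is only an illustration, not anything Cassandra itself
runs: the directory layout, the file name example-Data.db and the helper
names are made up for the demo. It builds a tiny "data directory" with one
file and one hard-linked "snapshot" of it, copies the tree with a plain
copy that does not preserve hard links, and compares the bytes actually
occupied before and after.

```python
import os
import shutil
import tempfile


def apparent_and_unique_bytes(root):
    """Return (sum of all file sizes, sum counting each inode only once)."""
    apparent = 0
    unique = 0
    seen = set()
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            st = os.stat(os.path.join(dirpath, name))
            apparent += st.st_size
            key = (st.st_dev, st.st_ino)  # identifies the underlying file
            if key not in seen:
                seen.add(key)
                unique += st.st_size
    return apparent, unique


def demo():
    base = tempfile.mkdtemp()
    try:
        # Hypothetical layout: a data directory with a snapshots/ subdirectory.
        src = os.path.join(base, "data")
        snap = os.path.join(src, "snapshots", "s1")
        os.makedirs(snap)

        sstable = os.path.join(src, "example-Data.db")
        with open(sstable, "wb") as f:
            f.write(b"x" * 4096)

        # The "snapshot" is just a second name (hard link) for the same file.
        os.link(sstable, os.path.join(snap, "example-Data.db"))

        # A plain recursive copy writes each name out as a separate file.
        dst = os.path.join(base, "copy")
        shutil.copytree(src, dst)

        return apparent_and_unique_bytes(src), apparent_and_unique_bytes(dst)
    finally:
        shutil.rmtree(base)


if __name__ == "__main__":
    before, after = demo()
    print("source (apparent, unique):", before)
    print("copy   (apparent, unique):", after)
```

In the source tree the two names share one inode, so the unique byte count
is half the apparent one; in the copy the two counts are equal, which is
the doubling described in the question.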

tl;dr : modify your rsync to include --exclude=snapshots/
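
If the copy is scripted rather than done with rsync, the same idea can be
sketched in Python with shutil.copytree and an ignore pattern. This is an
assumption-laden stand-in for the rsync flag, not a replacement for it;
the function names and the demo layout are hypothetical.

```python
import os
import shutil
import tempfile


def copy_without_snapshots(src, dst):
    """Copy src to dst, skipping every entry named 'snapshots'."""
    shutil.copytree(src, dst, ignore=shutil.ignore_patterns("snapshots"))


def demo_exclude():
    base = tempfile.mkdtemp()
    try:
        # Hypothetical layout: one live file plus a hard-linked snapshot of it.
        src = os.path.join(base, "data")
        snap = os.path.join(src, "snapshots", "s1")
        os.makedirs(snap)
        live = os.path.join(src, "example-Data.db")
        with open(live, "wb") as f:
            f.write(b"x" * 4096)
        os.link(live, os.path.join(snap, "example-Data.db"))

        dst = os.path.join(base, "copy")
        copy_without_snapshots(src, dst)
        return sorted(os.listdir(dst))  # top-level entries that were copied
    finally:
        shutil.rmtree(base)


if __name__ == "__main__":
    print(demo_exclude())
```

Only the live file reaches the destination; the snapshots/ subtree is
skipped entirely, so nothing is duplicated.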

=Rob
