Duh.  I totally forgot that I take a snapshot just before the daily rsync backup.

k.z.

On Wed, Nov 5, 2014 at 3:13 PM, Robert Coli <rc...@eventbrite.com> wrote:
> On Wed, Nov 5, 2014 at 12:08 PM, KZ Win <kz...@pelotoncycle.com> wrote:
>>
>> I have Cassandra nodes with long uptime.  The disk footprint of the
>> Cassandra data folder is different when I copy it to a different folder.
>
>
>>
>> I am talking about as much as a 100% difference for 25-40GB of data.  When
>> copied, the data grows to double that size.
>
>
> 1) Cassandra automatically "snapshots" SSTables when one does certain
> operations.
> 2) One can also manually create snapshots.
> 3) Snapshots are hard links to files.
> 4) Hard links to files generally become duplicate files when copied to
> another partition, unless rsync or cp is configured to maintain the hard
> link relationship.
> 5) Snapshots are kept in a subdirectory of the data directory for the
> columnfamily.
> 6) This all has the pathological-seeming outcome that snapshots become
> effectively larger as time passes (because the hard links they contain
> become the only copy of the file when the "original" is deleted from the
> data directory via compaction) and might grow significantly when copied.
>
> tl;dr : modify your rsync to include --exclude=snapshots/
>
> =Rob
>
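
A minimal sketch of points 3) and 4) and of the tl;dr, written in Python
rather than as an rsync invocation.  The directory layout, the file name
ks-cf-Data.db, the snapshot name pre-backup, and the helpers unique_bytes
and demo are all made up for illustration; nothing here touches a real
Cassandra install.  It assumes a filesystem that supports hard links.

```python
import os
import shutil
import tempfile


def unique_bytes(root):
    """Sum file sizes under root, counting each inode only once (as du does)."""
    seen = set()
    total = 0
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            st = os.stat(os.path.join(dirpath, name))
            key = (st.st_dev, st.st_ino)
            if key not in seen:
                seen.add(key)
                total += st.st_size
    return total


def demo():
    work = tempfile.mkdtemp()
    try:
        # Hypothetical layout: one columnfamily directory with one snapshot.
        cf = os.path.join(work, "data", "ks", "cf")
        snap = os.path.join(cf, "snapshots", "pre-backup")
        os.makedirs(snap)

        sstable = os.path.join(cf, "ks-cf-Data.db")
        with open(sstable, "wb") as f:
            f.write(b"x" * 1024)

        # The snapshot entry is a hard link: same inode, no extra disk space.
        os.link(sstable, os.path.join(snap, "ks-cf-Data.db"))

        src = os.path.join(work, "data")
        original = unique_bytes(src)

        # A plain recursive copy does not preserve the link relationship,
        # so the snapshot entry becomes a second full copy of the file.
        naive = os.path.join(work, "naive")
        shutil.copytree(src, naive)
        copied = unique_bytes(naive)

        # Skipping snapshots/ during the copy, in the spirit of
        # rsync --exclude=snapshots/, keeps the copy at the original size.
        filtered = os.path.join(work, "filtered")
        shutil.copytree(src, filtered,
                        ignore=shutil.ignore_patterns("snapshots"))
        excluded = unique_bytes(filtered)

        return original, copied, excluded
    finally:
        shutil.rmtree(work)


if __name__ == "__main__":
    print(demo())
```

With a single 1024-byte file, demo() should report 1024 bytes in the source,
2048 after the plain copy, and 1024 when snapshots is skipped.  If the
snapshots are actually wanted in the backup, rsync's -H (--hard-links)
option is the usual way to keep the link relationship instead of excluding
the directory.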
