I have Cassandra nodes with long uptime.  The disk footprint of older
Cassandra data changes when I copy it to a different folder.
Why is that?  I have used both rsync and cp.  This can be very confusing
when doing maintenance tasks such as a hardware upgrade on
EC2 or backing up a snapshot.

I am talking about as much as a 100% difference for 25-40GB of data: on
copying, it grows to double that.  The server's folder is on an EC2
magnetic instance store and I copied it to various EBS volumes.  I do not
think it is something weird about EC2: when I copied the EBS data back to
the magnetic instance store, the size stayed the same.  So I am guessing
there is some kind of Cassandra compression magic that is fooling
operating system tools like du and df.
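
To test that guess, here is a minimal sketch (Python, standard library
only) that compares what a file tree appears to contain with what it
actually occupies on disk.  It assumes, without my having confirmed it,
that the gap comes from hard-linked files (for example under snapshots/
directories) or sparse files rather than from Cassandra itself.  The
function name footprint and the path given on the command line are
hypothetical.

```python
import os
import stat
import sys


def footprint(root):
    """Return (apparent_bytes, allocated_bytes, multi_link_paths) for a tree.

    apparent_bytes  : sum of st_size over every regular-file path, roughly
                      what a copy that ignores hard links would write.
    allocated_bytes : st_blocks * 512 counted once per inode, roughly what
                      du reports for the source tree.
    multi_link_paths: number of file paths whose inode has more than one link.
    """
    apparent = 0
    allocated = 0
    multi_link = 0
    seen = set()  # (device, inode) pairs already counted
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            if not stat.S_ISREG(st.st_mode):
                continue  # skip symlinks, sockets, and other special files
            apparent += st.st_size
            if st.st_nlink > 1:
                multi_link += 1
            key = (st.st_dev, st.st_ino)
            if key in seen:
                continue  # same inode reached through another hard link
            seen.add(key)
            allocated += st.st_blocks * 512
    return apparent, allocated, multi_link


if __name__ == "__main__" and len(sys.argv) > 1:
    a, b, n = footprint(sys.argv[1])
    print(f"apparent={a} allocated={b} multi_link_paths={n}")
```

If apparent is about twice allocated and multi_link_paths is large on the
original folder, hard links would be a likely explanation.  Comparing
"du -sh" with "du -sh --apparent-size" on both folders should give a
similar picture.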

There is a similar issue with the commitlog folder, but the total size of
that folder is not as big and the percentage difference in size is low.

Thanks for any insight you can share.

k.z.
