I regularly run repair on my Cassandra cluster. However, I often see very large amounts of data being transferred to other nodes during the repair operation.
My question is: if only some data is out of sync, why are entire Data files being transferred?

    /var/lib/cassandra/data/DFS/main-f-893-Data.db sections=2602 progress=22942842880/63149903764 - 36%
    /var/lib/cassandra/data/DFS/main-f-946-Data.db sections=1437 progress=0/65991601 - 0%
    /var/lib/cassandra/data/DFS/main-f-907-Data.db sections=2602 progress=0/1635822909 - 0%

My guess is that, since the data in the Data files is immutable, the entire file has to be copied over, and I assume a compaction then takes place to consolidate the data. But that's just a wild guess. Can anyone explain this behavior?
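In case it helps anyone check the numbers, here is a rough Python sketch that parses the progress lines quoted above and adds up the byte counts. The line format in the regex is only inferred from the output I pasted, and `parse_streams` and `summarize` are names I made up for illustration. I am also assuming the two numbers in `progress=` are bytes sent and bytes to send. Whether the second number is the full size of the Data file or only the sections selected for streaming is exactly what I am unsure about; comparing it with the file size on disk should show which.

```python
import re

# Assumed line shape, inferred from the quoted output (not from any documentation):
#   <path> sections=<n> progress=<done>/<total> - <pct>%
LINE_RE = re.compile(
    r"(?P<path>\S+)\s+sections=(?P<sections>\d+)\s+"
    r"progress=(?P<done>\d+)/(?P<total>\d+)"
)

def parse_streams(text):
    """Return one dict per progress entry found in text."""
    return [
        {
            "path": m["path"],
            "sections": int(m["sections"]),
            "done": int(m["done"]),
            "total": int(m["total"]),
        }
        for m in LINE_RE.finditer(text)
    ]

def summarize(streams):
    """Return (bytes reported as sent, bytes reported as the total)."""
    done = sum(s["done"] for s in streams)
    total = sum(s["total"] for s in streams)
    return done, total

SAMPLE = (
    "/var/lib/cassandra/data/DFS/main-f-893-Data.db sections=2602 "
    "progress=22942842880/63149903764 - 36% "
    "/var/lib/cassandra/data/DFS/main-f-946-Data.db sections=1437 "
    "progress=0/65991601 - 0% "
    "/var/lib/cassandra/data/DFS/main-f-907-Data.db sections=2602 "
    "progress=0/1635822909 - 0%"
)

if __name__ == "__main__":
    streams = parse_streams(SAMPLE)
    for s in streams:
        print(f'{s["path"]}: {s["sections"]} sections, {s["total"] / 1e9:.2f} GB reported')
    done, total = summarize(streams)
    print(f"overall: {done}/{total} bytes ({100 * done / total:.0f}%)")
```

If the reported total for a file turns out to be smaller than the file on disk, then only parts of it (the `sections`) are being streamed rather than the whole file.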