I regularly run repair on my Cassandra cluster. However, I often see that 
during the repair operation very large amounts of data are transferred to other 
nodes.

My question is: if only some data is out of sync, why are entire Data files 
being transferred?

   /var/lib/cassandra/data/DFS/main-f-893-Data.db sections=2602 progress=22942842880/63149903764 - 36%
   /var/lib/cassandra/data/DFS/main-f-946-Data.db sections=1437 progress=0/65991601 - 0%
   /var/lib/cassandra/data/DFS/main-f-907-Data.db sections=2602 progress=0/1635822909 - 0%
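
To put numbers on the output above, the progress lines can be totalled with a short script. This is only a rough sketch: it assumes every line follows the layout shown above (path, sections=N, progress=done/total), and the names parse_progress and SAMPLE are my own, not part of any Cassandra tool.

```python
import re

# Assumed layout, taken from the output above:
#   <path> sections=<N> progress=<done>/<total> - <P>%
# \s+ also matches a newline, so lines wrapped by the mail client still parse.
LINE_RE = re.compile(
    r"(?P<path>\S+)\s+sections=(?P<sections>\d+)\s+"
    r"progress=(?P<done>\d+)/(?P<total>\d+)"
)

# The three entries quoted above, including the mail client's line wrapping.
SAMPLE = """
   /var/lib/cassandra/data/DFS/main-f-893-Data.db sections=2602
progress=22942842880/63149903764 - 36%
   /var/lib/cassandra/data/DFS/main-f-946-Data.db sections=1437
progress=0/65991601 - 0%
   /var/lib/cassandra/data/DFS/main-f-907-Data.db sections=2602
progress=0/1635822909 - 0%
"""


def parse_progress(text):
    """Return one dict per file entry found in the streaming output."""
    rows = []
    for m in LINE_RE.finditer(text):
        done, total = int(m["done"]), int(m["total"])
        rows.append({
            "path": m["path"],
            "sections": int(m["sections"]),
            "done": done,
            "total": total,
            # Guard against a zero-byte total to avoid dividing by zero.
            "percent": 100.0 * done / total if total else 0.0,
        })
    return rows


if __name__ == "__main__":
    rows = parse_progress(SAMPLE)
    for row in rows:
        print(f'{row["path"]}: {row["sections"]} sections, '
              f'{row["done"]}/{row["total"]} bytes ({row["percent"]:.1f}%)')
    total_bytes = sum(row["total"] for row in rows)
    print(f"total to transfer: {total_bytes / 1e9:.1f} GB")
```

The three entries together add up to roughly 65 GB of pending transfer, which is the volume I am asking about.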

My guess is that, since the Data files are immutable, the entire file has to be 
copied over, and I assume a compaction then takes place to consolidate the 
data. But that's just a wild guess.

Can anyone explain this behavior?