+1. It doesn't make sense that the xfr (transfer) compactions would be heavy
unless they are translating the file. This could be a protocol mismatch;
however, I would expect the requirements for node-level compaction and wire
compaction to be pretty different.
On Oct 15, 2012, at 4:42 PM, Matthias Broecheler wrote:

> Hey,
> 
> we are writing a lot of data into a Cassandra cluster for a batch loading use 
> case. We cannot use the sstable batch loader, so in order to speed up the 
> loading process we are using RF=1 while the data is loading. After the load 
> is complete, we want to increase the RF. For that, we are updating the RF in 
> the schema and then running the node repair tool on each Cassandra instance 
> to stream the data over. However, we are noticing that this process is slowed 
> down by a lot of compactions (the actual streaming of data only takes a 
> couple of minutes).
> 
> Cassandra is already running a major compaction after the data loading 
> process has completed. But then, there appear to be two more compactions (one 
> on the sender and one on the receiver) happening, and those take a very long 
> time even on AWS high I/O instances with no compaction throttling. 
> 
> Question: These additional compactions seem redundant since there are no 
> reads or writes on the cluster after the first major compaction (immediately 
> after the data load), is that right? And if so, what can we do to avoid them? 
> We are currently waiting multiple days.
> 
> Thank you very much for your help,
> Matthias
> 
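
For readers following along, the repair step described in the quoted message
(update the RF, then run repair on each instance) could be scripted roughly as
below. This is only a sketch under stated assumptions: the host addresses and
keyspace name are hypothetical, nodetool is assumed to be on the PATH, and the
RF change itself is assumed to have been made in the schema beforehand. It
builds the per-node commands and does nothing about the extra compactions
discussed in this thread.

```python
from typing import List


def repair_commands(hosts: List[str], keyspace: str) -> List[List[str]]:
    """Build one `nodetool -h <host> repair <keyspace>` command per node.

    The commands are returned as argument lists, in the order given, so they
    can be inspected first or passed to subprocess.run one node at a time.
    """
    return [["nodetool", "-h", host, "repair", keyspace] for host in hosts]


if __name__ == "__main__":
    # Hypothetical node addresses and keyspace name, for illustration only.
    for cmd in repair_commands(["10.0.0.1", "10.0.0.2"], "my_keyspace"):
        print(" ".join(cmd))
```

Running the repairs one node after another, rather than all at once, keeps
the streaming and any follow-on compaction load from hitting every node
simultaneously.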
