+1 It doesn't make sense that the transfer (xfr) compactions are heavy unless they are translating the file. This could be a protocol mismatch; however, I would expect the requirements for node-level compaction and wire compaction to be pretty different.

On Oct 15, 2012, at 4:42 PM, Matthias Broecheler wrote:
> Hey,
>
> we are writing a lot of data into a Cassandra cluster for a batch loading use
> case. We cannot use the sstable batch loader, so in order to speed up the
> loading process we are using RF=1 while the data is loading. After the load
> is complete, we want to increase the RF. For that, we are updating the RF in
> the schema and then running the node repair tool on each Cassandra instance to
> stream the data over. However, we are noticing that this process is slowed
> down by a lot of compactions (the actual streaming of data only takes a
> couple of minutes).
>
> Cassandra is already running a major compaction after the data loading
> process has completed. But then there seem to be two more compactions (one on
> the sender and one on the receiver) happening, and those take a very long time
> even on the AWS high I/O instance with no compaction throttling.
>
> Question: These additional compactions seem redundant since there are no
> reads or writes on the cluster after the first major compaction (immediately
> after the data load), is that right? And if so, what can we do to avoid them?
> We are currently waiting multiple days.
>
> Thank you very much for your help,
> Matthias
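The repair step described in the quoted mail (raise the RF in the schema, then run the node repair tool on each Cassandra instance) can be scripted. Below is a minimal Python sketch, assuming `nodetool` is on the PATH and can reach each node with `-h <host>`; the host addresses and the keyspace name `my_keyspace` are hypothetical placeholders, not values from the thread. It runs the repairs one node at a time so they do not overlap, and defaults to a dry run that only prints the commands.

```python
"""Hypothetical sketch: run `nodetool repair` on each node, one at a time,
after the replication factor has been raised. Hosts and keyspace are
placeholders, not values taken from the thread."""

import shlex
import subprocess
from typing import List, Sequence


def build_repair_commands(hosts: Sequence[str], keyspace: str) -> List[List[str]]:
    """Return one argv list per host: nodetool -h <host> repair <keyspace>."""
    return [["nodetool", "-h", host, "repair", keyspace] for host in hosts]


def run_repairs(hosts: Sequence[str], keyspace: str, dry_run: bool = True) -> None:
    """Repair the nodes sequentially; with dry_run=True only print the commands."""
    for argv in build_repair_commands(hosts, keyspace):
        if dry_run:
            # Print a shell-quoted version of the command instead of running it.
            print(shlex.join(argv))
        else:
            # check=True raises CalledProcessError and stops the rollout
            # if any node's repair exits with a non-zero status.
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Placeholder hosts and keyspace (hypothetical); replace with real values.
    run_repairs(["10.0.0.1", "10.0.0.2", "10.0.0.3"], "my_keyspace")
```

Running the nodes sequentially rather than in parallel is a design choice, not something the thread prescribes; it just makes it easier to see how long each node's repair (streaming plus any compaction it triggers) actually takes.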