Hey, we are writing a lot of data into a Cassandra cluster for a batch-loading use case. We cannot use the sstable batch loader, so to speed up the loading process we use RF=1 while the data is loading. After the load is complete, we want to increase the RF. To do that, we update the RF in the schema and then run the node repair tool on each Cassandra instance to stream the data over. However, we are noticing that this process is slowed down by a lot of compactions (the actual streaming of the data takes only a couple of minutes).
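To make the procedure concrete, here is a minimal sketch of the two steps we perform after the load. The keyspace name `bulk_data`, the `SimpleStrategy` replication class, the target RF of 3, and the host names are placeholders for illustration, not our real settings. The script only builds the CQL statement and the repair commands; it does not execute anything.

```python
def alter_rf_statement(keyspace: str, rf: int) -> str:
    """Build the CQL statement that raises the replication factor.

    Assumes SimpleStrategy; a multi-datacenter cluster would use
    NetworkTopologyStrategy with per-datacenter factors instead.
    """
    return (
        f"ALTER KEYSPACE {keyspace} WITH replication = "
        f"{{'class': 'SimpleStrategy', 'replication_factor': {rf}}};"
    )


def repair_commands(hosts, keyspace):
    """Build one `nodetool repair` invocation per node, in order."""
    return [["nodetool", "-h", host, "repair", keyspace] for host in hosts]


if __name__ == "__main__":
    # Hypothetical host names, for illustration only.
    hosts = ["cass-node-1", "cass-node-2", "cass-node-3"]

    # Step 1: update the RF in the schema (run once, e.g. via cqlsh).
    print(alter_rf_statement("bulk_data", 3))

    # Step 2: run repair on each instance so the new replicas get streamed.
    for cmd in repair_commands(hosts, "bulk_data"):
        print(" ".join(cmd))
```

The extra compactions we are asking about show up during step 2.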
Cassandra already runs a major compaction after the data loading process has completed. But then there appear to be two more compactions (one on the sender and one on the receiver), and those take a very long time, even on the AWS high I/O instance with no compaction throttling. Question: these additional compactions seem redundant, since there are no reads or writes on the cluster after the first major compaction (immediately after the data load). Is that right? And if so, what can we do to avoid them? We are currently waiting multiple days. Thank you very much for your help, Matthias