Hey,

we are writing a lot of data into a Cassandra cluster for a batch loading
use case. We cannot use the SSTable bulk loader (sstableloader), so to speed
up loading we use RF=1 while the data is being written. Once the load is
complete, we want to increase the RF. To do that, we update the RF in the
schema and then run nodetool repair on each Cassandra instance to stream the
data over. However, this process is slowed down considerably by compactions
(the actual streaming of data only takes a couple of minutes).
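A rough sketch of the two steps above, assuming CQL3 syntax and SimpleStrategy. The keyspace name, host addresses, and the helper `build_rf_increase_plan` are hypothetical placeholders, and the helper only builds the commands rather than running them:

```python
# Hypothetical helper: builds (but does not execute) the steps for raising the
# replication factor after a load done at RF=1. Names are placeholders.

def build_rf_increase_plan(keyspace, new_rf, hosts):
    """Return the CQL schema change and one nodetool repair command per node.

    Assumes CQL3 syntax and SimpleStrategy; a multi-datacenter cluster would
    use NetworkTopologyStrategy with per-DC replica counts instead.
    """
    alter = (
        f"ALTER KEYSPACE {keyspace} WITH replication = "
        f"{{'class': 'SimpleStrategy', 'replication_factor': {new_rf}}};"
    )
    # One repair command per node, so every node streams the replicas it now owns.
    repairs = [["nodetool", "-h", host, "repair", keyspace] for host in hosts]
    return alter, repairs


if __name__ == "__main__":
    alter, repairs = build_rf_increase_plan("loadks", 3, ["10.0.0.1", "10.0.0.2"])
    print(alter)
    for cmd in repairs:
        print(" ".join(cmd))
```

Each repair is run after the schema change has propagated to all nodes.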

Cassandra already runs a major compaction once the data loading process has
completed. But after that, there appear to be two more compactions (one on
the sender and one on the receiver), and those take a very long time, even
on an AWS High I/O instance with compaction throttling disabled.

Question: Are these additional compactions redundant, given that there are
no reads or writes on the cluster after the first major compaction
(immediately after the data load)? And if so, what can we do to avoid them?
At the moment we are waiting multiple days.

Thank you very much for your help,
Matthias
