Follow-up question: Is it safe to abort the compactions that run after node repair?
On Mon, Oct 15, 2012 at 6:32 PM, Will Martin <w...@voodoolunchbox.com> wrote:
> +1. It doesn't make sense that the xfr compactions are heavy unless they
> are translating the file. This could be a protocol mismatch; however, I
> would expect the requirements for node-level compaction and wire
> compaction to be pretty different.
>
> On Oct 15, 2012, at 4:42 PM, Matthias Broecheler wrote:
>
> > Hey,
> >
> > we are writing a lot of data into a Cassandra cluster for a batch
> > loading use case. We cannot use the sstable batch loader, so to speed
> > up the loading process we are using RF=1 while the data is loading.
> > After the load is complete, we want to increase the RF. To do that, we
> > update the RF in the schema and then run the node repair tool on each
> > Cassandra instance to stream the data over. However, we are noticing
> > that this process is slowed down by a lot of compactions (the actual
> > streaming of data only takes a couple of minutes).
> >
> > Cassandra already runs a major compaction after the data loading
> > process has completed. But then there appear to be two more
> > compactions (one on the sender and one on the receiver), and those
> > take a very long time, even on the AWS high I/O instance with no
> > compaction throttling.
> >
> > Question: These additional compactions seem redundant, since there are
> > no reads or writes on the cluster after the first major compaction
> > (immediately after the data load). Is that right? And if so, what can
> > we do to avoid them? We are currently waiting multiple days.
> >
> > Thank you very much for your help,
> > Matthias

-- 
Matthias Broecheler, PhD
http://www.matthiasb.com
E-Mail: m...@matthiasb.com