On 2014-08-18 19:52, Robert Coli wrote:
> On Mon, Aug 18, 2014 at 6:21 AM, Erik Forsberg <forsb...@opera.com> wrote:
>
> > Is there some configuration knob I can tune to make this happen faster?
> > I'm getting a bit confused by the description for min_sstable_size,
> > bucket_high, bucket_low etc - and I'm not sure if they apply in this
> > case.
>
> You probably don't want to use multi-threaded compaction, it is removed
> upstream.
>
> nodetool setcompactionthroughput 0
>
> Assuming you have enough IO headroom etc.

OK. I disabled multithreaded compaction and gave it a bit more throughput to play with, but I still don't think that's the full story.

What I see is the following:

1) My hadoop cluster is bulkloading around 1000 sstables to the Cassandra cluster.

2) Cassandra will start compacting.

With SizeTiered, I would see multiple ongoing compactions on the CF in question, each taking on 32 sstables and compacting them to one, all of them running at the same time.

With Leveled, I see only one compaction, taking on 32 sstables and compacting them to one. When that finishes, it will start another one. So it's essentially a serial process, and it takes much longer than it does with SizeTiered. While this compaction is ongoing, read performance is not very good.

http://www.datastax.com/dev/blog/performance-improvements-in-cassandra-1-2 mentions that LCS is parallelized in Cassandra 1.2, but maybe that patch doesn't cover my use case (although I realize that my use case is maybe a bit weird).

So my question is: is this something I can tune? I'm running 1.2.18 now, but am strongly considering upgrading to 2.0.X.

Regards,
\EF
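
To put rough numbers on the serial-versus-parallel difference described above, here is a back-of-the-envelope sketch in Python. It only covers the first pass over the bulkloaded sstables. The 1000 sstables and 32 sstables per compaction come from what I observe; the four concurrent compactions and the idea that every compaction takes the same time are assumptions for illustration, and this is not how Cassandra actually schedules compactions.

```python
import math


def first_pass_batches(sstables: int, per_compaction: int) -> int:
    """Number of compactions needed to touch every sstable once."""
    return math.ceil(sstables / per_compaction)


def wall_clock_units(batches: int, concurrent: int) -> int:
    """Elapsed time in units of 'one compaction', assuming equal-length
    compactions and no IO contention (both are simplifying assumptions)."""
    return math.ceil(batches / concurrent)


SSTABLES = 1000          # from the bulkload described above
PER_COMPACTION = 32      # sstables taken on by each compaction, as observed
ASSUMED_CONCURRENT = 4   # hypothetical number of simultaneous compactions

batches = first_pass_batches(SSTABLES, PER_COMPACTION)
serial = wall_clock_units(batches, 1)
parallel = wall_clock_units(batches, ASSUMED_CONCURRENT)

print(f"compactions in first pass: {batches}")
print(f"serial (one at a time):    {serial} units")
print(f"{ASSUMED_CONCURRENT} at a time:               {parallel} units")
```

Under those assumptions the serial case takes four times as long for the first pass alone, which is roughly the kind of gap I'm seeing between Leveled and SizeTiered.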