It depends on your definition of "significantly", but there are a few things to
consider.

* Reading from SSTables for a request is a serial operation. Reading from 2
SSTables will take roughly twice as long as reading from 1.

* If the data in the One Big File™ has been overwritten, reading it is a waste
of time. And it will continue to be read until the row is compacted away.

* You will need min_compaction_threshold (a CF setting) SSTables of that size
before automatic compaction will pick up the big file.
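To make the last point concrete, here is a rough sketch (plain Python, not
Cassandra code) of how size-tiered compaction buckets SSTables by similar
size, so one huge file from a major compaction sits alone until enough
similarly sized files accumulate. The bucketing rule here is a simplified
stand-in for the real one, and the helper names are my own:

```python
def bucket_sstables(sizes, bucket_low=0.5, bucket_high=1.5):
    """Group SSTable sizes (MB) into buckets of similar size (simplified)."""
    buckets = []
    for size in sorted(sizes):
        for bucket in buckets:
            avg = sum(bucket) / len(bucket)
            # A file joins a bucket if it is within 50%-150% of the average.
            if bucket_low * avg <= size <= bucket_high * avg:
                bucket.append(size)
                break
        else:
            buckets.append([size])
    return buckets

def compaction_candidates(sizes, min_threshold=4):
    """Buckets with at least min_compaction_threshold files are eligible."""
    return [b for b in bucket_sstables(sizes) if len(b) >= min_threshold]

# One 100 GB file left by a major compaction, plus fresh small flushes:
sizes = [100_000, 50, 60, 55, 52]
print(compaction_candidates(sizes))  # -> [[50, 52, 55, 60]]
```

The small flushes compact among themselves, while the 100 GB file is never a
candidate until roughly three more files of that size exist in its bucket.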

On the other hand: some people do report getting value from nightly major
compactions. They also manage their cluster to reduce the impact of performing
the compactions.

Hope that helps. 

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 26/04/2012, at 9:37 PM, Fredrik wrote:

> Exactly, but why would reads be significantly slower over time when including 
> just one more, although sometimes large, SSTable in the read?
> 
>> Ji Cheng wrote 2012-04-26 11:11:
>> 
>> I'm also quite interested in this question. Here's my understanding on this 
>> problem.
>> 
>> 1. If your workload is append-only, doing a major compaction shouldn't 
>> affect the read performance too much, because each row appears in one 
>> sstable anyway. 
>> 
>> 2. If your workload is mostly updating existing rows, then more and more 
>> columns in that big sstable created by major compaction will become obsolete. 
>> And that super big sstable won't be compacted until you either have another 
>> 3 similar-sized sstables or start another major compaction. But I am not 
>> very sure whether this will be a major problem, because you only end up 
>> reading one more sstable. Using size-tiered compaction against a mostly-update 
>> workload may itself result in reading multiple sstables for a single row 
>> key. 
>> 
>> Please correct me if I am wrong.
>> 
>> Cheng
>> 
>> 
>> On Thu, Apr 26, 2012 at 3:50 PM, Fredrik <fredrik.l.stigb...@sitevision.se> 
>> wrote:
>> In the tuning documentation regarding Cassandra, it's recommended not to run 
>> major compactions.
>> I understand what a major compaction is all about but I'd like an in depth 
>> explanation as to why reads "will continually degrade until the next major 
>> compaction is manually invoked".
>> 
>> From the doc:
>> "So while read performance will be good immediately following a major 
>> compaction, it will continually degrade until the next major compaction is 
>> manually invoked. For this reason, major compaction is NOT recommended by 
>> DataStax."
>> 
>> Regards
>> /Fredrik
>> 
> 
