Depends on your definition of "significantly". There are a few things to consider:

* Reading from SSTables for a request is a serial operation. Reading from 2 SSTables will take twice as long as 1.
* If the data in the One Big File™ has been overwritten, reading it is a waste of time. And it will continue to be read until the row is compacted away.
* You will need to get min_compaction_threshold (CF setting) SSTables that big before automatic compaction will pick up the big file.

On the other side: Some people do report getting value from nightly major compactions. They also manage their cluster to reduce the impact of performing the compactions.

Hope that helps.

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 26/04/2012, at 9:37 PM, Fredrik wrote:

> Exactly, but why would reads be significantly slower over time when including
> just one more, although sometimes large, SSTable in the read?
>
> Ji Cheng wrote 2012-04-26 11:11:
>>
>> I'm also quite interested in this question. Here's my understanding of this
>> problem.
>>
>> 1. If your workload is append-only, doing a major compaction shouldn't
>> affect the read performance too much, because each row appears in one
>> sstable anyway.
>>
>> 2. If your workload is mostly updating existing rows, then more and more
>> columns will be obsoleted in that big sstable created by major compaction.
>> And that super big sstable won't be compacted until you either have another
>> 3 similar-sized sstables or start another major compaction. But I am not
>> very sure whether this will be a major problem, because you only end up with
>> reading one more sstable. Using size-tiered compaction against mostly-update
>> workload itself may result in reading multiple sstables for a single row
>> key.
>>
>> Please correct me if I am wrong.
>>
>> Cheng
>>
>>
>> On Thu, Apr 26, 2012 at 3:50 PM, Fredrik <fredrik.l.stigb...@sitevision.se>
>> wrote:
>> In the tuning documentation regarding Cassandra, it's recommended not to run
>> major compactions.
>> I understand what a major compaction is all about but I'd like an in-depth
>> explanation as to why reads "will continually degrade until the next major
>> compaction is manually invoked".
>>
>> From the doc:
>> "So while read performance will be good immediately following a major
>> compaction, it will continually degrade until the next major compaction is
>> manually invoked. For this reason, major compaction is NOT recommended by
>> DataStax."
>>
>> Regards
>> /Fredrik
>>
>
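
P.S. The min_compaction_threshold point in the reply can be illustrated with a toy model. This is only a sketch of size-tiered bucketing, not Cassandra's actual code: the function names are made up for illustration, and the 0.5x to 1.5x "similar size" window and the threshold of 4 are assumptions chosen to match the "another 3 similar-sized sstables" remark in the thread.

```python
# Toy model of size-tiered compaction bucketing (illustrative only).
# Assumptions: SSTables are "similar" when a size falls within 0.5x to 1.5x
# of a bucket's average, and a bucket is compacted once it holds at least
# min_compaction_threshold SSTables.

def bucket_sstables(sizes, bucket_low=0.5, bucket_high=1.5):
    """Group SSTable sizes into buckets of similar size."""
    buckets = []
    for size in sorted(sizes):
        for bucket in buckets:
            avg = sum(bucket) / len(bucket)
            if bucket_low * avg <= size <= bucket_high * avg:
                bucket.append(size)
                break
        else:
            # No existing bucket is close enough in size: start a new one.
            buckets.append([size])
    return buckets


def compaction_candidates(sizes, min_compaction_threshold=4):
    """Return the buckets that automatic compaction would pick up."""
    return [b for b in bucket_sstables(sizes)
            if len(b) >= min_compaction_threshold]


if __name__ == "__main__":
    # One 100 GB file left by a major compaction plus four fresh 1 GB flushes.
    print(compaction_candidates([100, 1, 1, 1, 1]))
```

In this model the four small files get compacted together while the 100 GB file sits alone in its bucket. Any overwritten data it holds keeps being read until three more files of similar size exist or another major compaction is run.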