Thank you, Aaron. That explanation cleared things up.

2012/4/30 aaron morton <aa...@thelastpickle.com>:
> Depends on your definition of significantly; there are a few things to
> consider.
>
> * Reading from SSTables for a request is a serial operation. Reading from 2
> SSTables will take twice as long as 1.
>
> * If the data in the One Big File™ has been overwritten, reading it is a
> waste of time. And it will continue to be read until the row is compacted
> away.
>
> * You will need to get min_compaction_threshold (CF setting) SSTables that
> big before automatic compaction will pick up the big file.
>
> On the other side: Some people do report getting value from nightly major
> compactions. They also manage their cluster to reduce the impact of
> performing the compactions.
>
> Hope that helps.
>
> -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 26/04/2012, at 9:37 PM, Fredrik wrote:
>
> Exactly, but why would reads be significantly slower over time when
> including just one more, although sometimes large, SSTable in the read?
>
> Ji Cheng wrote 2012-04-26 11:11:
>
> I'm also quite interested in this question. Here's my understanding of this
> problem.
>
> 1. If your workload is append-only, doing a major compaction shouldn't
> affect the read performance too much, because each row appears in one
> sstable anyway.
>
> 2. If your workload is mostly updating existing rows, then more and more
> columns will be obsoleted in that big sstable created by major compaction.
> And that super big sstable won't be compacted until you either have another
> 3 similar-sized sstables or start another major compaction. But I am not
> very sure whether this will be a major problem, because you only end up
> reading one more sstable. Using size-tiered compaction against a
> mostly-update workload may itself result in reading multiple sstables for a
> single row key.
>
> Please correct me if I am wrong.
>
> Cheng
>
>
> On Thu, Apr 26, 2012 at 3:50 PM, Fredrik <fredrik.l.stigb...@sitevision.se>
> wrote:
>>
>> In the tuning documentation regarding Cassandra, it's recommended not to
>> run major compactions.
>> I understand what a major compaction is all about, but I'd like an in-depth
>> explanation as to why reads "will continually degrade until the next major
>> compaction is manually invoked".
>>
>> From the doc:
>> "So while read performance will be good immediately following a major
>> compaction, it will continually degrade until the next major compaction is
>> manually invoked. For this reason, major compaction is NOT recommended by
>> DataStax."
>>
>> Regards
>> /Fredrik
--
Fredrik Larsson Stigbäck
SiteVision AB
Vasagatan 10, 107 10 Örebro
019-17 30 30
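
The point made in the thread, that the big SSTable left by a major compaction is not picked up again until enough similar-sized SSTables exist, can be illustrated with a rough sketch. This is a toy model, not Cassandra's code: the helper names `bucket_by_size` and `compactable` are hypothetical, sizes are in MB, and the 0.5x/1.5x similarity bounds and the threshold of 4 are assumptions chosen for illustration.

```python
def bucket_by_size(sizes, low=0.5, high=1.5):
    """Group SSTable sizes into buckets of roughly similar size.

    A size joins the first bucket whose current average it is within
    [low * avg, high * avg] of; otherwise it starts a new bucket.
    The bounds are illustrative assumptions.
    """
    buckets = []
    for size in sorted(sizes):
        for bucket in buckets:
            avg = sum(bucket) / len(bucket)
            if low * avg <= size <= high * avg:
                bucket.append(size)
                break
        else:
            buckets.append([size])
    return buckets


def compactable(sizes, min_threshold=4):
    """Return the buckets holding at least min_threshold SSTables."""
    return [b for b in bucket_by_size(sizes) if len(b) >= min_threshold]


# One ~100 GB SSTable left by a major compaction, plus four fresh ~1 GB flushes.
after_major = [100_000, 1_000, 1_100, 900, 1_050]
eligible_small = compactable(after_major)

# Only once four SSTables of that size exist does the big one become eligible.
four_big = [100_000, 95_000, 110_000, 105_000]
eligible_big = compactable(four_big)

print(eligible_small)
print(eligible_big)
```

In this model the four small SSTables are compacted together while the big file is left alone, so any overwritten rows it holds keep being read until three more SSTables of comparable size accumulate or another major compaction is run.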