+1

On Tue, May 1, 2012 at 12:06 PM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
> Also there are some tickets in JIRA to impose a max sstable size, and
> some other related optimizations, that I think got stuck behind LevelDB
> in coolness factor. Not every use case is a good fit for leveled, so
> adding more tools and optimizations for the size-tiered tables would be
> awesome.
>
> On Tue, May 1, 2012 at 10:15 AM, Jason Rutherglen
> <jason.rutherg...@gmail.com> wrote:
> > I wonder if TieredMergePolicy [1] could be used in Cassandra for
> > compaction?
> >
> > 1. http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html
> >
> > On Tue, May 1, 2012 at 6:38 AM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
> >> Henrik,
> >>
> >> There are use cases where major compaction works well, like yours and
> >> mine: essentially cases with a high amount of churn, updates, and
> >> deletes, where we get a lot of benefit from forced tombstone removal
> >> in the form of less physical data.
> >>
> >> However, we end up with really big sstables that naturally will never
> >> get compacted away, since they are so much bigger than the other
> >> tables. So we get stuck always major compacting forever.
> >>
> >> Cassandra needs an "un-compact" for people like us, so we can turn one
> >> big sstable into multiple smaller ones. Or a major compaction that
> >> takes in multiple sstables and produces multiple output tables, nicely
> >> organized for bloom filter hits and free of tombstones.
> >>
> >> Edward
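For what it's worth, a rough sketch of that multiple-output idea in hypothetical Java: one merge pass over all input sstables, so tombstones and overwrites can still be purged, but rolling to a fresh output file whenever the current one reaches a size cap. Row, SSTableWriter, and SSTableWriterFactory are made-up stand-ins for illustration, not Cassandra internals:

    import java.util.Iterator;

    // Sketch of a "major compaction with multiple outputs": merge every row
    // once (so tombstones and overwritten columns can be dropped), but roll
    // to a fresh output file whenever the current one reaches a size cap,
    // instead of producing One Big File. All types are hypothetical.
    interface Row {
        long serializedSize();
        boolean isGarbage(); // fully tombstoned/overwritten, safe to purge
    }

    interface SSTableWriter {
        void append(Row row);
        void close();
    }

    interface SSTableWriterFactory {
        SSTableWriter newWriter();
    }

    class SplittingCompactor {
        static final long MAX_SSTABLE_BYTES = 1L << 30; // 1 GB cap (arbitrary)

        void compact(Iterator<Row> mergedRows, SSTableWriterFactory factory) {
            SSTableWriter out = factory.newWriter();
            long written = 0;
            while (mergedRows.hasNext()) {
                Row row = mergedRows.next();
                if (row.isGarbage()) continue;   // purge instead of copying forward
                if (written > 0 && written + row.serializedSize() > MAX_SSTABLE_BYTES) {
                    out.close();                 // seal this output...
                    out = factory.newWriter();   // ...and start the next one
                    written = 0;
                }
                out.append(row);
                written += row.serializedSize();
            }
            out.close();
        }
    }

The point is that the merge still sees every row once, so garbage can be dropped, while no single output grows big enough to get stranded by size-tiered bucketing.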
> >>> But what's the difference between doing an extra read from that One
> >>> Big File and doing an extra read from whatever SSTable happens to be
> >>> largest in the course of automatic minor compaction?
> >>>
> >>> We have a pretty update-heavy application, and doing a major
> >>> compaction can remove up to 30% of the used disk space. That directly
> >>> translates into fewer reads and fewer SSTables that rows appear in.
> >>> Everything that's unchanged since the last major compaction is
> >>> obviously faster to access, and everything that's changed since the
> >>> last major compaction is about the same as if we hadn't done it?
> >>>
> >>> So I'm still confused. I don't see a significant difference between
> >>> doing the occasional major compaction and leaving it to do automatic
> >>> minor compactions. What am I missing? Reads will "continually degrade"
> >>> with automatic minor compactions as well, won't they?
> >>>
> >>> I can sort of see that if you have a moving active data set, then it
> >>> will most probably only exist in the smallest SSTables and frequently
> >>> be the object of minor compactions, and doing a major compaction will
> >>> move all of it into the biggest SSTables?
> >>>
> >>> /Henrik
> >>>
> >>> On Mon, Apr 30, 2012 at 05:35, aaron morton <aa...@thelastpickle.com> wrote:
> >>>>
> >>>> Depends on your definition of significantly; there are a few things
> >>>> to consider.
> >>>>
> >>>> * Reading from SSTables for a request is a serial operation. Reading
> >>>> from 2 SSTables will take twice as long as reading from 1.
> >>>>
> >>>> * If the data in the One Big File™ has been overwritten, reading it
> >>>> is a waste of time. And it will continue to be read until the row is
> >>>> compacted away.
> >>>>
> >>>> * You will need min_compaction_threshold (a CF setting) SSTables of
> >>>> that size before automatic compaction will pick up the big file.
> >>>>
> >>>> On the other side: some people do report getting value from nightly
> >>>> major compactions. They also manage their cluster to reduce the
> >>>> impact of performing the compactions.
> >>>>
> >>>> Hope that helps.
> >>>>
> >>>> -----------------
> >>>> Aaron Morton
> >>>> Freelance Developer
> >>>> @aaronmorton
> >>>> http://www.thelastpickle.com
> >>>>
> >>>> On 26/04/2012, at 9:37 PM, Fredrik wrote:
> >>>>
> >>>> Exactly, but why would reads be significantly slower over time when
> >>>> including just one more, although sometimes large, SSTable in the
> >>>> read?
> >>>>
> >>>> On 2012-04-26 11:11, Ji Cheng wrote:
> >>>>
> >>>> I'm also quite interested in this question. Here's my understanding
> >>>> of the problem.
> >>>>
> >>>> 1. If your workload is append-only, doing a major compaction
> >>>> shouldn't affect the read performance too much, because each row
> >>>> appears in one sstable anyway.
> >>>>
> >>>> 2. If your workload is mostly updating existing rows, then more and
> >>>> more columns will be obsoleted in that big sstable created by major
> >>>> compaction. And that super big sstable won't be compacted until you
> >>>> either have another 3 similar-sized sstables or start another major
> >>>> compaction. But I am not very sure whether this will be a major
> >>>> problem, because you only end up reading one more sstable. Using
> >>>> size-tiered compaction against a mostly-update workload may itself
> >>>> result in reading multiple sstables for a single row key.
> >>>>
> >>>> Please correct me if I am wrong.
> >>>>
> >>>> Cheng
> >>>>
> >>>> On Thu, Apr 26, 2012 at 3:50 PM, Fredrik
> >>>> <fredrik.l.stigb...@sitevision.se> wrote:
> >>>>>
> >>>>> In the tuning documentation for Cassandra, it's recommended not to
> >>>>> run major compactions.
> >>>>> I understand what a major compaction is all about, but I'd like an
> >>>>> in-depth explanation as to why reads "will continually degrade until
> >>>>> the next major compaction is manually invoked".
> >>>>>
> >>>>> From the doc:
> >>>>> "So while read performance will be good immediately following a
> >>>>> major compaction, it will continually degrade until the next major
> >>>>> compaction is manually invoked. For this reason, major compaction is
> >>>>> NOT recommended by DataStax."
> >>>>>
> >>>>> Regards
> >>>>> /Fredrik
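To put numbers on the stranded-big-file point that Aaron and Cheng make above, here is a small illustrative Java sketch of size-tiered bucketing. It is modelled on the behaviour described in the thread (group sstables of similar size; a group only becomes a compaction candidate once it has min_compaction_threshold members), not taken from Cassandra's source; the 50%-150% similarity band and the sizes are assumptions for the demo:

    import java.util.ArrayList;
    import java.util.List;

    // Illustrative size-tiered bucketing: sstables whose sizes fall within
    // roughly 50%-150% of a bucket's average are grouped together, and a
    // bucket is only compacted once it holds min_compaction_threshold
    // members. Sizes below are in MB.
    public class BucketSketch {
        static final int MIN_COMPACTION_THRESHOLD = 4; // common CF default

        static List<List<Long>> bucketBySize(List<Long> sizes) {
            List<List<Long>> buckets = new ArrayList<>();
            outer:
            for (long size : sizes) {
                for (List<Long> bucket : buckets) {
                    double avg = bucket.stream()
                            .mapToLong(Long::longValue).average().orElse(0);
                    if (size >= avg * 0.5 && size <= avg * 1.5) {
                        bucket.add(size);
                        continue outer;
                    }
                }
                List<Long> fresh = new ArrayList<>();
                fresh.add(size);
                buckets.add(fresh);
            }
            return buckets;
        }

        public static void main(String[] args) {
            // One post-major-compaction giant (100 GB) plus small flushes:
            List<List<Long>> buckets =
                    bucketBySize(List.of(100_000L, 90L, 100L, 110L, 95L));
            for (List<Long> b : buckets) {
                boolean eligible = b.size() >= MIN_COMPACTION_THRESHOLD;
                System.out.println(b + (eligible
                        ? "  <- compaction candidate"
                        : "  <- left alone"));
            }
            // The four ~100 MB tables form a candidate bucket; the 100 GB
            // table sits stranded in a bucket of one.
        }
    }

Under this grouping, the post-major-compaction giant only rejoins normal compaction once several comparably sized tables exist, which, as Edward says, in practice means running major compactions forever.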