Re: multithreaded compaction

Terje Marthinussen Tue, 26 Apr 2011 01:54:07 -0700

To be honest, this started after feeding data to Cassandra for a while with
compaction disabled (sort of a test case).

When I enabled it... boom... a spectacular process with 2000% CPU usage
(please note... there is compression in Cassandra on this system).

This system actually has SSDs, so when throttled a bit, the I/O is really
not a problem, but I doubt that an HDD-based system would have managed to
keep up.

I agree, this is hopefully something that does not normally happen, but then
again, some protection against Murphy's law is always good.

Thanks!
Terje

On Tue, Apr 26, 2011 at 4:35 PM, Sylvain Lebresne <sylv...@datastax.com> wrote:

> On Tue, Apr 26, 2011 at 9:01 AM, Terje Marthinussen
> <tmarthinus...@gmail.com> wrote:
> > Hi,
> > I was testing the multithreaded compactions and with 2x6 cores (24 with
> > HT) it does seem a bit crazy with 24 compactions running concurrently.
> > It is probably not very good in terms of random I/O.
>
> It does seem a bit overkill. However, I'm slightly curious how you ended
> up with 24 parallel compactions; more precisely, how did you end up with
> enough sstables to trigger 24 compactions? Was that done on purpose for
> testing's sake, or did you see that in a real situation?
>
> I'm asking because in a 'real' situation, given that compactions are
> triggered only if there is some number of files to compact, and provided
> the cluster is correctly provisioned, I wouldn't expect the number of
> parallel compactions to jump to such numbers (one of the goals of
> multi-threaded compaction was to make sure we never end up accumulating
> lots of un-compacted sstables). Anyway, I get your point, just wondering
> if that was a real situation.
>
> > As such, I think I agree with the argument in 2191 that there should be a
> > config option for this.
> > Probably a default that is dynamic with 1 thread per column family + 2 or
> > 3 threads for parallel compactions outside of that could be good.
> > Any other opinions?
>
> Multi-threaded compaction is optional and compaction throttling is
> supposed to mitigate it. However, I do agree that too many compactions
> may be a bad use of resources because of random I/O even if correctly
> throttled. I think it's missing a configuration option
> "concurrent_compactions" like there is a "concurrent_writes|reads".
> For that, I have created
>  https://issues.apache.org/jira/browse/CASSANDRA-2558
>
> > I guess the compaction thread pool should also show up in tpstats?
>
> Yes it should ... and it will ... eventually :)
>
> Thanks for the feedback.
>
> --
> Sylvain
>
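
For illustration only, here is a minimal Python sketch of the idea discussed
above: a cap on concurrent compactions, with a default of one thread per
column family plus a few extra. This is not Cassandra's implementation. The
function names, the `extra` parameter, and the cap at the core count are
assumptions made for the example; only the option name
"concurrent_compactions" comes from the thread.

```python
import os
import threading
from concurrent.futures import ThreadPoolExecutor


def default_concurrent_compactions(column_families, extra=2, cores=None):
    """Hypothetical default: one thread per column family plus `extra`.

    Capping at the core count is an assumption, not part of the proposal.
    """
    if cores is None:
        cores = os.cpu_count() or 1
    return max(1, min(cores, column_families + extra))


def run_compactions(tasks, concurrent_compactions):
    """Run the tasks with at most `concurrent_compactions` in flight.

    Returns the peak number of tasks observed running at the same time.
    """
    lock = threading.Lock()
    running = 0
    peak = 0

    def wrapped(task):
        nonlocal running, peak
        with lock:
            running += 1
            peak = max(peak, running)
        try:
            task()
        finally:
            with lock:
                running -= 1

    # The pool size is the cap: extra tasks queue up instead of starting.
    with ThreadPoolExecutor(max_workers=concurrent_compactions) as pool:
        list(pool.map(wrapped, tasks))
    return peak
```

With 24 pending compactions and a cap of 4, the remaining 20 simply wait in
the queue, which bounds both CPU usage and the amount of random I/O issued at
once.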
