Re: Question regarding major compaction.

2012-05-02, Henrik Schröder
On Tue, May 1, 2012 at 10:00 PM, Oleg Proudnikov wrote:
> There is this note regarding major compaction in the tuning guide:
>
> "once you run a major compaction, automatic minor compactions are no longer
> triggered frequently forcing you to manually run major compactions on a
> routine basis"
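The tuning-guide note follows from how size-tiered compaction picks its inputs: a minor compaction only merges SSTables of similar size, so the single large file a major compaction leaves behind rarely has peers to merge with. Below is a simplified Python sketch of that bucketing. It assumes a file joins a bucket when its size is between 0.5 and 1.5 times the bucket's current average, and that at least 4 files are needed to trigger a compaction. These numbers roughly match defaults of Cassandra's size-tiered strategy but are assumptions here, and the real strategy has extra rules (e.g. for very small files). The function names are hypothetical.

```python
from typing import List


def bucket_sstables(sizes: List[int], bucket_low: float = 0.5,
                    bucket_high: float = 1.5) -> List[List[int]]:
    """Group SSTable sizes so each bucket holds files of similar size.

    A size joins the first bucket whose current average `avg` satisfies
    bucket_low * avg <= size <= bucket_high * avg; otherwise it starts
    a new bucket. (Simplified model, not Cassandra's exact code.)
    """
    buckets: List[List[int]] = []
    for size in sorted(sizes):
        for bucket in buckets:
            avg = sum(bucket) / len(bucket)
            if bucket_low * avg <= size <= bucket_high * avg:
                bucket.append(size)
                break
        else:  # no similar-sized bucket found
            buckets.append([size])
    return buckets


def minor_compaction_candidates(sizes: List[int],
                                min_threshold: int = 4) -> List[List[int]]:
    """Buckets holding at least `min_threshold` SSTables would be compacted."""
    return [b for b in bucket_sstables(sizes) if len(b) >= min_threshold]


# Sizes in MB: one 100 GB file left by a major compaction plus new flushes.
print(minor_compaction_candidates([100_000, 100, 100, 100, 100]))
print(bucket_sstables([100_000, 400, 100, 100]))
```

In this model the four 100 MB flushes form a bucket and would be merged, while the 100 GB file always lands in a bucket of its own. It would only rejoin a minor compaction after several other SSTables of comparable size had accumulated.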

2012-05-02, Henrik Schröder
On Tue, May 1, 2012 at 9:06 PM, Edward Capriolo wrote:
> Also there are some tickets in JIRA to impose a max sstable size and
> some other related optimizations that I think got stuck behind LevelDB
> in coolness factor. Not every use case is good for leveled so adding
> more tools and optimizatio…

2012-05-02, Henrik Schröder
On Tue, May 1, 2012 at 6:07 PM, Rob Coli wrote:
>
> The primary differences, as I understand it, are that the index
> performance and bloom filter false positive rate for your One Big File
> are worse. First, you are more likely to get a bloom filter false
> positive due to the intrinsic degradat…
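Rob's explanation is cut off, so the exact mechanism he meant is not shown here. One standard way a Bloom filter's false-positive rate degrades as a file grows is when its bit array does not grow in proportion to the number of keys it covers. The sketch below uses the textbook approximation p = (1 - e^(-kn/m))^k for n keys, m bits and k hash functions. The bit budgets are hypothetical and are not Cassandra's actual filter sizing.

```python
import math


def bloom_fp_rate(n_keys: int, m_bits: int, k_hashes: int) -> float:
    """Textbook approximation of a Bloom filter's false-positive rate:
    p = (1 - exp(-k * n / m)) ** k
    for n inserted keys, m bits and k hash functions."""
    return (1.0 - math.exp(-k_hashes * n_keys / m_bits)) ** k_hashes


# Hypothetical fixed budget: 10 million bits and 7 hash functions.
M_BITS, K = 10_000_000, 7
for n in (1_000_000, 5_000_000, 10_000_000):
    print(f"{n:>10,} keys -> false-positive rate {bloom_fp_rate(n, M_BITS, K):.4f}")

# If the bit array instead grows with the key count (10 bits per key here),
# the rate stays the same regardless of how large the file gets.
print(bloom_fp_rate(1_000_000, 10_000_000, K) == bloom_fp_rate(10_000_000, 100_000_000, K))
```

With the fixed budget the rate climbs from under 1% at one million keys to above 99% at ten million. With a constant bits-per-key ratio it does not change.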

2012-05-01, Oleg Proudnikov
Henrik Schröder <…@gmail.com> writes:
> But what's the difference between doing an extra read from that
> One Big File and doing an extra read from whatever SSTable
> happens to be largest in the course of automatic minor compaction?

There is this note regarding major compaction in the tuning gu…

2012-05-01, Mohit Anchlia
+1

On Tue, May 1, 2012 at 12:06 PM, Edward Capriolo wrote:
> Also there are some tickets in JIRA to impose a max sstable size and
> some other related optimizations that I think got stuck behind LevelDB
> in coolness factor. Not every use case is good for leveled so adding
> more tools and optimi…

2012-05-01, Edward Capriolo
Also, there are some tickets in JIRA to impose a max sstable size, and some other related optimizations, that I think got stuck behind LevelDB in coolness factor. Not every use case is a good fit for leveled compaction, so adding more tools and optimizations for size-tiered tables would be awesome.

On Tue, May 1, 2…

2012-05-01, Rob Coli
On Tue, May 1, 2012 at 4:31 AM, Henrik Schröder wrote:
> But what's the difference between doing an extra read from that One Big
> File and doing an extra read from whatever SSTable happens to be largest in
> the course of automatic minor compaction?

The primary differences, as I understand it, …

2012-05-01, Jason Rutherglen
I wonder if TieredMergePolicy [1] could be used in Cassandra for compaction?

1. http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html

On Tue, May 1, 2012 at 6:38 AM, Edward Capriolo wrote:
> Henrik,
>
> There are use cases where major compaction works well like yours an…

2012-05-01, Edward Capriolo
Henrik,

There are use cases where major compaction works well, like yours and mine. Essentially, in cases with a high amount of churn (updates and deletes), we get a lot of benefit from forced tombstone removal in the form of less physical data. However, we end up with really big sstables that naturally…
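The "forced tombstone removal" benefit can be sketched in a simplified single-node model. When every SSTable takes part in the merge, no older copy of a deleted row can be hiding in a file left out of the compaction, so tombstones past the grace period can simply be dropped. In the sketch, the `Cell` type, the `major_compact` function and the `gc_grace` parameter (standing in for Cassandra's grace period) are hypothetical, and each row is reduced to a single value.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Cell:
    value: Optional[str]  # None marks a tombstone (a deletion marker)
    timestamp: int        # write time in seconds; the newest write wins


def major_compact(sstables: List[Dict[str, Cell]], now: int,
                  gc_grace: int) -> Dict[str, Cell]:
    """Merge every SSTable into one, keeping only the newest cell per key.

    Because all SSTables take part, no older data can survive elsewhere,
    so tombstones older than the grace period are dropped outright.
    """
    merged: Dict[str, Cell] = {}
    for table in sstables:
        for key, cell in table.items():
            current = merged.get(key)
            if current is None or cell.timestamp > current.timestamp:
                merged[key] = cell  # the newer write shadows the older one
    return {key: cell for key, cell in merged.items()
            if not (cell.value is None and now - cell.timestamp > gc_grace)}


old = {"a": Cell("v1", 1), "b": Cell("x", 1)}
new = {"a": Cell("v2", 5), "b": Cell(None, 2)}  # "a" updated, "b" deleted
print(major_compact([old, new], now=100, gc_grace=10))
```

Four stored cells shrink to one: the overwritten "v1", the deleted "x" and the expired tombstone are all gone. This is the kind of physical-data reduction an update- and delete-heavy workload sees.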

2012-05-01, Henrik Schröder
But what's the difference between doing an extra read from that One Big File and doing an extra read from whatever SSTable happens to be largest in the course of automatic minor compaction?

We have a pretty update-heavy application, and doing a major compaction can remove up to 30% of the used di…

2012-05-01, Fredrik Stigbäck
Thank you, Aaron. That explanation cleared things up.

2012/4/30 aaron morton:
> Depends on your definition of significantly, there are a few things to
> consider.
>
> * Reading from SSTables for a request is a serial operation. Reading from 2
> SSTables will take twice as long as 1.
>
> * If the d…

2012-04-29, aaron morton
It depends on your definition of "significantly"; there are a few things to consider.

* Reading from SSTables for a request is a serial operation. Reading from 2 SSTables will take twice as long as 1.

* If the data in the One Big File™ has been overwritten, reading it is a waste of time. And it w…
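Both of aaron's points can be seen in a toy read path, sketched below under simplifying assumptions. Each SSTable is a hypothetical dict from row key to (value, write timestamp). The read visits the SSTables one after another, and the newest timestamp wins. A file that does not hold the key is skipped, standing in for a Bloom filter or index check. The names `SSTable` and `read_row` are illustrative only.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical model: an SSTable maps row key -> (value, write timestamp).
SSTable = Dict[str, Tuple[str, int]]


def read_row(key: str, sstables: List[SSTable]) -> Tuple[Optional[str], int]:
    """Return the newest value for `key` and how many SSTables had to be
    read because they held a fragment of that row."""
    newest: Optional[Tuple[str, int]] = None
    reads = 0
    for table in sstables:  # each SSTable is visited in turn, not in parallel
        fragment = table.get(key)
        if fragment is None:
            continue  # stands in for a Bloom filter / index check ruling the file out
        reads += 1
        if newest is None or fragment[1] > newest[1]:
            newest = fragment
    return (newest[0] if newest else None, reads)


one_big_file = {"user1": ("old", 1)}
recent = {"user1": ("new", 9)}
print(read_row("user1", [one_big_file, recent]))
```

The result is `('new', 2)`. Two SSTables are read, but the copy from the big file loses the timestamp comparison, so that read was wasted work.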

2012-04-26, Fredrik
Exactly, but why would reads be significantly slower over time when the read includes just one more SSTable, albeit sometimes a large one?

Ji Cheng wrote on 2012-04-26 11:11:
> I'm also quite interested in this question. Here's my understanding of this problem.
> 1. If your workload is append-only, …

2012-04-26, Ji Cheng
I'm also quite interested in this question. Here's my understanding of this problem.

1. If your workload is append-only, doing a major compaction shouldn't affect read performance too much, because each row appears in only one SSTable anyway.
2. If your workload is mostly updating existing rows, t…
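The distinction between the two workloads can be illustrated with a toy flush model that ignores compaction entirely. Writes are buffered in a memtable, and the memtable is flushed to a new SSTable every few writes. The helper names and the flush rule are hypothetical simplifications.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set


def flush_writes(writes: Iterable[str], memtable_limit: int) -> List[Set[str]]:
    """Hypothetical model: buffer row keys in a memtable and flush it to a new
    SSTable (represented as a set of row keys) after every `memtable_limit` writes."""
    sstables: List[Set[str]] = []
    memtable: Set[str] = set()
    pending = 0
    for key in writes:
        memtable.add(key)
        pending += 1
        if pending == memtable_limit:
            sstables.append(memtable)
            memtable, pending = set(), 0
    if memtable:  # flush whatever is left over
        sstables.append(memtable)
    return sstables


def sstables_per_row(sstables: List[Set[str]]) -> Dict[str, int]:
    """Count how many SSTables contain a fragment of each row."""
    return dict(Counter(key for table in sstables for key in table))


append_only = [f"row{i}" for i in range(8)]  # every write creates a new row
update_heavy = ["row0", "row1"] * 4           # the same two rows rewritten
print(sstables_per_row(flush_writes(append_only, 4)))
print(sstables_per_row(flush_writes(update_heavy, 2)))
```

In the append-only case every row lives in exactly one SSTable, so merging files changes little for reads. In the update-heavy case each row is spread across all four SSTables, and that spread is what compaction collapses.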