If they each have their own copy of the data, then they are *not* non-overlapping!
If you have non-overlapping SSTables (and you know the min/max keys), it's like having one big SSTable because you know exactly where each row is, and it becomes easy to merge a new SSTable in small batches, rather than in one huge batch.

The only step that you have to add to the current merge process is, when you are going to write a new SSTable, if it's too big, to write N (non-overlapping!) pieces instead.

On Mon, May 9, 2011 at 12:46 PM, Terje Marthinussen <tmarthinus...@gmail.com> wrote:

> Yes, agreed.
>
> I actually think cassandra has to.
>
> And if you do not go down to that single file, how do you avoid getting
> into a situation where you can very realistically end up with 4-5 big
> sstables each having its own copy of the same data massively increasing disk
> requirements?
>
> Terje
>
> On Mon, May 9, 2011 at 5:58 PM, David Boxenhorn <da...@taotown.com> wrote:
>
>> "I'm also not too much in favor of triggering major compactions, because
>> it mostly have a nasty effect (create one huge sstable)."
>>
>> If that is the case, why can't major compactions create many,
>> non-overlapping SSTables?
>>
>> In general, it seems to me that non-overlapping SSTables have all the
>> advantages of big SSTables (i.e. you know exactly where the data is) without
>> the disadvantages that come with being big. Why doesn't Cassandra take
>> advantage of that in a major way?
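
To make the splitting step from my reply at the top concrete, here is a rough sketch in Python. It is not Cassandra code: `split_non_overlapping`, `key_ranges`, `find_piece` and the `(key, value)` row shape are all names I made up for illustration, and it assumes the merged rows are already sorted with one row per key.

```python
from bisect import bisect_left
from typing import List, Optional, Tuple

Row = Tuple[str, str]  # (key, value); hypothetical row shape for illustration


def split_non_overlapping(sorted_rows: List[Row], max_rows: int) -> List[List[Row]]:
    """Cut one sorted run of rows into consecutive pieces of at most max_rows.

    Assumes sorted_rows is sorted by key with unique keys, so the key ranges
    of consecutive pieces cannot overlap.
    """
    if max_rows < 1:
        raise ValueError("max_rows must be at least 1")
    return [sorted_rows[i:i + max_rows] for i in range(0, len(sorted_rows), max_rows)]


def key_ranges(pieces: List[List[Row]]) -> List[Tuple[str, str]]:
    """Record the (min key, max key) of every piece."""
    return [(piece[0][0], piece[-1][0]) for piece in pieces]


def find_piece(ranges: List[Tuple[str, str]], key: str) -> Optional[int]:
    """Return the index of the only piece that can hold key, or None."""
    max_keys = [hi for _, hi in ranges]
    i = bisect_left(max_keys, key)  # first piece whose max key is >= key
    if i < len(ranges) and ranges[i][0] <= key:
        return i
    return None
```

Because the ranges do not overlap, a lookup touches at most one piece, and a new SSTable only has to be merged with the pieces whose ranges it intersects.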