On Tue, May 1, 2012 at 6:07 PM, Rob Coli <rc...@palominodb.com> wrote:
>
> The primary differences, as I understand it, are that the index
> performance and bloom filter false positive rate for your One Big File
> are worse. First, you are more likely to get a bloom filter false
> positive due to the intrinsic degradation of bloom filter performance
> as number of keys increases. Next, after traversing the SSTable index
> to get to the closest indexed key, you will be forced to scan past
> more keys which are not your key in order to get to the key which is
> your key.
>

Fair enough, but if you have a continually growing dataset, automatic minor compactions will eventually produce SSTables as large as the One Big File you created through a major compaction; it just takes a lot longer to get there. So time will "undo" a major compaction, and it's definitely not the case that you're forever stuck in some sort of screwed state where you have to compact manually all the time.

I'm also guessing that the wording in that part of the documentation is just too strong, and it would be nice to have more nuanced advice that depends on your traffic pattern.

/Henrik
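
P.S. To put rough numbers on the bloom filter point quoted above, here is a minimal sketch using the standard textbook approximation p = (1 - e^(-kn/m))^k for a filter with m bits, k hash functions and n keys. It assumes the filter size stays fixed while the key count grows; NUM_BITS and NUM_HASHES are made-up illustrative values, not Cassandra's actual settings.

```python
import math


def bloom_false_positive_rate(num_keys, num_bits, num_hashes):
    """Textbook approximation of a bloom filter's false positive rate.

    p = (1 - exp(-k * n / m)) ** k, with n keys, m bits, k hash functions.
    """
    return (1.0 - math.exp(-num_hashes * num_keys / num_bits)) ** num_hashes


# Hypothetical filter parameters, chosen only for illustration.
NUM_BITS = 10_000_000
NUM_HASHES = 5

# Same filter size, increasing number of keys stored in it.
for num_keys in (100_000, 1_000_000, 5_000_000):
    rate = bloom_false_positive_rate(num_keys, NUM_BITS, NUM_HASHES)
    print(f"{num_keys:>9} keys -> false positive rate ~ {rate:.6f}")
```

With a fixed number of bits the rate climbs steeply as keys are added. If the filter is sized in proportion to the key count instead, the ratio n/m stays constant and so does the approximate rate, so how much this matters depends on how the filter for the big SSTable is sized.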