On Tue, May 1, 2012 at 4:31 AM, Henrik Schröder <skro...@gmail.com> wrote:
> But what's the difference between doing an extra read from that One Big
> File and doing an extra read from whatever SSTable happens to be largest
> in the course of automatic minor compaction?
The primary differences, as I understand it, are that the index performance and the bloom filter false positive rate for your One Big File are worse. First, you are more likely to get a bloom filter false positive, because bloom filter performance intrinsically degrades as the number of keys grows. Second, after traversing the SSTable index to the closest indexed key, you have to scan past more keys that are not your key before you reach the one that is.

> So I'm still confused. I don't see a significant difference between doing
> the occasional major compaction and leaving it to do automatic minor
> compactions. What am I missing? Reads will "continually degrade" with
> automatic minor compactions as well, won't they?

I still don't really understand what "continually degrade" means here either, FWIW, or which two operating modes are being compared, under what sort of workloads. As a simple example, I don't believe performance will "continually" do anything if your workload never issues logical UPDATEs or DELETEs to rows. The documentation statement seems confusingly vague yet strongly worded, even if true.

=Rob

--
=Robert Coli
AIM&GTALK - rc...@palominodb.com
YAHOO - rcoli.palominob
SKYPE - rcoli_palominodb
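
P.S. To put rough numbers on the bloom filter point: for a fixed-size filter, the usual approximation for the false positive probability is p = (1 - e^(-k*n/m))^k. The sketch below is purely illustrative, with made-up m and k rather than Cassandra's actual per-sstable filter sizing:

import math

def false_positive_rate(m_bits, n_keys, k_hashes):
    # Standard approximation for a bloom filter with m_bits bits,
    # n_keys inserted keys, and k_hashes hash functions.
    return (1.0 - math.exp(-float(k_hashes) * n_keys / m_bits)) ** k_hashes

m = 10000000   # filter size in bits (made up, held fixed)
k = 5          # number of hash functions (made up)
for n in (500000, 1000000, 2000000, 5000000):
    print(n, false_positive_rate(m, n, k))

With those numbers the false positive rate goes from well under 1% at 500k keys to roughly 10% at 2M keys and over 60% at 5M keys, which is what "degrades as the number of keys grows" looks like when the filter doesn't grow with the data.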