> but reported ratio is Bloom Filter False Ratio: 0.00495 which is higher
> than my computed ratio 0.000145. If you were true than reported ratio should
> be lower then mine computed from CF reads because there are more reads to
> sstables then to CF.

The ratio is the ratio of false positives to true positives *per
sstable*. It's not the number of false positives in each sstable *per cf
read*. Thus, there is no expectation of higher vs. lower, and the
magnitude of the discrepancy is easily explained by the fact that you
only have 10 false positives. That's not a statistically significant
sample.

> from investigation of bloom filter FP ratio it seems that default bloom
> filter FP ratio (soon user configurable) should be higher. Hbase defaults to
> 1% cassandra defaults to 0.000744. bloom filters are using quite a bit
> memory now.

I don't understand how you reached that conclusion. There is a direct
trade-off between memory use and false positive rate, yes. That does not
mean that HBase's 1% is magically the correct choice.

I definitely think it should be tweakable (and IIRC there's work
happening on a JIRA to make this an option now), but a 1% false positive
rate will be completely unacceptable in some circumstances. In others it
will be perfectly acceptable, given the decrease in memory use and a low
read volume.

-- 
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)
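
To put rough numbers on the memory side of that trade-off, here is a
minimal sketch using the textbook sizing formula for a Bloom filter with
an optimal number of hash functions: bits per key = -ln(p) / (ln 2)^2.
The function name bits_per_key is made up for illustration, and the
formula is an idealization, so the real filters in Cassandra or HBase may
size things somewhat differently.

```python
import math


def bits_per_key(fp_rate: float) -> float:
    """Approximate bits per key for a target false positive rate.

    Textbook formula, assuming an optimal number of hash functions.
    Not taken from the Cassandra or HBase source.
    """
    if not 0.0 < fp_rate < 1.0:
        raise ValueError("fp_rate must be strictly between 0 and 1")
    return -math.log(fp_rate) / (math.log(2) ** 2)


# The two defaults mentioned in the thread.
for p in (0.01, 0.000744):
    print(f"target FP rate {p}: ~{bits_per_key(p):.1f} bits per key")
```

Under that formula a 1% target costs roughly 9.6 bits per key and
0.000744 costs roughly 15 bits per key, so the lower rate needs about
half again as much memory. Whether that is worth it depends on the read
load, which is why it makes sense as a tunable.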