Re: reported bloom filter FP ratio

Peter Schuller Mon, 26 Dec 2011 08:59:59 -0800

> but the reported ratio is Bloom Filter False Ratio: 0.00495, which is higher
> than my computed ratio of 0.000145. If you were right, the reported ratio
> should be lower than the one I computed from CF reads, because there are more
> reads to sstables than to the CF.


The ratio is the ratio of false positives to true positives *per
sstable*. It's not the amount of false positives in each sstable *per
cf read*. Thus, there is no expectation of higher vs. lower, and the
magnitude of the discrepancy is easily explained by the fact that you
only have 10 false positives. That's not a statistically significant
sample set.
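
To make the denominator point concrete, here is a minimal sketch in
Python. All counter values are hypothetical and chosen only for
illustration; the helper names are mine, not Cassandra's, and the
per-sstable ratio follows the wording above (false positives over true
positives), which may differ from how a given version computes it.

```python
import math

def false_ratio(false_positives, denominator):
    """Return false_positives / denominator, or 0.0 when the denominator is 0."""
    return false_positives / denominator if denominator else 0.0

def relative_std_error(count):
    """Rough relative uncertainty of a small event count, treating it as
    Poisson-distributed: 1 / sqrt(count)."""
    return 1.0 / math.sqrt(count)

# Hypothetical counters, for illustration only.
false_positives = 10   # filter said "maybe", sstable did not have the row
true_positives = 2000  # filter said "maybe", sstable did have the row
cf_reads = 70000       # reads issued against the column family

per_sstable = false_ratio(false_positives, true_positives)
per_cf_read = false_ratio(false_positives, cf_reads)

print("per-sstable ratio:", per_sstable)
print("per-CF-read ratio:", per_cf_read)
print("relative uncertainty of the count:", relative_std_error(false_positives))
```

The same ten false positives give very different ratios depending on
what they are divided by, and a count of ten carries a relative
uncertainty of roughly 30% on its own.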

> from investigating the bloom filter FP ratio, it seems that the default
> bloom filter FP ratio (soon user configurable) should be higher. HBase
> defaults to 1%; Cassandra defaults to 0.000744. Bloom filters are using
> quite a bit of memory now.

I don't understand how you reached that conclusion. There is a direct
trade-off between memory use and false positive hit rate, yes. That
does not mean that hbase's 1% is magically the correct choice.

I definitely think it should be tweakable (and IIRC there's work
happening on a JIRA to make this an option now), but a 1% false
positive hit rate will be completely unacceptable in some
circumstances. In others, it is perfectly acceptable, given the
reduced memory use and a low read volume.
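
For a rough sense of the memory side of that trade-off, here is a small
Python sketch using the standard sizing formula for an optimally
configured bloom filter, bits per key = -ln(p) / (ln 2)^2. The function
names and the 100 million key count are assumptions for illustration;
real implementations round the number of bits and hash functions, so
actual usage will differ somewhat.

```python
import math

def bits_per_key(fp_rate):
    """Bits per key for an optimally sized bloom filter with target
    false positive rate fp_rate: -ln(p) / (ln 2)^2."""
    return -math.log(fp_rate) / (math.log(2) ** 2)

def filter_mebibytes(num_keys, fp_rate):
    """Approximate filter size in MiB for num_keys keys."""
    return num_keys * bits_per_key(fp_rate) / 8 / (1024 * 1024)

# The two defaults mentioned in this thread; the key count is hypothetical.
num_keys = 100_000_000
for p in (0.01, 0.000744):
    print(f"p={p}: {bits_per_key(p):.1f} bits/key, "
          f"{filter_mebibytes(num_keys, p):.0f} MiB for {num_keys} keys")
```

By this formula the 1% setting needs a bit under 10 bits per key and
0.000744 needs about 15, so the lower false positive rate costs roughly
half again as much memory.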

-- 
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)
