I think we are talking past each other here. What I was missing was the
size of the filter. I was assuming that the size of the filter was the
number of bits specified in BloomFilterCalculations (an error on my
part); what I had overlooked was the multiplication of the number of bits
by the number of keys. Is there a fixed number of bits (it looks to be
Integer.MAX_VALUE - 20), or is it calculated somewhere?
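
For anyone following along, here is a rough sketch of the arithmetic as I
now understand it. It assumes the standard approximation
p = (1 - e^(-k*n/m))^k for a filter of m bits holding n keys with k hashes;
the function name is made up for illustration and this is not necessarily
how BloomFilterCalculations derives its table.

```python
import math

def false_positive_rate(m_bits: int, n_keys: int, k_hashes: int) -> float:
    """Standard approximation: p = (1 - e^(-k*n/m))^k (an assumption here)."""
    return (1.0 - math.exp(-k_hashes * n_keys / m_bits)) ** k_hashes

K = 14             # hashes, from the table row under discussion
BITS_PER_KEY = 20  # the table's figure is bits *per key*, not total bits

# Filter sized per key: 1 key -> 20 bits, 10 keys -> 200 bits.
one_key = false_positive_rate(BITS_PER_KEY * 1, 1, K)
ten_keys = false_positive_rate(BITS_PER_KEY * 10, 10, K)

# My original (mistaken) reading: 10 keys squeezed into 20 bits in total.
undersized = false_positive_rate(20, 10, K)

# Chance of at least one false positive across 10 independent sstable lookups.
ten_lookups = 1.0 - (1.0 - one_key) ** 10

print(f"1 key / 20 bits:    {one_key:.3e}")
print(f"10 keys / 200 bits: {ten_keys:.3e}")
print(f"10 keys / 20 bits:  {undersized:.3f}")
print(f"10 lookups:         {ten_lookups:.3e}")
```

The first two figures agree because only the ratio of bits to keys enters
the formula; the third is the near-100% rate I was seeing in the calculator.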


On Tue, Jul 11, 2023 at 10:11 AM Benedict <bened...@apache.org> wrote:

> I’m not sure I follow your reasoning. The bloom filter table gives the
> false positive rate per sstable for a given number of bits *per key*. So
> for 10 keys you would have 200 bits, which yields the same false positive
> rate as 20 bits and 1 key.
>
> It does taper slightly at much larger N, but it’s pretty nominal for
> practical purposes.
>
> I don’t understand what you mean by merging multiple filters together. We
> do look up multiple bloom filters per query, but only one per sstable, and
> the false positive rate you’re calculating for 10 such lookups would not be
> accurate. This would be 1-(1-0.0000671)^10, which is still only around 0.07%,
> not 100%. You seem to be looking at the false positive rate of a bloom
> filter of 20 bits with 10 entries, which means only 2 bits per entry?
>
> On 11 Jul 2023, at 07:14, Claude Warren, Jr via dev <
> dev@cassandra.apache.org> wrote:
>
>
> Can someone explain to me how the Bloom filter table in
> BloomFilterCalculations was derived and how it is supposed to work?  As I
> read the table it seems to indicate that with 14 hashes and 20 bits you get
> an fp rate of 6.71e-05. But if you plug those numbers into the Bloom filter
> calculator [1], that rate holds only when 1 item is in the filter.
> If you merge multiple filters together the false positive rate goes up.
> And as [1] shows by 5 merges you are over 50% fp rate and by 10 you are at
> close to 100% fp.  So I have to assume this analysis is wrong.  Can someone
> point me to the correct calculations?
>
> Claude
>
> [1] https://hur.st/bloomfilter/?n=&p=6.71e-05&m=20&k=14
>
>
