I think we are talking past each other here. What I was missing was the size of the filter. I had assumed that the size of the filter was the number of bits specified in BloomFilterCalculations (an error on my part); what I overlooked was that the number of bits is multiplied by the number of keys. Is there a fixed number of bits (it looks to be Integer.MAX_VALUE - 20), or is it calculated somewhere?
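
To make the sizing concrete, here is a minimal sketch of that multiplication. It assumes the standard Bloom filter approximation p = (1 - e^(-k/b))^k, where b is bits per key and k is the number of hashes; the function names are hypothetical and are not taken from the Cassandra code.

```python
import math


def filter_size_bits(bits_per_key: int, num_keys: int) -> int:
    """Hypothetical helper: total filter size grows with the key count."""
    return bits_per_key * num_keys


def false_positive_rate(bits_per_key: float, num_hashes: int) -> float:
    """Standard approximation p = (1 - e^(-k/b))^k, b = bits per key."""
    return (1.0 - math.exp(-num_hashes / bits_per_key)) ** num_hashes


# With 20 bits per key and 14 hashes, the total size changes with the
# number of keys but the bits-per-key ratio (and so the rate) does not.
for num_keys in (1, 10, 1_000_000):
    total_bits = filter_size_bits(20, num_keys)
    rate = false_positive_rate(total_bits / num_keys, 14)
    print(f"keys={num_keys} total_bits={total_bits} fp={rate:.3g}")
```

Under this approximation the rate depends only on the bits-per-key ratio, which is why a table indexed by bits per key can be reused for any number of keys.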

On Tue, Jul 11, 2023 at 10:11 AM Benedict <bened...@apache.org> wrote:

> I’m not sure I follow your reasoning. The bloom filter table is the false
> positive rate per sstable given the number of bits *per key*. So for 10
> keys you would have 200 bits, which yields the same false positive rate
> as 20 bits and 1 key.
>
> It does taper slightly at much larger N, but it’s pretty nominal for
> practical purposes.
>
> I don’t understand what you mean by merging multiple filters together. We
> do look up multiple bloom filters per query, but only one per sstable, and
> the false positive rate you’re calculating for 10 such lookups would not
> be accurate. This would be 1-(1-0.0000671)^10, which is still only around
> 0.07%, not 100%. You seem to be looking at the false positive rate of a
> bloom filter of 20 bits with 10 entries, which means only 2 bits per
> entry?
>
> On 11 Jul 2023, at 07:14, Claude Warren, Jr via dev
> <dev@cassandra.apache.org> wrote:
>
>> Can someone explain to me how the Bloom filter table in
>> BloomFilterCalculations was derived and how it is supposed to work? As I
>> read the table it seems to indicate that with 14 hashes and 20 bits you
>> get a fp of 6.71e-05. But if you plug those numbers into the Bloom filter
>> calculator [1], that is calculated only for 1 item being in the filter.
>> If you merge multiple filters together the false positive rate goes up.
>> And as [1] shows, by 5 merges you are over 50% fp rate and by 10 you are
>> at close to 100% fp. So I have to assume this analysis is wrong. Can
>> someone point me to the correct calculations?
>>
>> Claude
>>
>> [1] https://hur.st/bloomfilter/?n=&p=6.71e-05&m=20&k=14
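
For reference, the figures discussed in this thread can be reproduced with a short sketch. It assumes the standard approximation p = (1 - e^(-kn/m))^k for a filter of m bits holding n keys with k hashes, and treats lookups against separate sstables as independent; the function names are hypothetical and this is not the Cassandra implementation.

```python
import math


def bloom_fp(m_bits: int, n_keys: int, k_hashes: int) -> float:
    """Standard approximation p = (1 - e^(-k*n/m))^k for one filter."""
    return (1.0 - math.exp(-k_hashes * n_keys / m_bits)) ** k_hashes


def any_fp(per_filter_rate: float, lookups: int) -> float:
    """Chance of at least one false positive over independent lookups."""
    return 1.0 - (1.0 - per_filter_rate) ** lookups


one_key = bloom_fp(20, 1, 14)       # 20 bits per key, 1 key
ten_keys = bloom_fp(200, 10, 14)    # still 20 bits per key, 10 keys
overfilled = bloom_fp(20, 10, 14)   # 20 bits total, only 2 bits per key
across_ten = any_fp(6.71e-5, 10)    # one lookup in each of 10 sstables

print(f"20 bits, 1 key:    {one_key:.3g}")
print(f"200 bits, 10 keys: {ten_keys:.3g}")
print(f"20 bits, 10 keys:  {overfilled:.3g}")
print(f"10 sstables:       {across_ten:.3g}")
```

Under these assumptions the first two cases both give roughly 6.7e-05, the fixed 20-bit filter with 10 keys gives a rate close to 0.99, and ten independent lookups at 6.71e-05 each give about 0.067%.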