> 1. Is the only place Bloom filters are used in Cassandra when you want to
> see if a particular key exists on a particular node?
Each node stores a set of SSTables locally, each with a BloomFilter attached: 
the filters are used to check whether a particular SSTable contains information 
about a key/row. See http://wiki.apache.org/cassandra/MemtableSSTable for more 
information on SSTables.
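
To make the per-SSTable check concrete, here is a rough sketch of a read that consults each filter before touching disk. SSTableSketch, ReadPathSketch, readFromDisk and the Predicate standing in for the BloomFilter are all made-up names for illustration, not Cassandra's actual classes:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical stand-in for one SSTable and the filter attached to it.
final class SSTableSketch {
    final Predicate<String> filter;           // stands in for the attached BloomFilter
    private final Map<String, String> onDisk; // stands in for the immutable data file
    int diskReads = 0;                        // counts simulated disk accesses

    SSTableSketch(Map<String, String> rows, Predicate<String> filter) {
        this.onDisk = new HashMap<>(rows);
        this.filter = filter;
    }

    // Simulated disk lookup: returns null when the key is not in this SSTable.
    String readFromDisk(String key) {
        diskReads++;
        return onDisk.get(key);
    }
}

final class ReadPathSketch {
    private ReadPathSketch() {}

    // Collect the values for a key from every SSTable whose filter answers "maybe".
    static List<String> read(List<SSTableSketch> tables, String key) {
        List<String> found = new ArrayList<>();
        for (SSTableSketch table : tables) {
            if (!table.filter.test(key)) {
                continue; // "definitely not": skip the disk read entirely
            }
            String value = table.readFromDisk(key); // "maybe": we have to look
            if (value != null) {
                found.add(value);
            }
        }
        return found;
    }
}
```

The point of the sketch is only that the filter decides which SSTables are worth a disk read; how Cassandra actually merges the results is not shown here.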

> 2. Are Bloom filters a fixed size, or do they adjust to the # of keys?  Any
> example size?
Since SSTables are immutable once written, the BloomFilter for an SSTable has a 
fixed size based on the number of keys that will be stored in the SSTable. 
There are calculations embedded in Cassandra that attempt to make an optimal 
size/false positive tradeoff: see org.apache.cassandra.utils.BloomCalculations.
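
For a feel of the numbers, here is a sketch using the textbook Bloom filter sizing formulas. This is an assumption for illustration and not the code in BloomCalculations, which may choose its values differently; BloomSizing and its method names are made up:

```java
// Textbook sizing formulas, NOT Cassandra's BloomCalculations.
final class BloomSizing {
    private BloomSizing() {}

    // Bits needed for n keys at false-positive rate p: m = -n * ln(p) / (ln 2)^2
    static long optimalBits(long expectedKeys, double falsePositiveRate) {
        double ln2 = Math.log(2);
        return (long) Math.ceil(-expectedKeys * Math.log(falsePositiveRate) / (ln2 * ln2));
    }

    // Hash functions for m bits and n keys: k = (m / n) * ln 2, at least 1
    static int optimalHashCount(long bits, long expectedKeys) {
        return Math.max(1, (int) Math.round((double) bits / expectedKeys * Math.log(2)));
    }
}
```

Under these formulas, 1,000,000 keys at a 1% false positive rate come to roughly 9.6 bits per key, i.e. about 1.2 MB of filter with 7 hash functions.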

> 3. ...
> It says "yes", but when you do a lookup the object returned is null
No: the BloomFilter answers the question "might there be data for this key in 
this SSTable?", and the two possible answers it can give are "maybe" or 
"definitely not". When the BloomFilter says "maybe" we have to go to disk to 
check the contents of the SSTable. If the key turns out not to be there, that 
was simply a false positive.
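
A minimal filter sketch showing where the "maybe" / "definitely not" answers come from. This is a toy written for this reply, assuming a double-hashing scheme over two simple hashes; it is not Cassandra's BloomFilter, and TinyBloomFilter is a made-up name:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.BitSet;

// Toy Bloom filter for illustration only.
final class TinyBloomFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    TinyBloomFilter(int numBits, int numHashes) {
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    // Set one bit per hash function for this key.
    void add(String key) {
        for (int i = 0; i < numHashes; i++) {
            bits.set(index(key, i));
        }
    }

    // false means "definitely not"; true only means "maybe".
    boolean mightContain(String key) {
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(index(key, i))) {
                return false; // a bit that add() would have set is clear
            }
        }
        return true; // all bits set, possibly by other keys
    }

    // Derive the i-th bit position from two base hashes (double hashing).
    private int index(String key, int i) {
        byte[] data = key.getBytes(StandardCharsets.UTF_8);
        int h1 = Arrays.hashCode(data);
        int h2 = fnv1a(data);
        return Math.floorMod(h1 + i * h2, numBits);
    }

    // 32-bit FNV-1a, used here only as a second, cheap hash.
    private static int fnv1a(byte[] data) {
        int h = 0x811c9dc5;
        for (byte b : data) {
            h ^= (b & 0xff);
            h *= 0x01000193;
        }
        return h;
    }
}
```

Every key that was added always gets "maybe", because add() set exactly the bits that mightContain() checks. A key that was never added usually gets "definitely not", but can get "maybe" if other keys happened to set all of its bits.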

Thanks,
Stu

-----Original Message-----
From: "S Ahmed" <sahmed1...@gmail.com>
Sent: Wednesday, April 7, 2010 3:27pm
To: dev@cassandra.apache.org
Subject: boonfilters

Just reading up on Bloom filters, a few questions.

Basically Bloom filters give you a true/false answer as to whether a
particular key exists. They *may* give you a false positive (i.e. say the key
exists when it doesn't) but never a false negative (i.e. say the key doesn't
exist when it does).

The core of a Bloom filter is its hashing mechanism, which marks the in-memory
matrix/map if the key exists.

1. Is the only place Bloom filters are used in Cassandra when you want to
see if a particular key exists on a particular node?

2. Are Bloom filters a fixed size, or do they adjust to the # of keys?  Any
example size?

3. Bloom filters don't give false negatives:
    So you hit a node, and perform a lookup in the Bloom filter for a key.  It
says "yes", but when you do a lookup the object returned is null, so then
you flag that this node needs this particular key during replication.


Have I grasped this concept?

Really loving this project, learning a lot from the code.  It would be great
if someone could do a walkthrough of common functionality in a detailed way
:)

