> 1. Is the only place boonfilters are used in Cassandra is when you want to > see if a particular key exists in a particular node? Each node stores a set of SSTables locally, each with a BloomFilter attached: the filters are used to check whether a particular SSTable contains information about a key/row. See http://wiki.apache.org/cassandra/MemtableSSTable for more information on SSTables.
> 2. Are boonfilters a fixed size, or they adjust as to the # of keys? any > example size? Since SSTables are immutable once written, the BloomFilter for an SSTable has a fixed size based on the number of keys that will be stored in the SSTable. There are calculations embedded in Cassandra that attempt to make an optimal size/false positive tradeoff: see org.apache.cassandra.utils.BloomCalculations. > 3. ... > It says "yes", but when you do a lookup the object returned is null No: the BloomFilter answers the question "might there be data for this key in this SSTable?", and the two possible answers it can give are "maybe" or "definitely not". When the BloomFilter says "maybe" we have to go to disk to check out the content of the SSTable. Thanks, Stu -----Original Message----- From: "S Ahmed" <sahmed1...@gmail.com> Sent: Wednesday, April 7, 2010 3:27pm To: dev@cassandra.apache.org Subject: boonfilters Just reading up on boonfilters, few questions. Basically boonfilters let give you a true/false if a particular key exists, and they *may* give you a false positive i.e. they key exists but never a false negative i.e. the key doesn't exist. The core of boonfilters is its hashing mechanism that marks the in-memory matrix/map if the key exists. 1. Is the only place boonfilters are used in Cassandra is when you want to see if a particular key exists in a particular node? 2. Are boonfilters a fixed size, or they adjust as to the # of keys? any example size? 3. Boonfilters don't give false negatives: So you hit a node, and perform a lookup in the boonfilter for a key. It says "yes", but when you do a lookup the object returned is null, so then you flag that this node needs this particular key during replication. Have I grasp this concept? Really loving this project, learning allot from the code. It would be great if someone could do a walkthrough of common functionality in a detailed way :)