On Bloom filters and Key Cache

Erik Forsberg Wed, 21 Mar 2012 08:28:20 -0700

Hi!

We're currently testing Cassandra with a large number of row keys per node. nodetool cfstats estimates the number of keys at something like 700M per node. This seems to have caused very large heap consumption.

After reading http://wiki.apache.org/cassandra/LargeDataSetConsiderations I think I've tracked this down to the bloom filter and the sampled index entries.

Regarding bloom filters, have I understood correctly that they are stored on heap, and that the "Bloom Filter Space Used" reported by 'nodetool cfstats' is an approximation of the heap space used by bloom filters? It reports the on-disk size, but if I understand CASSANDRA-3497 correctly, the on-disk size is smaller than the on-heap size?

I understand that increasing bloom_filter_fp_chance will decrease the bloom filter size, but at the cost of worse performance when asking for keys that don't exist. I do have a fair amount of queries for keys that don't exist.
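
To put rough numbers on that trade-off, here is a back-of-the-envelope sketch in Python. It uses the textbook formula for an optimally sized Bloom filter, m = -n * ln(p) / (ln 2)^2 bits, which is an assumption on my part: Cassandra's own sizing may round differently, and per CASSANDRA-3497 the on-heap footprint may not match these figures. The function names are made up for illustration.

```python
import math


def bloom_filter_bytes(num_keys, fp_chance):
    """Bytes needed by an optimally sized textbook Bloom filter.

    Uses m = -n * ln(p) / (ln 2)^2 bits. This is the generic formula,
    not necessarily what Cassandra allocates.
    """
    bits = -num_keys * math.log(fp_chance) / (math.log(2) ** 2)
    return math.ceil(bits / 8)


def optimal_hash_count(fp_chance):
    """Number of hash functions that minimises the false-positive rate."""
    return max(1, round(-math.log2(fp_chance)))


if __name__ == "__main__":
    num_keys = 700_000_000  # roughly what nodetool cfstats reports per node
    for fp in (0.001, 0.01, 0.1):
        size_mib = bloom_filter_bytes(num_keys, fp) / 2**20
        print(f"fp_chance={fp}: ~{size_mib:.0f} MiB, "
              f"{optimal_hash_count(fp)} hash functions")
```

By this estimate each tenfold increase in the false-positive chance saves about 4.8 bits per key, so the filter shrinks steadily while the fraction of wasted disk lookups for missing keys grows tenfold.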

How much will increasing the key cache help, i.e. if I decrease the bloom filter size but increase the key cache size? Will the key cache cache negative results, i.e. the fact that a key didn't exist?


Regards,
\EF
