Hm, I'm not sure I understand the question about randomness. We are using RP (RandomPartitioner), but I don't see why that should matter: I thought the bloom filter hash function should distribute evenly no matter what keys come in. Keys are '/'-separated strings (aka paths :-))
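
To illustrate what I mean by "evenly distribute": below is a minimal sketch of a toy Bloom filter fed with path-like keys that share a long common prefix. It is not Cassandra's implementation; the class name, the SHA-256 based double hashing, and the sizing (10 bits per key, 7 hash functions) are all assumptions picked for illustration. With a well-mixing hash, the measured false-positive ratio should land near the textbook estimate (1 - e^(-k/bits_per_key))^k regardless of how similar the keys look.

```python
import hashlib
import math


class ToyBloomFilter:
    """Illustrative Bloom filter; NOT how Cassandra implements it."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive k bit positions from one digest via double hashing
        # (h1 + i * h2). This hashing scheme is an assumption of the sketch.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(key)
        )


def expected_false_positive_ratio(bits_per_key=10, num_hashes=7):
    # Standard approximation for a filter holding exactly the planned
    # number of keys: (1 - e^(-k / bits_per_key))^k
    return (1 - math.exp(-num_hashes / bits_per_key)) ** num_hashes


def measure_false_positive_ratio(num_keys=2000, num_probes=5000,
                                 bits_per_key=10, num_hashes=7):
    bf = ToyBloomFilter(num_keys * bits_per_key, num_hashes)
    # Hypothetical keys: long shared prefix, only the last segment differs.
    for i in range(num_keys):
        bf.add(f"a/b/c/some/long/shared/prefix/item-{i}")
    # Probe with keys that were never inserted; every hit is a false positive.
    false_positives = sum(
        bf.might_contain(f"a/b/c/some/long/shared/prefix/missing-{i}")
        for i in range(num_probes)
    )
    return false_positives / num_probes


if __name__ == "__main__":
    print("expected:", expected_false_positive_ratio())
    print("measured:", measure_false_positive_ratio())
```

Note that this measures false positives over probes for absent keys only; I am not assuming it matches exactly how the BloomFilterFalseRatio metric is computed.
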

I do bulk inserts (1000 rows at a time, with ~50 cols each) like:

    [ {'a/b/foo': cols}, {'a/b/bar': cols}, {'a/b/baz': cols} ]

and before that I query for 'a/b', recursively as in mkdir -p. If parent paths are missing, they are inserted with the bulk insert.

The value for BloomFilterFalseRatio has been in the range of 0.19 - 0.59 over the last couple of hours, mostly around 0.3.

We're on 0.6.6, btw.

On Oct 27, 2010, at 3:58 PM, Jonathan Ellis wrote:

> This is not expected, no. How random are your queries? If you have a
> couple outlier rows causing the false positives that are being queried
> over and over then that could just be the luck of the draw.
>
> On Wed, Oct 27, 2010 at 5:24 AM, Daniel Doubleday
> <daniel.double...@gmx.net> wrote:
>> Hi people
>>
>> We are currently moving our second use case from mysql to cassandra. While
>> importing the data (ongoing) I noticed that the BloomFilterFalseRatio seems
>> to be pretty high compared to another CF which is in use in production
>> right now.
>>
>> It's a hierarchical data model and I cannot avoid doing a read before
>> inserting multiple columns.
>>
>> I see a false positive ratio of 0.28 while in my other CF it is 0.00025.
>>
>> The CF had 5 live sstables while I read that ratio. At that time I had inserted
>> ~ 200k rows with a total of 1M cols. Row keys are pretty large unfortunately
>> (key.length() ~ 60)
>>
>> Just wanted to check if this value is to be expected.
>>
>> Thanks,
>> Daniel
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com