Hm,

I'm not sure I understand the question about randomness. We are using RP
(RandomPartitioner), but I don't see why that should matter.
I thought that the bloom filter hash function should distribute evenly no
matter what keys come in.
 
Keys are '/' separated strings (aka paths :-))

I do bulk inserts of 1000 rows at a time, with ~50 cols each, like this:

[
        {'a/b/foo': cols},
        {'a/b/bar': cols},
        {'a/b/baz': cols}
]

Before each bulk insert I query for the parent path 'a/b', recursively as in
mkdir -p.

If parent paths are missing, they are inserted along with the bulk insert.
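
As a rough illustration of the read-before-insert pattern described above, here
is a minimal Python sketch. It is not the actual import code: `exists` is a
hypothetical stand-in for the Cassandra read, the function names are made up
for this example, and the empty column dict for parent rows is a placeholder.

```python
def missing_parents(key, exists, known):
    """Walk up from the deepest parent of a '/'-separated key, mkdir -p style.

    exists -- hypothetical callable(key) -> bool standing in for the read
    known  -- set of parents already checked or already scheduled for insert
    Returns the missing parent paths, shortest first.
    """
    missing = []
    parts = key.split('/')
    for i in range(len(parts) - 1, 0, -1):
        parent = '/'.join(parts[:i])
        if parent in known:
            break  # this parent (and its ancestors) were handled already
        known.add(parent)
        if exists(parent):
            break  # found an existing ancestor, stop walking up
        missing.append(parent)
    return missing[::-1]


def build_batch(rows, exists):
    """Build the list of {key: cols} mutations, parents first."""
    known = set()
    batch = []
    for key in rows:
        for parent in missing_parents(key, exists, known):
            if parent not in rows:
                batch.append({parent: {}})  # placeholder columns
    batch.extend({key: cols} for key, cols in rows.items())
    return batch
```

With three rows under 'a/b', this does one lookup for 'a/b' and, if it is
missing, one more for 'a'. Every such lookup for a key that does not exist yet
is a read that only the bloom filters can reject.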

The value of BloomFilterFalseRatio has ranged from 0.19 to 0.59 over the
last couple of hours, mostly around 0.3.
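
To make the number concrete, here is a small sketch of how such a ratio can be
derived from two counters. It assumes the ratio is defined as false positives
divided by all positive filter answers; that definition is an assumption here
and has not been confirmed against the 0.6.6 source. The function name is
hypothetical.

```python
def false_ratio(false_positives, true_positives):
    """Share of positive bloom filter answers that turned out to be wrong.

    Assumed definition: false / (false + true). Lookups the filter
    correctly rejects do not enter this ratio at all.
    """
    total = false_positives + true_positives
    if total == 0:
        return 0.0  # no positive answers yet, nothing to report
    return false_positives / total
```

Under this assumed definition, a workload with few true positives can show a
high ratio even when the absolute number of false positives is small.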

We're on 0.6.6, btw.


On Oct 27, 2010, at 3:58 PM, Jonathan Ellis wrote:

> This is not expected, no.  How random are your queries?  If you have a
> couple outlier rows causing the false positives that are being queried
> over and over then that could just be the luck of the draw.
> 
> On Wed, Oct 27, 2010 at 5:24 AM, Daniel Doubleday
> <daniel.double...@gmx.net> wrote:
>> Hi people
>> 
>> We are currently moving our second use case from MySQL to Cassandra. While 
>> importing the data (ongoing) I noticed that the BloomFilterFalseRatio seems 
>> to be pretty high compared to another CF which is in use in production 
>> right now.
>> 
>> It's a hierarchical data model and I cannot avoid doing a read before 
>> inserting multiple columns.
>> 
>> I see a false positive ratio of 0.28, while in my other CF it is 0.00025.
>> 
>> The CF had 5 live sstables when I read that ratio. By that time I had 
>> inserted ~200k rows with a total of 1M cols. Row keys are pretty large, 
>> unfortunately (key.length() ~ 60).
>> 
>> Just wanted to check if this value is to be expected.
>> 
>> 
>> 
>> Thanks,
>> Daniel
> 
> 
> 
> -- 
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com
