Hi Ryan

I took a sample of one sstable (just flushed, not compacted) from each of two 
CFs: one that shows fine false positive ratios and the problem one. 
And yes, both look the same to me. Both have the expected 15 buckets per row, 
and the bitsets are filled to about the same degree.

But I am pretty sure that it is indeed, as suggested, a problem with a skewed 
query pattern. I stopped the import and started a random read test, and things 
look better.
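
To illustrate one way a query pattern alone could move the reported number, 
here is a rough sketch. It rests on two assumptions that are not confirmed in 
this thread: that the ratio is computed as false positives / (false positives 
+ true positives), and that the per-lookup false positive probability is the 
hypothetical value ASSUMED_FP_RATE. The function name and the workload mixes 
are made up for illustration.

```python
# ASSUMPTION: hypothetical per-lookup false positive probability of the filter.
ASSUMED_FP_RATE = 0.0007


def observed_false_ratio(absent_fraction, per_lookup_fp_rate):
    """Share of 'maybe present' filter answers that turn out to be wrong.

    ASSUMPTION: the reported ratio is fp / (fp + tp). A key that is really
    present always passes the filter, so it counts as a true positive.
    """
    false_positives = absent_fraction * per_lookup_fp_rate
    true_positives = 1.0 - absent_fraction
    positives = false_positives + true_positives
    return false_positives / positives if positives else 0.0


# Hypothetical workload mixes: fraction of lookups for keys not in the sstable.
workloads = [
    ("import, reads mostly miss", 0.998),
    ("half of the reads hit", 0.5),
    ("random reads of existing keys", 0.0),
]

for label, absent_fraction in workloads:
    ratio = observed_false_ratio(absent_fraction, ASSUMED_FP_RATE)
    print(f"{label}: {ratio:.5f}")
```

Under these assumptions the filter itself is unchanged in all three cases; only 
the share of lookups for absent keys differs. This is a possible reading of the 
observation, not a confirmed diagnosis.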

I'll try to reproduce this with a patched cassandra to get more debug info and 
figure out why this is happening, because I still don't understand it.

Thanks for your time, everyone
 
== Sample of problem CF ==

DATA FILE

file size: 68804626 bytes
rows: 7432 

FILTER FILE

file size: 14013 bytes
bloom filter bitset size: 111488
bloom filter bitset cardinality: 54062


== Sample of working CF ==

DATA FILE

file size: 110730565 bytes
rows: 47432

FILTER FILE

file size: 96565 bytes
bloom filter bitset size: 771904
bloom filter bitset cardinality: 354610
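
As a rough cross-check of the two samples, the fill fraction of each bitset 
(cardinality / size) gives an estimate of the per-lookup false positive 
probability: an absent key is a false positive only if every bit it probes is 
set. The number of hash functions is not stated in this thread, so 
ASSUMED_HASH_COUNT below is an assumption, and the estimate treats the probed 
bits as independent.

```python
# Figures copied from the two samples above.
samples = {
    "problem CF": {"bits": 111488, "set_bits": 54062},
    "working CF": {"bits": 771904, "set_bits": 354610},
}

# ASSUMPTION: number of hash functions per key; not given in this thread.
ASSUMED_HASH_COUNT = 10


def fill_fraction(bits, set_bits):
    """Fraction of the bitset that is set to 1."""
    return set_bits / bits


def estimated_fp_rate(bits, set_bits, hash_count):
    """Chance that all probed bits of an absent key are set.

    Treats the probed bits as independent, which is an approximation.
    """
    return fill_fraction(bits, set_bits) ** hash_count


for name, s in samples.items():
    fill = fill_fraction(s["bits"], s["set_bits"])
    rate = estimated_fp_rate(s["bits"], s["set_bits"], ASSUMED_HASH_COUNT)
    print(f"{name}: fill={fill:.4f} estimated_fp_rate={rate:.2e}")
```

With that assumed hash count, both estimates come out below 0.001, far from 
the 0.28 seen on the problem CF, which fits the impression that the filters 
themselves look the same.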


On Oct 27, 2010, at 6:41 PM, Ryan King wrote:

> On Wed, Oct 27, 2010 at 3:24 AM, Daniel Doubleday
> <daniel.double...@gmx.net> wrote:
>> Hi people
>> 
>> We are currently moving our second use case from mysql to cassandra. While 
>> importing the data (ongoing) I noticed that the BloomFilterFalseRatio seems 
>> to be pretty high compared to another CF which is in use in production 
>> right now.
>> 
>> It's a hierarchical data model and I cannot avoid doing a read before 
>> inserting multiple columns.
>> 
>> I see a false positive ratio of 0.28 while in my other CF it is 0.00025.
>> 
>> The CF had 5 live sstables when I read that ratio. At that time I had 
>> inserted ~ 200k rows with a total of 1M cols. Row keys are pretty large 
>> unfortunately (key.length() ~ 60)
>> 
>> Just wanted to check if this value is to be expected.
> 
> This is not expected. How big are the bloom filters on disk?
> 
> -ryan
