I took a look at the code where the bloom filter true/false positive counters are updated and noticed that the true-positive count isn't updated on key cache hits: https://issues.apache.org/jira/browse/CASSANDRA-8525. That may explain your ratios.
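Assuming the false ratio is computed as falsePositives / (falsePositives + truePositives), a true-positive count that misses key cache hits would push the ratio toward 1.0. As a rough sanity check against the cfstats output below:

    falsePositives = 11096
    reported ratio = 0.99197
    truePositives  = falsePositives * (1 - ratio) / ratio
                   = 11096 * (1 - 0.99197) / 0.99197
                   ≈ 90

Roughly 90 recorded true positives against ~2.5 million local reads is about what you'd expect if nearly every successful read was answered through the key cache (you're caching keys for ALL) and never incremented the counter.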
Can you try querying for a few non-existent partition keys in cqlsh with tracing enabled (just run "TRACING ON") and see if you really do get that high of a false-positive ratio?
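For example, something like this (the id values here are made up; substitute a few partition keys that you know aren't in your data):

    cqlsh> TRACING ON
    cqlsh> SELECT * FROM contacts.contact WHERE id = -1;
    cqlsh> SELECT * FROM contacts.contact WHERE id = -2;
    cqlsh> SELECT * FROM contacts.contact WHERE id = -3;

Each query should come back empty. In the trace output, sstables rejected by the bloom filter should show up as lines like "Bloom filter allows skipping sstable ...", while a real false positive shows up as an sstable actually being read for a key that isn't there. Across a handful of such queries, the observed false-positive rate should be near your bloom_filter_fp_chance (0.001), not anywhere close to 0.99.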
On Fri, Dec 19, 2014 at 9:59 AM, Mark Greene <green...@gmail.com> wrote:
>
> We're seeing similar behavior, except our FP ratio is closer to 1.0 (100%).
> We're using Cassandra 2.1.2.
>
> Schema
> -----------------------------------------------------------------------
> CREATE TABLE contacts.contact (
>     id bigint,
>     property_id int,
>     created_at bigint,
>     updated_at bigint,
>     value blob,
>     PRIMARY KEY (id, property_id)
> ) WITH CLUSTERING ORDER BY (property_id ASC)
>     *AND bloom_filter_fp_chance = 0.001*
>     AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
>     AND comment = ''
>     AND compaction = {'min_threshold': '4', 'class': 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy', 'max_threshold': '32'}
>     AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
>     AND dclocal_read_repair_chance = 0.1
>     AND default_time_to_live = 0
>     AND gc_grace_seconds = 864000
>     AND max_index_interval = 2048
>     AND memtable_flush_period_in_ms = 0
>     AND min_index_interval = 128
>     AND read_repair_chance = 0.0
>     AND speculative_retry = '99.0PERCENTILE';
>
> CF Stats Output:
> -------------------------------------------------------------------------
> Keyspace: contacts
>     Read Count: 2458375
>     Read Latency: 0.8528440766766665 ms.
>     Write Count: 10357
>     Write Latency: 0.1816912233272183 ms.
>     Pending Flushes: 0
>         Table: contact
>         SSTable count: 61
>         SSTables in each level: [1, 10, 50, 0, 0, 0, 0, 0, 0]
>         Space used (live): 9047112471
>         Space used (total): 9047112471
>         Space used by snapshots (total): 0
>         SSTable Compression Ratio: 0.34119240020241487
>         Memtable cell count: 24570
>         Memtable data size: 1299614
>         Memtable switch count: 2
>         Local read count: 2458290
>         Local read latency: 0.853 ms
>         Local write count: 10044
>         Local write latency: 0.186 ms
>         Pending flushes: 0
>         Bloom filter false positives: 11096
>         *Bloom filter false ratio: 0.99197*
>         Bloom filter space used: 3923784
>         Compacted partition minimum bytes: 373
>         Compacted partition maximum bytes: 152321
>         Compacted partition mean bytes: 9938
>         Average live cells per slice (last five minutes): 37.57851240677983
>         Maximum live cells per slice (last five minutes): 63.0
>         Average tombstones per slice (last five minutes): 0.0
>         Maximum tombstones per slice (last five minutes): 0.0
>
> --
> about.me <http://about.me/markgreene>
>
> On Wed, Dec 17, 2014 at 1:32 PM, Chris Hart <ch...@remilon.com> wrote:
>>
>> Hi,
>>
>> I have created the following table with bloom_filter_fp_chance=0.01:
>>
>> CREATE TABLE logged_event (
>>     time_key bigint,
>>     partition_key_randomizer int,
>>     resource_uuid timeuuid,
>>     event_json text,
>>     event_type text,
>>     field_error_list map<text, text>,
>>     javascript_timestamp timestamp,
>>     javascript_uuid uuid,
>>     page_impression_guid uuid,
>>     page_request_guid uuid,
>>     server_received_timestamp timestamp,
>>     session_id bigint,
>>     PRIMARY KEY ((time_key, partition_key_randomizer), resource_uuid)
>> ) WITH
>>     bloom_filter_fp_chance=0.010000 AND
>>     caching='KEYS_ONLY' AND
>>     comment='' AND
>>     dclocal_read_repair_chance=0.000000 AND
>>     gc_grace_seconds=864000 AND
>>     index_interval=128 AND
>>     read_repair_chance=0.000000 AND
>>     replicate_on_write='true' AND
>>     populate_io_cache_on_flush='false' AND
>>     default_time_to_live=0 AND
>>     speculative_retry='99.0PERCENTILE' AND
>>     memtable_flush_period_in_ms=0 AND
>>     compaction={'class': 'SizeTieredCompactionStrategy'} AND
>>     compression={'sstable_compression': 'LZ4Compressor'};
>>
>> When I run cfstats, I see a much higher false positive ratio:
>>
>> Table: logged_event
>>     SSTable count: 15
>>     Space used (live), bytes: 104128214227
>>     Space used (total), bytes: 104129482871
>>     SSTable Compression Ratio: 0.3295840184239226
>>     Number of keys (estimate): 199293952
>>     Memtable cell count: 56364
>>     Memtable data size, bytes: 20903960
>>     Memtable switch count: 148
>>     Local read count: 1396402
>>     Local read latency: 0.362 ms
>>     Local write count: 2345306
>>     Local write latency: 0.062 ms
>>     Pending tasks: 0
>>     Bloom filter false positives: 147705
>>     Bloom filter false ratio: 0.49020
>>     Bloom filter space used, bytes: 249129040
>>     Compacted partition minimum bytes: 447
>>     Compacted partition maximum bytes: 315852
>>     Compacted partition mean bytes: 1636
>>     Average live cells per slice (last five minutes): 0.0
>>     Average tombstones per slice (last five minutes): 0.0
>>
>> Any idea what could be causing this? This is time series data. Every time we read from this table, we read a single row key with 1000 partition_key_randomizer values. I'm running Cassandra 2.0.11. I tried running upgradesstables to rewrite them, which didn't change this behavior at all. I'm using size-tiered compaction and I haven't done any major compactions.
>>
>> Thanks,
>> Chris
>>

--
Tyler Hobbs
DataStax <http://datastax.com/>