Hi Dave,

On 23/03/15 05:56, Dave Galbraith wrote:
Hi! So I've got a table like this:

CREATE TABLE "default".metrics (row_time int,attrs varchar,offset int,value
double, PRIMARY KEY(row_time, attrs, offset)) WITH COMPACT STORAGE AND
bloom_filter_fp_chance=0.01 AND caching='KEYS_ONLY' AND comment='' AND
dclocal_read_repair_chance=0 AND gc_grace_seconds=864000 AND index_interval=128
AND read_repair_chance=1 AND replicate_on_write='true' AND
populate_io_cache_on_flush='false' AND default_time_to_live=0 AND
speculative_retry='NONE' AND memtable_flush_period_in_ms=0 AND
compaction={'class':'DateTieredCompactionStrategy','timestamp_resolution':'MILLISECONDS'}
AND compression={'sstable_compression':'LZ4Compressor'};

Does it work better with

  PRIMARY KEY ((row_time, attrs), offset)

?

With your current schema the partition key is just row_time, so an entire
day's data for all 100k series lands in a single partition: at 3k
points/second that's roughly 260 million points per day in one partition, and
your cfstats below show a maximum compacted partition of ~37 GB. That is far
more than Cassandra handles comfortably in one partition. Making (row_time,
attrs) a composite partition key gives each series its own small partition
per day.
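
A rough sketch of the revised table (untested, so check it against your
version's CQL; everything other than the key is kept as in your original):

CREATE TABLE "default".metrics (
    row_time int,
    attrs varchar,
    offset int,
    value double,
    -- (row_time, attrs) is now the composite partition key, so each
    -- series gets its own partition per day; offset stays the clustering
    -- column that orders points within the day.
    PRIMARY KEY ((row_time, attrs), offset)
) WITH COMPACT STORAGE
  AND compaction = {'class': 'DateTieredCompactionStrategy',
                    'timestamp_resolution': 'MILLISECONDS'}
  AND compression = {'sstable_compression': 'LZ4Compressor'};

Your SELECT below still works unchanged: WHERE row_time = 5 AND attrs =
'potatoes_and_jam' then names the full partition key, so it reads one small
partition instead of slicing inside a multi-gigabyte one.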

Ciao, Duncan.


and I'm running Cassandra on an EC2 m3.2xlarge out in the cloud, with 4 GB of
heap space. It's time-series data: I increment "row_time" each day, "attrs" is
additional identifying information about each series, and "offset" is the
number of milliseconds into the day for each data point. For the past 5 days
I've been inserting 3k points/second distributed across 100k distinct "attrs"
values.
like

"SELECT * FROM "default".metrics WHERE row_time = 5 AND attrs = 
'potatoes_and_jam'"

it takes an absurdly long time and sometimes just times out. I did "nodetool
cfstats default" and here's what I get:

Keyspace: default
     Read Count: 59
     Read Latency: 397.12523728813557 ms.
     Write Count: 155128
     Write Latency: 0.3675690719921613 ms.
     Pending Flushes: 0
         Table: metrics
         SSTable count: 26
         Space used (live): 35146349027
         Space used (total): 35146349027
         Space used by snapshots (total): 0
         SSTable Compression Ratio: 0.10386468749216264
         Memtable cell count: 141800
         Memtable data size: 31071290
         Memtable switch count: 41
         Local read count: 59
         Local read latency: 397.126 ms
         Local write count: 155128
         Local write latency: 0.368 ms
         Pending flushes: 0
         Bloom filter false positives: 0
         Bloom filter false ratio: 0.00000
         Bloom filter space used: 2856
         Compacted partition minimum bytes: 104
         Compacted partition maximum bytes: 36904729268
         Compacted partition mean bytes: 986530969
         Average live cells per slice (last five minutes): 501.66101694915255
         Maximum live cells per slice (last five minutes): 502.0
         Average tombstones per slice (last five minutes): 0.0
         Maximum tombstones per slice (last five minutes): 0.0

Ouch! 400ms of read latency, orders of magnitude higher than it has any right to
be. How could this have happened? Is there something fundamentally broken about
my data model? Thanks!

