Hi! So I've got a table like this:

CREATE TABLE "default".metrics (row_time int,attrs varchar,offset int,value
double, PRIMARY KEY(row_time, attrs, offset)) WITH COMPACT STORAGE AND
bloom_filter_fp_chance=0.01 AND caching='KEYS_ONLY' AND comment='' AND
dclocal_read_repair_chance=0 AND gc_grace_seconds=864000 AND
index_interval=128 AND read_repair_chance=1 AND replicate_on_write='true'
AND populate_io_cache_on_flush='false' AND default_time_to_live=0 AND
speculative_retry='NONE' AND memtable_flush_period_in_ms=0 AND
compaction={'class':'DateTieredCompactionStrategy','timestamp_resolution':'MILLISECONDS'}
AND compression={'sstable_compression':'LZ4Compressor'};

and I'm running Cassandra on an EC2 m3.2xlarge out in the cloud, with 4 GB
of heap space. It's time series data: I increment "row_time" each day,
"attrs" is additional identifying information about each series, and
"offset" is the number of milliseconds into the day for each data point.
For the past 5 days I've been inserting 3k points/second, distributed
across 100k distinct "attrs"es.
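
Each write is just a plain CQL INSERT against this schema; something like
the following, with made-up example values:

-- example only; the actual row_time, attrs, offset, and value vary per data point
INSERT INTO "default".metrics (row_time, attrs, offset, value)
VALUES (5, 'potatoes_and_jam', 49173000, 42.7);
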
Now when I try to run queries on this data that look like

"SELECT * FROM "default".metrics WHERE row_time = 5 AND attrs =
'potatoes_and_jam'"

it takes an absurdly long time and sometimes just times out. I ran
"nodetool cfstats default" and here's what I get:

Keyspace: default
    Read Count: 59
    Read Latency: 397.12523728813557 ms.
    Write Count: 155128
    Write Latency: 0.3675690719921613 ms.
    Pending Flushes: 0
        Table: metrics
        SSTable count: 26
        Space used (live): 35146349027
        Space used (total): 35146349027
        Space used by snapshots (total): 0
        SSTable Compression Ratio: 0.10386468749216264
        Memtable cell count: 141800
        Memtable data size: 31071290
        Memtable switch count: 41
        Local read count: 59
        Local read latency: 397.126 ms
        Local write count: 155128
        Local write latency: 0.368 ms
        Pending flushes: 0
        Bloom filter false positives: 0
        Bloom filter false ratio: 0.00000
        Bloom filter space used: 2856
        Compacted partition minimum bytes: 104
        Compacted partition maximum bytes: 36904729268
        Compacted partition mean bytes: 986530969
        Average live cells per slice (last five minutes): 501.66101694915255
        Maximum live cells per slice (last five minutes): 502.0
        Average tombstones per slice (last five minutes): 0.0
        Maximum tombstones per slice (last five minutes): 0.0

Ouch! 400 ms of read latency, orders of magnitude higher than it has any
right to be. I also notice that "Compacted partition maximum bytes" is
roughly 37 GB, presumably because at 3k points/second a full day's worth of
data, about 260 million cells, all lands in a single "row_time" partition.
How could this have happened? Is there something fundamentally broken about
my data model? Thanks!
