>> Compacted partition maximum bytes: 36904729268

That's huge... 36 GB partitions are going to cause a lot of problems. Even when you specify a precise cell under this, every read of the partition still has to deserialize an enormous column index. As mentioned above, you should include your attribute name in the partition key ((row_time, attrs)) to spread this out... I'd call that critical.

Chris
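A minimal sketch of the composite-partition-key layout Chris is describing, adapted from the CREATE TABLE quoted further down the thread (the table name is made up here, and most of the table options are trimmed for brevity):

    CREATE TABLE "default".metrics_by_attr (
        row_time int,
        attrs varchar,
        offset int,
        value double,
        -- (row_time, attrs) together form the partition key, so each series
        -- gets its own small daily partition instead of sharing one huge one.
        PRIMARY KEY ((row_time, attrs), offset)
    ) WITH COMPACT STORAGE
      AND compaction = {'class': 'DateTieredCompactionStrategy',
                        'timestamp_resolution': 'MILLISECONDS'}
      AND compression = {'sstable_compression': 'LZ4Compressor'};

    -- Reads keep the same shape, e.g.:
    -- SELECT * FROM "default".metrics_by_attr
    --  WHERE row_time = 5 AND attrs = 'potatoes_and_jam';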
On Mon, Mar 23, 2015 at 4:13 PM, Dave Galbraith <david92galbra...@gmail.com> wrote:

> I haven't deleted anything. Here's output from a traced cqlsh query (I tried to make the spaces line up, hope it's legible):
>
>  activity | timestamp | source | source_elapsed
>  Execute CQL3 query | 2015-03-23 21:04:37.422000 | 172.31.32.211 | 0
>  Parsing select * from default.metrics where row_time = 16511 and attrs = '[redacted]' limit 100; [SharedPool-Worker-2] | 2015-03-23 21:04:37.423000 | 172.31.32.211 | 93
>  Preparing statement [SharedPool-Worker-2] | 2015-03-23 21:04:37.423000 | 172.31.32.211 | 696
>  Executing single-partition query on metrics [SharedPool-Worker-1] | 2015-03-23 21:04:37.425000 | 172.31.32.211 | 2807
>  Acquiring sstable references [SharedPool-Worker-1] | 2015-03-23 21:04:37.425000 | 172.31.32.211 | 2993
>  Merging memtable tombstones [SharedPool-Worker-1] | 2015-03-23 21:04:37.426000 | 172.31.32.211 | 3049
>  Partition index with 484338 entries found for sstable 15966 [SharedPool-Worker-1] | 2015-03-23 21:04:38.625000 | 172.31.32.211 | 202304
>  Seeking to partition indexed section in data file [SharedPool-Worker-1] | 2015-03-23 21:04:38.625000 | 172.31.32.211 | 202354
>  Bloom filter allows skipping sstable 5613 [SharedPool-Worker-1] | 2015-03-23 21:04:38.625000 | 172.31.32.211 | 202445
>  Bloom filter allows skipping sstable 5582 [SharedPool-Worker-1] | 2015-03-23 21:04:38.625000 | 172.31.32.211 | 202478
>  Bloom filter allows skipping sstable 5611 [SharedPool-Worker-1] | 2015-03-23 21:04:38.625000 | 172.31.32.211 | 202508
>  Bloom filter allows skipping sstable 5610 [SharedPool-Worker-1] | 2015-03-23 21:04:38.625000 | 172.31.32.211 | 202539
>  Bloom filter allows skipping sstable 5549 [SharedPool-Worker-1] | 2015-03-23 21:04:38.625001 | 172.31.32.211 | 202678
>  Bloom filter allows skipping sstable 5544 [SharedPool-Worker-1] | 2015-03-23 21:04:38.625001 | 172.31.32.211 | 202720
>  Bloom filter allows skipping sstable 5237 [SharedPool-Worker-1] | 2015-03-23 21:04:38.625001 | 172.31.32.211 | 202752
>  Bloom filter allows skipping sstable 2516 [SharedPool-Worker-1] | 2015-03-23 21:04:38.625001 | 172.31.32.211 | 202782
>  Bloom filter allows skipping sstable 2632 [SharedPool-Worker-1] | 2015-03-23 21:04:38.625001 | 172.31.32.211 | 202812
>  Bloom filter allows skipping sstable 3015 [SharedPool-Worker-1] | 2015-03-23 21:04:38.625001 | 172.31.32.211 | 202852
>  Skipped 0/11 non-slice-intersecting sstables, included 0 due to tombstones [SharedPool-Worker-1] | 2015-03-23 21:04:38.625001 | 172.31.32.211 | 202882
>  Merging data from memtables and 1 sstables [SharedPool-Worker-1] | 2015-03-23 21:04:38.625001 | 172.31.32.211 | 202902
>  Read 101 live and 0 tombstoned cells [SharedPool-Worker-1] | 2015-03-23 21:04:38.626000 | 172.31.32.211 | 203752
>  Request complete | 2015-03-23 21:04:38.628253 | 172.31.32.211 | 206253
>
> On Mon, Mar 23, 2015 at 11:53 AM, Eric Stevens <migh...@gmail.com> wrote:
>
>> Enable tracing in cqlsh and see how many sstables are being lifted to satisfy the query (are you repeatedly writing to the same partition [row_time] over time?).
>>
>> Also watch for whether you're hitting a lot of tombstones (are you deleting lots of values in the same partition over time?).
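For reference, a minimal sketch of capturing a trace like the one above directly in cqlsh, reusing the example query Dave gives further down the thread:

    TRACING ON;
    SELECT * FROM "default".metrics WHERE row_time = 5 AND attrs = 'potatoes_and_jam';
    TRACING OFF;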
>> On Mon, Mar 23, 2015 at 4:01 AM, Dave Galbraith <david92galbra...@gmail.com> wrote:
>>
>>> Duncan: I'm thinking it might be something like that. I'm also seeing just a ton of garbage collection on the box. Could it be pulling rows for all 100k attrs for a given row_time into memory, since only row_time is the partition key?
>>>
>>> Jens: I'm not using EBS (although I used to, until I read up on how useless it is). I'm not sure what constitutes proper paging, but my client has a pretty small amount of available memory, so I'm doing pages of size 5k using the C++ Datastax driver.
>>>
>>> Thanks for the replies!
>>>
>>> -Dave
>>>
>>> On Mon, Mar 23, 2015 at 2:00 AM, Jens Rantil <jens.ran...@tink.se> wrote:
>>>
>>>> Also, two control questions:
>>>>
>>>> - Are you using EBS for data storage? It might introduce additional latencies.
>>>> - Are you doing proper paging when querying the keyspace?
>>>>
>>>> Cheers,
>>>> Jens
>>>>
>>>> On Mon, Mar 23, 2015 at 5:56 AM, Dave Galbraith <david92galbra...@gmail.com> wrote:
>>>>
>>>>> Hi! So I've got a table like this:
>>>>>
>>>>> CREATE TABLE "default".metrics (
>>>>>     row_time int,
>>>>>     attrs varchar,
>>>>>     offset int,
>>>>>     value double,
>>>>>     PRIMARY KEY (row_time, attrs, offset)
>>>>> ) WITH COMPACT STORAGE
>>>>>   AND bloom_filter_fp_chance=0.01 AND caching='KEYS_ONLY' AND comment=''
>>>>>   AND dclocal_read_repair_chance=0 AND gc_grace_seconds=864000
>>>>>   AND index_interval=128 AND read_repair_chance=1 AND replicate_on_write='true'
>>>>>   AND populate_io_cache_on_flush='false' AND default_time_to_live=0
>>>>>   AND speculative_retry='NONE' AND memtable_flush_period_in_ms=0
>>>>>   AND compaction={'class':'DateTieredCompactionStrategy','timestamp_resolution':'MILLISECONDS'}
>>>>>   AND compression={'sstable_compression':'LZ4Compressor'};
>>>>>
>>>>> and I'm running Cassandra on an EC2 m3.2xlarge out in the cloud, with 4 GB of heap space. It's timeseries data, so I increment "row_time" each day, "attrs" is additional identifying information about each series, and "offset" is the number of milliseconds into the day for each data point. For the past 5 days I've been inserting 3k points/second distributed across 100k distinct "attrs"es. And now when I try to run queries on this data that look like
>>>>>
>>>>>     SELECT * FROM "default".metrics WHERE row_time = 5 AND attrs = 'potatoes_and_jam';
>>>>>
>>>>> it takes an absurdly long time and sometimes just times out.
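A rough back-of-the-envelope check on what that write pattern does to partition size, assuming the ~3k points/second rate is sustained over a whole day:

    3,000 points/sec * 86,400 sec/day ~= 259 million cells per day, and because
    row_time (the day) is the only partition key, all of them land in a single
    partition. That is consistent with the ~37 GB maximum compacted partition
    size in the cfstats output below.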
>>>>> I did "nodetool cfstats default" and here's what I get:
>>>>>
>>>>>     Keyspace: default
>>>>>         Read Count: 59
>>>>>         Read Latency: 397.12523728813557 ms.
>>>>>         Write Count: 155128
>>>>>         Write Latency: 0.3675690719921613 ms.
>>>>>         Pending Flushes: 0
>>>>>         Table: metrics
>>>>>             SSTable count: 26
>>>>>             Space used (live): 35146349027
>>>>>             Space used (total): 35146349027
>>>>>             Space used by snapshots (total): 0
>>>>>             SSTable Compression Ratio: 0.10386468749216264
>>>>>             Memtable cell count: 141800
>>>>>             Memtable data size: 31071290
>>>>>             Memtable switch count: 41
>>>>>             Local read count: 59
>>>>>             Local read latency: 397.126 ms
>>>>>             Local write count: 155128
>>>>>             Local write latency: 0.368 ms
>>>>>             Pending flushes: 0
>>>>>             Bloom filter false positives: 0
>>>>>             Bloom filter false ratio: 0.00000
>>>>>             Bloom filter space used: 2856
>>>>>             Compacted partition minimum bytes: 104
>>>>>             Compacted partition maximum bytes: 36904729268
>>>>>             Compacted partition mean bytes: 986530969
>>>>>             Average live cells per slice (last five minutes): 501.66101694915255
>>>>>             Maximum live cells per slice (last five minutes): 502.0
>>>>>             Average tombstones per slice (last five minutes): 0.0
>>>>>             Maximum tombstones per slice (last five minutes): 0.0
>>>>>
>>>>> Ouch! 400ms of read latency, orders of magnitude higher than it has any right to be. How could this have happened? Is there something fundamentally broken about my data model? Thanks!
>>>>
>>>> --
>>>> Jens Rantil
>>>> Backend engineer
>>>> Tink AB
>>>>
>>>> Email: jens.ran...@tink.se
>>>> Phone: +46 708 84 18 32
>>>> Web: www.tink.se
>>>>
>>>> Facebook <https://www.facebook.com/#!/tink.se> Linkedin <http://www.linkedin.com/company/2735919?trk=vsrp_companies_res_photo&trkInfo=VSRPsearchId%3A1057023381369207406670%2CVSRPtargetId%3A2735919%2CVSRPcmpt%3Aprimary> Twitter <https://twitter.com/tink>
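To make the layout concrete, here is roughly how a single data point maps onto this table; row_time 5 and 'potatoes_and_jam' come from Dave's example query, while the offset and value are made-up illustrative numbers:

    -- One point written around noon (offset 43,200,000 ms into day 5) for the
    -- series 'potatoes_and_jam'; the value 42.0 is purely illustrative.
    INSERT INTO "default".metrics (row_time, attrs, offset, value)
    VALUES (5, 'potatoes_and_jam', 43200000, 42.0);

    -- With PRIMARY KEY (row_time, attrs, offset), this cell lands in the single
    -- partition row_time = 5, alongside every other series written that day.
    -- With PRIMARY KEY ((row_time, attrs), offset), as suggested at the top of
    -- the thread, it would land in the much smaller per-series partition
    -- (5, 'potatoes_and_jam').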