nodetool cfhistograms is also very helpful in diagnosing these kinds of data modelling issues.
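For example, run against the keyspace and table discussed in this thread (names taken from the CREATE TABLE quoted further down), a minimal invocation would be:

    nodetool cfhistograms default metrics

Its partition-size percentiles make outliers like the 36 GB maximum partition reported below easy to spot.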
On 23 March 2015 at 14:43, Chris Lohfink <clohfin...@gmail.com> wrote:

>> Compacted partition maximum bytes: 36904729268
>
> That's huge... 36 GB partitions are going to cause a lot of problems: even when
> you ask for a precise cell inside one, Cassandra still has to deserialize an
> enormous column index on every read of that partition. As mentioned above, you
> should include your attribute name in the partition key ((row_time, attrs)) to
> spread this out... I'd call that critical.
>
> Chris
>
> On Mon, Mar 23, 2015 at 4:13 PM, Dave Galbraith <david92galbra...@gmail.com> wrote:
>
>> I haven't deleted anything. Here's output from a traced cqlsh query (I
>> tried to make the spaces line up, hope it's legible):
>>
>> activity | timestamp | source | source_elapsed
>> ----------------------------------------------
>> Execute CQL3 query | 2015-03-23 21:04:37.422000 | 172.31.32.211 | 0
>> Parsing select * from default.metrics where row_time = 16511 and attrs = '[redacted]' limit 100; [SharedPool-Worker-2] | 2015-03-23 21:04:37.423000 | 172.31.32.211 | 93
>> Preparing statement [SharedPool-Worker-2] | 2015-03-23 21:04:37.423000 | 172.31.32.211 | 696
>> Executing single-partition query on metrics [SharedPool-Worker-1] | 2015-03-23 21:04:37.425000 | 172.31.32.211 | 2807
>> Acquiring sstable references [SharedPool-Worker-1] | 2015-03-23 21:04:37.425000 | 172.31.32.211 | 2993
>> Merging memtable tombstones [SharedPool-Worker-1] | 2015-03-23 21:04:37.426000 | 172.31.32.211 | 3049
>> Partition index with 484338 entries found for sstable 15966 [SharedPool-Worker-1] | 2015-03-23 21:04:38.625000 | 172.31.32.211 | 202304
>> Seeking to partition indexed section in data file [SharedPool-Worker-1] | 2015-03-23 21:04:38.625000 | 172.31.32.211 | 202354
>> Bloom filter allows skipping sstable 5613 [SharedPool-Worker-1] | 2015-03-23 21:04:38.625000 | 172.31.32.211 | 202445
>> Bloom filter allows skipping sstable 5582 [SharedPool-Worker-1] | 2015-03-23 21:04:38.625000 | 172.31.32.211 | 202478
>> Bloom filter allows skipping sstable 5611 [SharedPool-Worker-1] | 2015-03-23 21:04:38.625000 | 172.31.32.211 | 202508
>> Bloom filter allows skipping sstable 5610 [SharedPool-Worker-1] | 2015-03-23 21:04:38.625000 | 172.31.32.211 | 202539
>> Bloom filter allows skipping sstable 5549 [SharedPool-Worker-1] | 2015-03-23 21:04:38.625001 | 172.31.32.211 | 202678
>> Bloom filter allows skipping sstable 5544 [SharedPool-Worker-1] | 2015-03-23 21:04:38.625001 | 172.31.32.211 | 202720
>> Bloom filter allows skipping sstable 5237 [SharedPool-Worker-1] | 2015-03-23 21:04:38.625001 | 172.31.32.211 | 202752
>> Bloom filter allows skipping sstable 2516 [SharedPool-Worker-1] | 2015-03-23 21:04:38.625001 | 172.31.32.211 | 202782
>> Bloom filter allows skipping sstable 2632 [SharedPool-Worker-1] | 2015-03-23 21:04:38.625001 | 172.31.32.211 | 202812
>> Bloom filter allows skipping sstable 3015 [SharedPool-Worker-1] | 2015-03-23 21:04:38.625001 | 172.31.32.211 | 202852
>> Skipped 0/11 non-slice-intersecting sstables, included 0 due to tombstones [SharedPool-Worker-1] | 2015-03-23 21:04:38.625001 | 172.31.32.211 | 202882
>> Merging data from memtables and 1 sstables [SharedPool-Worker-1] | 2015-03-23 21:04:38.625001 | 172.31.32.211 | 202902
>> Read 101 live and 0 tombstoned cells [SharedPool-Worker-1] | 2015-03-23 21:04:38.626000 | 172.31.32.211 | 203752
>> Request complete | 2015-03-23 21:04:38.628253 | 172.31.32.211 | 206253
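Concretely, Chris's ((row_time, attrs)) suggestion means moving attrs into the partition key so that each day/series pair becomes its own partition. A minimal sketch of what that could look like, keeping only the compaction and compression options from the original definition (the table name is a placeholder for illustration, and changing a partition key means creating a new table and rewriting the existing data, not an ALTER):

    CREATE TABLE "default".metrics_by_attr (
        row_time int,
        attrs    varchar,
        offset   int,
        value    double,
        PRIMARY KEY ((row_time, attrs), offset)
    ) WITH COMPACT STORAGE
      AND compaction = {'class': 'DateTieredCompactionStrategy', 'timestamp_resolution': 'MILLISECONDS'}
      AND compression = {'sstable_compression': 'LZ4Compressor'};

The query pattern in this thread (WHERE row_time = ? AND attrs = ?) still hits exactly one partition, but at the stated write rate (3k points/second spread over 100k attrs) each daily partition shrinks from roughly 3,000 x 86,400, about 260 million cells under the current schema, to a few thousand cells per series per day.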
>> On Mon, Mar 23, 2015 at 11:53 AM, Eric Stevens <migh...@gmail.com> wrote:
>>
>>> Enable tracing in cqlsh and see how many sstables are being read to
>>> satisfy the query (are you repeatedly writing to the same partition
>>> [row_time] over time?).
>>>
>>> Also watch for whether you're hitting a lot of tombstones (are you
>>> deleting lots of values in the same partition over time?).
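For anyone following along, tracing the problem query from cqlsh is just the following (the SELECT is the one from the original post below; this is a sketch, not output captured from this cluster):

    cqlsh> TRACING ON;
    cqlsh> SELECT * FROM "default".metrics WHERE row_time = 5 AND attrs = 'potatoes_and_jam';

cqlsh then prints a step-by-step trace like the one Dave pasted above. In that trace, source_elapsed jumps from about 3 ms to about 202 ms at "Partition index with 484338 entries found for sstable 15966", which is exactly the oversized-partition-index cost Chris describes.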
>>> On Mon, Mar 23, 2015 at 4:01 AM, Dave Galbraith <david92galbra...@gmail.com> wrote:
>>>
>>>> Duncan: I'm thinking it might be something like that. I'm also seeing
>>>> just a ton of garbage collection on the box. Could it be pulling rows for
>>>> all 100k attrs for a given row_time into memory, since only row_time is
>>>> the partition key?
>>>>
>>>> Jens: I'm not using EBS (although I used to, until I read up on how
>>>> useless it is). I'm not sure what constitutes proper paging, but my client
>>>> has a pretty small amount of available memory, so I'm doing pages of size
>>>> 5k using the C++ DataStax driver.
>>>>
>>>> Thanks for the replies!
>>>>
>>>> -Dave
>>>>
>>>> On Mon, Mar 23, 2015 at 2:00 AM, Jens Rantil <jens.ran...@tink.se> wrote:
>>>>
>>>>> Also, two control questions:
>>>>>
>>>>> - Are you using EBS for data storage? It might introduce additional latencies.
>>>>> - Are you doing proper paging when querying the keyspace?
>>>>>
>>>>> Cheers,
>>>>> Jens
>>>>>
>>>>> On Mon, Mar 23, 2015 at 5:56 AM, Dave Galbraith <david92galbra...@gmail.com> wrote:
>>>>>
>>>>>> Hi! So I've got a table like this:
>>>>>>
>>>>>> CREATE TABLE "default".metrics (
>>>>>>     row_time int,
>>>>>>     attrs varchar,
>>>>>>     offset int,
>>>>>>     value double,
>>>>>>     PRIMARY KEY (row_time, attrs, offset)
>>>>>> ) WITH COMPACT STORAGE
>>>>>>   AND bloom_filter_fp_chance = 0.01
>>>>>>   AND caching = 'KEYS_ONLY'
>>>>>>   AND comment = ''
>>>>>>   AND dclocal_read_repair_chance = 0
>>>>>>   AND gc_grace_seconds = 864000
>>>>>>   AND index_interval = 128
>>>>>>   AND read_repair_chance = 1
>>>>>>   AND replicate_on_write = 'true'
>>>>>>   AND populate_io_cache_on_flush = 'false'
>>>>>>   AND default_time_to_live = 0
>>>>>>   AND speculative_retry = 'NONE'
>>>>>>   AND memtable_flush_period_in_ms = 0
>>>>>>   AND compaction = {'class': 'DateTieredCompactionStrategy', 'timestamp_resolution': 'MILLISECONDS'}
>>>>>>   AND compression = {'sstable_compression': 'LZ4Compressor'};
>>>>>>
>>>>>> and I'm running Cassandra on an EC2 m3.2xlarge out in the cloud, with
>>>>>> 4 GB of heap space. It's time-series data, so I increment "row_time"
>>>>>> each day, "attrs" is additional identifying information about each
>>>>>> series, and "offset" is the number of milliseconds into the day for
>>>>>> each data point. For the past 5 days I've been inserting 3k
>>>>>> points/second distributed across 100k distinct "attrs" values. And now
>>>>>> when I try to run queries on this data that look like
>>>>>>
>>>>>> SELECT * FROM "default".metrics WHERE row_time = 5 AND attrs = 'potatoes_and_jam';
>>>>>>
>>>>>> it takes an absurdly long time and sometimes just times out. I ran
>>>>>> "nodetool cfstats default" and here's what I get:
>>>>>>
>>>>>> Keyspace: default
>>>>>>     Read Count: 59
>>>>>>     Read Latency: 397.12523728813557 ms
>>>>>>     Write Count: 155128
>>>>>>     Write Latency: 0.3675690719921613 ms
>>>>>>     Pending Flushes: 0
>>>>>>         Table: metrics
>>>>>>         SSTable count: 26
>>>>>>         Space used (live): 35146349027
>>>>>>         Space used (total): 35146349027
>>>>>>         Space used by snapshots (total): 0
>>>>>>         SSTable Compression Ratio: 0.10386468749216264
>>>>>>         Memtable cell count: 141800
>>>>>>         Memtable data size: 31071290
>>>>>>         Memtable switch count: 41
>>>>>>         Local read count: 59
>>>>>>         Local read latency: 397.126 ms
>>>>>>         Local write count: 155128
>>>>>>         Local write latency: 0.368 ms
>>>>>>         Pending flushes: 0
>>>>>>         Bloom filter false positives: 0
>>>>>>         Bloom filter false ratio: 0.00000
>>>>>>         Bloom filter space used: 2856
>>>>>>         Compacted partition minimum bytes: 104
>>>>>>         Compacted partition maximum bytes: 36904729268
>>>>>>         Compacted partition mean bytes: 986530969
>>>>>>         Average live cells per slice (last five minutes): 501.66101694915255
>>>>>>         Maximum live cells per slice (last five minutes): 502.0
>>>>>>         Average tombstones per slice (last five minutes): 0.0
>>>>>>         Maximum tombstones per slice (last five minutes): 0.0
>>>>>>
>>>>>> Ouch! 400 ms of read latency, orders of magnitude higher than it has
>>>>>> any right to be. How could this have happened? Is there something
>>>>>> fundamentally broken about my data model? Thanks!
>>>>>
>>>>> --
>>>>> Jens Rantil
>>>>> Backend engineer
>>>>> Tink AB
>>>>>
>>>>> Email: jens.ran...@tink.se
>>>>> Phone: +46 708 84 18 32
>>>>> Web: www.tink.se
>>>>>
>>>>> Facebook <https://www.facebook.com/#!/tink.se> Linkedin
>>>>> <http://www.linkedin.com/company/2735919?trk=vsrp_companies_res_photo&trkInfo=VSRPsearchId%3A1057023381369207406670%2CVSRPtargetId%3A2735919%2CVSRPcmpt%3Aprimary>
>>>>> Twitter <https://twitter.com/tink>

--
Ben Bromhead
Instaclustr | www.instaclustr.com | @instaclustr <http://twitter.com/instaclustr> | (650) 284 9692