Re: Read performance

Alprema Mon, 11 May 2015 04:53:29 -0700

According to the trace log, only one SSTable was read. The compaction
strategy is size-tiered.


I attached a more readable version of my trace for details.

On Mon, May 11, 2015 at 11:35 AM, Anishek Agarwal <anis...@gmail.com> wrote:

> How many SSTables were there? What compaction strategy are you using? These
> properties determine how many disk reads Cassandra may have to do to get
> all the data you need, depending on which SSTables hold data for your
> partition key.
>
> On Fri, May 8, 2015 at 6:25 PM, Alprema <alpr...@alprema.com> wrote:
>
>> I was planning on using a more "server-friendly" strategy anyway (by
>> parallelizing my workload on multiple metrics) but my concern here is more
>> about the raw numbers.
>>
>> According to the trace and my estimate of the data size, the read from
>> disk ran at about 30 MB/s and the transfer between the responsible
>> node and the coordinator ran at about 120 Mbit/s, which doesn't seem right
>> given that the cluster was not busy and the network is Gbit-capable.
>>
>> I know that there is some overhead, but these numbers seem odd to me. Do
>> they seem normal to you?
>>
>> On Fri, May 8, 2015 at 2:34 PM, Bryan Holladay <holla...@longsight.com>
>> wrote:
>>
>>> Try breaking it up into smaller chunks using multiple threads and token
>>> ranges. 86400 is pretty large. I found ~1000 results per query is good.
>>> This will spread the burden across all servers a little more evenly.
>>>
>>> On Thu, May 7, 2015 at 4:27 AM, Alprema <alpr...@alprema.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I am writing an application that will periodically read large amounts of
>>>> data from Cassandra, and I am seeing odd performance.
>>>>
>>>> My column family is a classic time series one, with series ID and Day
>>>> as partition key and a timestamp as clustering key, the value being a
>>>> double.
>>>>
>>>> The query I run gets all the values for a given time series for a given
>>>> day (so about 86400 points):
>>>>
>>>> SELECT "UtcDate", "Value" FROM "Metric_OneSec"
>>>> WHERE "MetricId" = 12215ece-6544-4fcf-a15d-4f9e9ce1567e
>>>> AND "Day" = '2015-05-05 00:00:00+0000'
>>>> LIMIT 86400;
>>>>
>>>>
>>>> This takes about 450ms to run and when I trace the query I see that it
>>>> takes about 110ms to read the data from disk and 224ms to send the data
>>>> from the responsible node to the coordinator (full trace in attachment).
>>>>
>>>> I did a quick estimation of the requested data (correct me if I'm
>>>> wrong):
>>>> 86400 * (column name + column value + timestamp + ttl)
>>>> = 86400 * (8 + 8 + 8 + 8?) bytes
>>>> = 2,764,800 bytes, i.e. about 2.6 MB
>>>>
>>>> Let's say about 3 MB with misc. overhead, so these timings seem pretty
>>>> slow to me for a modern SSD and a 1 Gbit/s NIC.
>>>>
>>>> Do those timings seem normal? Am I missing something?
>>>>
>>>> Thank you,
>>>>
>>>> Kévin
>>>>
>>>>
>>>>
>>>
>>
>
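
The suggestion quoted above (smaller queries of roughly 1000 results each, run in parallel) could be sketched as follows. Note the assumptions: because the original query targets a single partition, this sketch slices on the "UtcDate" clustering column rather than on token ranges; the helper name day_slices and the SLICE_QUERY text are hypothetical, and the code only builds the time ranges. Actually executing the statements with a driver is left out.

```python
from datetime import datetime, timedelta, timezone


def day_slices(day_start, points_per_slice=1000, step_seconds=1):
    """Split one day into half-open [start, end) ranges of ~points_per_slice points.

    Assumes one point per step_seconds, as in the one-second series in the thread.
    """
    day_end = day_start + timedelta(days=1)
    width = timedelta(seconds=points_per_slice * step_seconds)
    slices = []
    start = day_start
    while start < day_end:
        end = min(start + width, day_end)  # last slice is clipped to the day boundary
        slices.append((start, end))
        start = end
    return slices


# Hypothetical per-slice statement: same partition key as the original query,
# plus a range restriction on the clustering column.
SLICE_QUERY = (
    'SELECT "UtcDate", "Value" FROM "Metric_OneSec" '
    'WHERE "MetricId" = ? AND "Day" = ? '
    'AND "UtcDate" >= ? AND "UtcDate" < ?'
)

slices = day_slices(datetime(2015, 5, 5, tzinfo=timezone.utc))
print(len(slices), "slices; first:", slices[0], "last:", slices[-1])
```

Each (start, end) pair would be bound to the last two placeholders of SLICE_QUERY and submitted from a worker thread or as an asynchronous request. All slices of one partition still land on the same replicas, so this mainly reduces the size of each response rather than spreading load across nodes.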

 activity                                                                 | timestamp    | source         | source_elapsed
--------------------------------------------------------------------------+--------------+----------------+----------------
                                                       execute_cql3_query | 09:25:45,027 |     node01     |              0
                                    Message received from /node01         | 09:25:45,021 |     node02     |             10
                        Executing single-partition query on Metric_OneSec | 09:25:45,021 |     node02     |            156
                                             Acquiring sstable references | 09:25:45,021 |     node02     |            164
                                              Merging memtable tombstones | 09:25:45,021 |     node02     |            179
                                Bloom filter allows skipping sstable 5153 | 09:25:45,021 |     node02     |            198
                                Bloom filter allows skipping sstable 5152 | 09:25:45,021 |     node02     |            205
                                Bloom filter allows skipping sstable 5151 | 09:25:45,021 |     node02     |            211
                                Bloom filter allows skipping sstable 5146 | 09:25:45,021 |     node02     |            217
                                           Key cache hit for sstable 5125 | 09:25:45,021 |     node02     |            228
                              Seeking to partition beginning in data file | 09:25:45,021 |     node02     |            231
                                Bloom filter allows skipping sstable 5040 | 09:25:45,022 |     node02     |            470
                                Bloom filter allows skipping sstable 4955 | 09:25:45,022 |     node02     |            479
                                Bloom filter allows skipping sstable 4614 | 09:25:45,022 |     node02     |            485
Skipped 0/8 non-slice-intersecting sstables, included 0 due to tombstones | 09:25:45,022 |     node02     |            491
                               Merging data from memtables and 1 sstables | 09:25:45,022 |     node02     |            495
 Parsing
 SELECT "Value" FROM "Metric_OneSec"
 WHERE "MetricId" = 12215ece-6544-4fcf-a15d-4f9e9ce1567e
 AND "Day" = '2015-05-05 00:00:00+0000'
 LIMIT 86400;                                                             | 09:25:45,027 |     node01     |             23
                                                      Preparing statement | 09:25:45,027 |     node01     |            115
                                       Sending message to /node02         | 09:25:45,027 |     node01     |            798
                                   Read 86090 live and 0 tombstoned cells | 09:25:45,135 |     node02     |         113809
                                    Enqueuing response to /node01         | 09:25:45,135 |     node02     |         114046
                                       Sending message to /node01         | 09:25:45,135 |     node02     |         114108
                                    Message received from /node02         | 09:25:45,365 |     node01     |         338615
                                 Processing response from /node02         | 09:25:45,365 |     node01     |         338654
                                                         Request complete | 09:25:45,455 |     node01     |         428111
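
As a rough cross-check of the throughput figures discussed in the thread, the arithmetic can be sketched in a few lines. It uses only numbers quoted in the messages: the 32-bytes-per-cell size estimate (itself a guess by the original poster) and the approximate 110 ms read and 224 ms transfer times. The function name throughput is hypothetical, and the results are estimates, not measurements.

```python
CELLS = 86400
BYTES_PER_CELL = 8 + 8 + 8 + 8  # column name + value + timestamp + ttl (poster's estimate)


def throughput(payload_bytes, elapsed_ms):
    """Return (MB/s, Mbit/s) for moving payload_bytes in elapsed_ms milliseconds."""
    seconds = elapsed_ms / 1000
    mbytes_per_s = payload_bytes / seconds / 1e6
    return mbytes_per_s, mbytes_per_s * 8


payload = CELLS * BYTES_PER_CELL  # 2,764,800 bytes before any overhead

read_mb_s, read_mbit_s = throughput(payload, 110)          # disk read phase
transfer_mb_s, transfer_mbit_s = throughput(payload, 224)  # node -> coordinator

print(f"payload: {payload} bytes")
print(f"disk read: {read_mb_s:.1f} MB/s")
print(f"transfer:  {transfer_mbit_s:.1f} Mbit/s")
```

With the un-padded 2.76 MB payload the results come out somewhat below the 30 MB/s and 120 Mbit/s quoted in the thread, which were based on a rounded-up size of about 3 MB.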
