Thank you all for your responses.

Alex –
  Instance (ephemeral) SSD

Ben –
The query reads data from just one partition. If disk I/O is the bottleneck,
then in theory, if reading from EBS takes 10 seconds, reading the same amount
of data from local SSD should take a lot less. My question is not about why it
takes 10 seconds, but why the read time is the same for both EBS
(network-attached storage) and local SSD.

Tony –
If the data were cached in memory, then a read should not take 10 seconds for
just 20MB of data.
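
To put numbers on both this point and Ben's, here is a quick back-of-the-envelope
sketch in Python. The 20MB/10s figures are from this thread; the per-medium
bandwidth figures are my own ballpark assumptions, not measurements from our
cluster.

# Effective throughput of the observed read (numbers from this thread).
data_mb = 20.0     # ~20 MB returned by the query
elapsed_s = 10.0   # ~10 s observed latency
print("effective throughput: %.1f MB/s" % (data_mb / elapsed_s))  # 2.0 MB/s

# Ballpark sequential-read bandwidths -- assumptions, not measured here.
media = {
    "EBS (network-attached)": 100.0,    # MB/s
    "instance SSD":           400.0,    # MB/s
    "page cache / RAM":     10000.0,    # MB/s
}
for name, bw in media.items():
    print("%-24s could stream 20 MB in ~%.3f s" % (name, data_mb / bw))

Even the slowest medium should stream 20MB in well under a second, so raw
sequential throughput cannot account for 10 seconds on either EBS or local SSD.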

Rob –
Here is the schema, query, and trace. I masked the actual column names to 
protect the innocents ☺

create table dummy(
  a   varchar,
  b   varchar,
  c   varchar,
  d   varchar,
  e   varchar,
  f   varchar,
  g   varchar,
  h   timestamp,
  i   int,
  non_key1   varchar,
  ...
  non_keyN   varchar,
  PRIMARY KEY ((a, b, c, d, e, f), g, h, i)
) WITH CLUSTERING ORDER BY (g ASC, h DESC, i ASC)

SELECT h, non_key100, non_key200 FROM dummy WHERE a='aaaa' AND b='bbbbbb' AND
c='ccc' AND d='dd' AND e='eeeeeeeeeeee' AND f='ffffffffff' AND g='ggggggggg' AND
h >= '2014-09-10T00:00:00' AND h <= '2014-09-10T23:40:41';

The above query returns around 250,000 CQL rows.
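
In case it helps anyone reproduce this, here is a rough sketch of the same read
issued through the DataStax Python driver with paging enabled (assuming
Cassandra 2.0+ for native-protocol paging; the contact point is the node from
the trace below, and the keyspace name "ks" is a placeholder):

import time
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

cluster = Cluster(["10.10.100.5"])
session = cluster.connect("ks")  # "ks" is a placeholder keyspace name

# fetch_size pulls the ~250k rows in pages of 5,000 instead of one huge read.
query = SimpleStatement("""
    SELECT h, non_key100, non_key200 FROM dummy
    WHERE a='aaaa' AND b='bbbbbb' AND c='ccc' AND d='dd'
      AND e='eeeeeeeeeeee' AND f='ffffffffff' AND g='ggggggggg'
      AND h >= '2014-09-10T00:00:00' AND h <= '2014-09-10T23:40:41'
""", fetch_size=5000)

start = time.time()
count = sum(1 for _ in session.execute(query))  # iteration fetches pages lazily
print("%d rows in %.2f s" % (count, time.time() - start))
cluster.shutdown()

Comparing this against a LIMIT-ed version of the same query would show whether
the latency is proportional to the number of cells read or paid up front.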

cqlsh trace:

 activity                                                                   | timestamp    | source      | source_elapsed
----------------------------------------------------------------------------+--------------+-------------+----------------
 execute_cql3_query                                                         | 21:57:16,830 | 10.10.100.5 |              0
 Parsing query;                                                             | 21:57:16,830 | 10.10.100.5 |            673
 Preparing statement                                                        | 21:57:16,831 | 10.10.100.5 |           1602
 Executing single-partition query on event                                  | 21:57:16,845 | 10.10.100.5 |          14871
 Acquiring sstable references                                               | 21:57:16,845 | 10.10.100.5 |          14896
 Merging memtable tombstones                                                | 21:57:16,845 | 10.10.100.5 |          14954
 Bloom filter allows skipping sstable 1049                                  | 21:57:16,845 | 10.10.100.5 |          15090
 Bloom filter allows skipping sstable 989                                   | 21:57:16,845 | 10.10.100.5 |          15146
 Partition index with 0 entries found for sstable 937                       | 21:57:16,845 | 10.10.100.5 |          15565
 Seeking to partition indexed section in data file                          | 21:57:16,845 | 10.10.100.5 |          15581
 Partition index with 7158 entries found for sstable 884                    | 21:57:16,898 | 10.10.100.5 |          68644
 Seeking to partition indexed section in data file                          | 21:57:16,899 | 10.10.100.5 |          69014
 Partition index with 20819 entries found for sstable 733                   | 21:57:16,916 | 10.10.100.5 |          86121
 Seeking to partition indexed section in data file                          | 21:57:16,916 | 10.10.100.5 |          86412
 Skipped 1/6 non-slice-intersecting sstables, included 0 due to tombstones  | 21:57:16,916 | 10.10.100.5 |          86494
 Merging data from memtables and 3 sstables                                 | 21:57:16,916 | 10.10.100.5 |          86522
 Read 193311 live and 0 tombstoned cells                                    | 21:57:24,552 | 10.10.100.5 |        7722425
 Request complete                                                           | 21:57:29,074 | 10.10.100.5 |       12244832
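
Reading the source_elapsed column as per-step deltas makes the hot spot
obvious; a small Python sketch over the numbers above:

# Deltas between selected trace events (source_elapsed is in microseconds).
events = [
    ("execute_cql3_query",                            0),
    ("Preparing statement",                        1602),
    ("Merging data from memtables and 3 sstables", 86522),
    ("Read 193311 live and 0 tombstoned cells",  7722425),
    ("Request complete",                        12244832),
]
prev = 0
for step, us in events:
    print("%-44s +%8.3f s" % (step, (us - prev) / 1e6))
    prev = us

Only ~85 ms elapses between statement preparation and the start of the merge
(bloom filter checks, partition-index lookups, seeks); the next ~7.6 s goes to
merging and reading the 193,311 cells, and another ~4.5 s passes before the
request completes. That is consistent with the read not being bounded by disk
seeks.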


Mohammed

From: Alex Major [mailto:al3...@gmail.com]
Sent: Wednesday, September 17, 2014 3:47 AM
To: user@cassandra.apache.org
Subject: Re: no change observed in read latency after switching from EBS to SSD 
storage

When you say you moved from EBS to SSD, do you mean from EBS HDD drives to EBS
SSD drives, or to instance SSD drives? The m3.large only comes with 32GB of
instance-based SSD storage. If you're using EBS SSD drives, then the network
will still be the slowest link, so switching likely won't make much of a
difference.

On Wed, Sep 17, 2014 at 6:00 AM, Mohammed Guller <moham...@glassbeam.com> wrote:
Rob,
The 10-second latency that I gave earlier is from CQL tracing. Almost 5 seconds
of that were taken up by the “merge memtable and sstables” step. The remaining
5 seconds come from “read live and tombstoned cells.”

I too first thought that maybe disk is not the bottleneck and Cassandra is
serving everything from cache, but in that case it should not take 10 seconds
to read just 20MB of data.

Also, I narrowed the query down to a single-partition read and ran it in cqlsh
on the same node. I turned on tracing, which shows that all the steps were
executed on the same node. htop shows that CPU and memory are not the
bottlenecks. The network should not come into play, since cqlsh is running on
the same node.

Is there any performance tuning parameter in the cassandra.yaml file for large 
reads?

Mohammed

From: Robert Coli [mailto:rc...@eventbrite.com]
Sent: Tuesday, September 16, 2014 5:42 PM
To: user@cassandra.apache.org
Subject: Re: no change observed in read latency after switching from EBS to SSD 
storage

On Tue, Sep 16, 2014 at 5:35 PM, Mohammed Guller <moham...@glassbeam.com> wrote:
Does anyone have insight as to why we don't see any performance impact on the 
reads going from EBS to SSD?

What does it say when you enable tracing on this CQL query?

10 seconds is a really long time to access anything in Cassandra. There is, 
generally speaking, a reason why the default timeouts are lower than this.

My conjecture is that the data in question was previously being served from the
page cache and is now being served from SSD. In switching from
EBS-plus-page-cache to SSD, you have successfully proved that SSD and RAM are
both very fast. There is also a strong suggestion that whatever access pattern
you are using is not bounded by disk performance.

=Rob
