Re: Latency overhead on Cassandra cluster deployed on multiple AZs (AWS)

Jack Krupansky Tue, 12 Apr 2016 08:43:42 -0700

Which instance type are you using? Some may be throttled for EBS access, so
you could bump into a rate limit, and who knows what AWS will do at that
point.


-- Jack Krupansky

On Tue, Apr 12, 2016 at 6:02 AM, Alessandro Pieri <alessan...@getstream.io>
wrote:

> Thanks Chris for your reply.
>
> I ran the tests 3 times for 20 minutes each and monitored the network
> latency in the meantime; it was very low (even at the 99th percentile).
>
> I didn't notice any CPU spike caused by the GC but, as you pointed out, I
> will look into the GC log, just to be sure.
>
> In order to avoid the problem you mentioned with EBS and to keep the
> deviation under control I used two ephemeral disks in RAID 0.
>
> I think the odd results come from the way cassandra-stress deals with
> multiple nodes. As soon as possible I will go through the Java code to get
> some more detail.
>
> If you have anything else in mind, please let me know; your comments
> were really appreciated.
>
> Cheers,
> Alessandro
>
>
> On Mon, Apr 11, 2016 at 4:15 PM, Chris Lohfink <clohfin...@gmail.com>
> wrote:
>
>> Where do you get the ~1ms latency between AZs? Comparing a short-term
>> average to a 99th percentile isn't very fair.
>>
>> "Over the last month, the median is 2.09 ms, 90th percentile is
>> 20ms, 99th percentile is 47ms." - per
>> https://www.quora.com/What-are-typical-ping-times-between-different-EC2-availability-zones-within-the-same-region
>>
>> Are you using EBS? That would further impact latency on reads, and GCs
>> will always cause hiccups in the 99th+.
>>
>> Chris
>>
>>
>> On Mon, Apr 11, 2016 at 7:57 AM, Alessandro Pieri <siri...@gmail.com>
>> wrote:
>>
>>> Hi everyone,
>>>
>>> Last week I ran some tests to estimate the latency overhead introduced
>>> in a Cassandra cluster by a multi-availability-zone setup on AWS EC2.
>>>
>>> I started a Cassandra cluster of 6 nodes deployed on 3 different AZs (2
>>> nodes/AZ).
>>>
>>> Then I used cassandra-stress to run an INSERT (write) test of 20M
>>> entries with a replication factor of 3. Right after, I ran cassandra-stress
>>> again to READ 10M entries.
>>>
>>> Well, I got the following unexpected result:
>>>
>>> Single-AZ, CL=ONE -> median/95th percentile/99th percentile:
>>> 1.06ms/7.41ms/55.81ms
>>> Multi-AZ, CL=ONE -> median/95th percentile/99th percentile:
>>> 1.16ms/38.14ms/47.75ms
>>>
>>> Basically, switching to the multi-AZ setup increased the 95th percentile
>>> latency by ~30ms. That's too much considering that the average network
>>> latency between AZs on AWS is ~1ms.
>>>
>>> Since I couldn't find anything to explain those results, I decided to
>>> run cassandra-stress specifying only a single node entry (i.e. "--nodes
>>> node1" instead of "--nodes node1,node2,node3,node4,node5,node6") and,
>>> surprisingly, the 95th percentile latency went back down to 5.9ms.
>>>
>>> To recap:
>>>
>>> Multi-AZ, CL=ONE, "--nodes node1,node2,node3,node4,node5,node6" -> 95th
>>> percentile: 38.14ms
>>> Multi-AZ, CL=ONE, "--nodes node1" -> 95th percentile: 5.9ms
>>>
>>> For the sake of completeness I ran a further test using a consistency
>>> level of LOCAL_QUORUM, and it did not show any large difference between
>>> using a single node and multiple ones.
>>>
>>> Do you guys know what could be the reason?
>>>
>>> The tests were executed on an m3.xlarge (network optimized) using the
>>> DataStax AMI 2.6.3 running Cassandra v2.0.15.
>>>
>>> Thank you in advance for your help.
>>>
>>> Cheers,
>>> Alessandro
>>>
>>
>>
>
>
> --
> *Alessandro Pieri*
> *Software Architect @ Stream.io Inc*
> e-Mail: alessan...@getstream.io - twitter: sirio7g
> <http://twitter.com/sirio7g>
>
>
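
The figures quoted in this thread can be compared percentile by percentile, which also illustrates Chris's point that an average is not comparable to a tail percentile. The sketch below is only an illustration: the first part uses the CL=ONE numbers reported above, while the second part uses a made-up sample (98 fast requests and 2 slow ones) and a hypothetical nearest-rank `percentile` helper, neither of which comes from the original tests.

```python
import math
from statistics import mean, median

# Figures reported in the thread, in ms: (median, 95th, 99th) at CL=ONE.
single_az = (1.06, 7.41, 55.81)
multi_az = (1.16, 38.14, 47.75)

labels = ("median", "p95", "p99")
# Difference multi-AZ minus single-AZ for each reported percentile.
deltas = {
    name: round(multi - single, 2)
    for name, single, multi in zip(labels, single_az, multi_az)
}
for name, delta in deltas.items():
    print(f"{name}: {delta:+.2f} ms")


def percentile(samples, pct):
    """Nearest-rank percentile (hypothetical helper, pct in 1..100)."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Synthetic, made-up latencies in ms: NOT measurements from the thread.
synthetic = [1.0] * 98 + [50.0] * 2
print("mean:", mean(synthetic))
print("median:", median(synthetic))
print("p99:", percentile(synthetic, 99))
```

On the reported numbers, the median moves by about 0.1ms and the 95th percentile by about 30.7ms, while the 99th percentile is actually about 8ms lower in the multi-AZ run. In the synthetic sample, the mean stays below 2ms even though the 99th percentile is 50ms, so a low average network latency does not rule out a large tail.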
