Which instance type are you using? Some may be throttled for EBS access, so you could bump into a rate limit, and who knows what AWS will do at that point.
-- Jack Krupansky

On Tue, Apr 12, 2016 at 6:02 AM, Alessandro Pieri <alessan...@getstream.io> wrote:

> Thanks Chris for your reply.
>
> I ran the tests 3 times for 20 minutes each and monitored the network
> latency in the meantime; it was very low (even the 99th percentile).
>
> I didn't notice any CPU spike caused by the GC but, as you pointed out, I
> will look into the GC log, just to be sure.
>
> In order to avoid the problem you mentioned with EBS and to keep the
> deviation under control, I used two ephemeral disks in RAID 0.
>
> I think the odd results come from the way cassandra-stress deals with
> multiple nodes. As soon as possible I will go through the Java code to get
> some more detail.
>
> If you have anything else in mind please let me know; your comments
> are really appreciated.
>
> Cheers,
> Alessandro
>
>
> On Mon, Apr 11, 2016 at 4:15 PM, Chris Lohfink <clohfin...@gmail.com>
> wrote:
>
>> Where do you get the ~1ms latency between AZs? Comparing a short-term
>> average to a 99th percentile isn't very fair.
>>
>> "Over the last month, the median is 2.09 ms, 90th percentile is
>> 20ms, 99th percentile is 47ms." - per
>> https://www.quora.com/What-are-typical-ping-times-between-different-EC2-availability-zones-within-the-same-region
>>
>> Are you using EBS? That would further impact latency on reads, and GCs
>> will always cause hiccups in the 99th+.
>>
>> Chris
>>
>>
>> On Mon, Apr 11, 2016 at 7:57 AM, Alessandro Pieri <siri...@gmail.com>
>> wrote:
>>
>>> Hi everyone,
>>>
>>> Last week I ran some tests to estimate the latency overhead introduced
>>> in a Cassandra cluster by a multi-availability-zone setup on AWS EC2.
>>>
>>> I started a Cassandra cluster of 6 nodes deployed on 3 different AZs (2
>>> nodes/AZ).
>>>
>>> Then I used cassandra-stress to create an INSERT (write) test of 20M
>>> entries with a replication factor = 3. Right after, I ran cassandra-stress
>>> again to READ 10M entries.
>>>
>>> Well, I got the following unexpected result:
>>>
>>> Single-AZ, CL=ONE -> median/95th percentile/99th percentile:
>>> 1.06ms/7.41ms/55.81ms
>>> Multi-AZ, CL=ONE -> median/95th percentile/99th percentile:
>>> 1.16ms/38.14ms/47.75ms
>>>
>>> Basically, after switching to the multi-AZ setup, the 95th percentile
>>> latency increased by ~30ms. That's too much considering that the average
>>> network latency between AZs on AWS is ~1ms.
>>>
>>> Since I couldn't find anything to explain those results, I decided to
>>> run cassandra-stress specifying only a single node entry (i.e. "--nodes
>>> node1" instead of "--nodes node1,node2,node3,node4,node5,node6") and
>>> surprisingly the latency went back to 5.9 ms.
>>>
>>> Trying to recap:
>>>
>>> Multi-AZ, CL=ONE, "--nodes node1,node2,node3,node4,node5,node6" -> 95th
>>> percentile: 38.14ms
>>> Multi-AZ, CL=ONE, "--nodes node1" -> 95th percentile: 5.9ms
>>>
>>> For the sake of completeness I've run a further test using a consistency
>>> level = LOCAL_QUORUM, and the test did not show any large variance between
>>> using a single node and multiple ones.
>>>
>>> Do you guys know what could be the reason?
>>>
>>> The tests were executed on an m3.xlarge (network optimized) using the
>>> DataStax AMI 2.6.3 running Cassandra v2.0.15.
>>>
>>> Thank you in advance for your help.
>>>
>>> Cheers,
>>> Alessandro
>>>
>>
>>
>
>
> --
> *Alessandro Pieri*
> *Software Architect @ Stream.io Inc*
> e-Mail: alessan...@getstream.io - twitter: sirio7g
> <http://twitter.com/sirio7g>
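
For anyone following the numbers in the quoted results, here is a minimal sketch of the arithmetic, using only the figures reported in the thread. The names `reported` and `deltas` are made up for illustration and are not part of cassandra-stress or any Cassandra tooling.

```python
# Latencies in milliseconds exactly as reported in the thread for CL=ONE.
# The dictionary layout and key names are illustrative assumptions.
reported = {
    "single-AZ": {"median": 1.06, "p95": 7.41, "p99": 55.81},
    "multi-AZ": {"median": 1.16, "p95": 38.14, "p99": 47.75},
}


def deltas(baseline, candidate):
    """Return candidate minus baseline per percentile, rounded to 2 decimals."""
    return {key: round(candidate[key] - baseline[key], 2) for key in baseline}


diff = deltas(reported["single-AZ"], reported["multi-AZ"])
for name, value in diff.items():
    print(f"{name}: {value:+.2f} ms")
```

Read this way, the ~30 ms increase applies to the 95th percentile only: the median moves by about 0.1 ms, and the reported 99th percentile is actually lower in the multi-AZ run.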