Where do you get the ~1ms latency between AZs? Comparing a short-term average to a 99th percentile isn't very fair.

"Over the last month, the median is 2.09 ms, 90th percentile is 20ms, 99th percentile is 47ms." - per https://www.quora.com/What-are-typical-ping-times-between-different-EC2-availability-zones-within-the-same-region

Are you using EBS? That would further impact latency on reads, and GCs will always cause hiccups in the 99th+.

Chris

On Mon, Apr 11, 2016 at 7:57 AM, Alessandro Pieri <siri...@gmail.com> wrote:

> Hi everyone,
>
> Last week I ran some tests to estimate the latency overhead introduced in
> a Cassandra cluster by a multi availability zone setup on AWS EC2.
>
> I started a Cassandra cluster of 6 nodes deployed on 3 different AZs (2
> nodes/AZ).
>
> Then, I used cassandra-stress to create an INSERT (write) test of 20M
> entries with a replication factor = 3. Right after, I ran cassandra-stress
> again to READ 10M entries.
>
> Well, I got the following unexpected result:
>
> Single-AZ, CL=ONE -> median/95th percentile/99th percentile:
> 1.06ms/7.41ms/55.81ms
> Multi-AZ, CL=ONE -> median/95th percentile/99th percentile:
> 1.16ms/38.14ms/47.75ms
>
> Basically, switching to the multi-AZ setup increased the 95th percentile
> latency by ~30ms. That's too much considering that the average network
> latency between AZs on AWS is ~1ms.
>
> Since I couldn't find anything to explain those results, I decided to run
> cassandra-stress specifying only a single node entry (i.e. "--nodes
> node1" instead of "--nodes node1,node2,node3,node4,node5,node6") and
> surprisingly the latency went back to 5.9 ms.
>
> Trying to recap:
>
> Multi-AZ, CL=ONE, "--nodes node1,node2,node3,node4,node5,node6" -> 95th
> percentile: 38.14ms
> Multi-AZ, CL=ONE, "--nodes node1" -> 95th percentile: 5.9ms
>
> For the sake of completeness I ran a further test using a consistency
> level = LOCAL_QUORUM, and the test did not show any large variance between
> using a single node and multiple ones.
>
> Do you guys know what could be the reason?
>
> The tests were executed on an m3.xlarge (network optimized) using the
> DataStax AMI 2.6.3 running Cassandra v2.0.15.
>
> Thank you in advance for your help.
>
> Cheers,
> Alessandro
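
To make the average-versus-percentile point at the top of this reply concrete, here is a minimal sketch in Python. The sample values are invented purely for illustration (they are not measurements from this thread or from AWS), and the helper names `percentile` and `summarize` are hypothetical. It assumes a workload where 94% of requests take 1 ms and 6% hit a 40 ms stall, for example a GC pause or a slow hop.

```python
import statistics


def percentile(samples, p):
    """Nearest-rank percentile for an integer p in 1..100."""
    ordered = sorted(samples)
    # Integer ceiling of p/100 * n, which avoids floating-point rounding.
    rank = max(1, (p * len(ordered) + 99) // 100)
    return ordered[rank - 1]


def summarize(samples):
    """Return the mean next to the median and tail percentiles."""
    return {
        "mean": statistics.fmean(samples),
        "median": percentile(samples, 50),
        "p95": percentile(samples, 95),
        "p99": percentile(samples, 99),
    }


# Invented data: 940 fast requests at 1 ms, 60 stalled requests at 40 ms.
samples = [1.0] * 940 + [40.0] * 60
stats = summarize(samples)
print(stats)
```

With this made-up distribution the mean stays in the low single-digit milliseconds while the 95th percentile sits at the stall value, so a quoted "average" latency says little about what to expect at the 95th or 99th percentile.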