In Cassandra, two things determine what's called network topology: datacenter 
and rack.

rack - not necessarily a physical rack, but rather a single point of failure 
(a group of nodes that can go down together). For example, on AWS one 
availability zone is usually chosen to be a Cassandra rack.

datacenter - a set of racks with a good network connection and low latency 
between them. On AWS it's usually a region.


If you use NetworkTopologyStrategy with a properly configured snitch, network 
topology is taken into account during replica placement: replicas are spread 
across racks, so as long as the DC has at least as many racks as RF, there 
won't be more than 1 replica of a data chunk in a rack. This means that if a 
whole rack fails (for example, an AWS AZ goes offline), there are still 2 other 
replicas online for each chunk of data (with RF=3) and queries with CL=QUORUM 
still work.
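
To make the arithmetic concrete, here is a minimal Python sketch. The function 
names are mine and not part of Cassandra or any driver, and it assumes replicas 
are spread as evenly as possible across racks. QUORUM needs floor(RF / 2) + 1 
replicas, so the question is whether that many survive after the fullest rack 
is lost.

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must respond for QUORUM: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def survives_rack_failure(replication_factor: int, racks: int) -> bool:
    """True if QUORUM is still reachable after losing one whole rack.

    Assumption (illustrative): replicas are spread as evenly as possible
    across racks, and the rack holding the most replicas is the one lost.
    """
    # ceil(RF / racks) replicas sit in the fullest rack
    max_per_rack = -(-replication_factor // racks)
    remaining = replication_factor - max_per_rack
    return remaining >= quorum(replication_factor)


print(quorum(3))                    # 2
print(survives_rack_failure(3, 3))  # True: 2 of 3 replicas remain
print(survives_rack_failure(3, 2))  # False: one rack held 2 replicas
```

The last line shows why the number of racks matters: with RF=3 and only two 
racks, one rack has to hold two replicas, and losing it breaks QUORUM.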

To avoid data imbalance between the nodes (which may cause "hot spots" in your 
cluster and hurt performance), all racks should have the same number of nodes 
with approximately the same capacity.
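
A rough illustration of the imbalance, as a Python sketch. It assumes the 
number of racks equals RF, so that every rack ends up holding one full copy of 
the data, split evenly among that rack's nodes. The function name and the 
600 GB figure are made up for the example.

```python
def data_per_node(total_data_gb: float, nodes_per_rack: dict) -> dict:
    """Approximate data held by each node of a rack.

    Assumption (illustrative): racks == RF, so each rack stores one full
    copy of the data, divided evenly among the nodes in that rack.
    """
    return {rack: total_data_gb / nodes for rack, nodes in nodes_per_rack.items()}


balanced = data_per_node(600, {"rack1": 2, "rack2": 2, "rack3": 2})
skewed = data_per_node(600, {"rack1": 3, "rack2": 2, "rack3": 1})

print(balanced)  # {'rack1': 300.0, 'rack2': 300.0, 'rack3': 300.0}
print(skewed)    # {'rack1': 200.0, 'rack2': 300.0, 'rack3': 600.0}
```

Both layouts have six nodes, but in the skewed one the single node in rack3 
carries three times as much data as a node in rack1.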


Also, CL=QUORUM is sometimes used where CL=LOCAL_QUORUM should be used 
instead. There is no difference between the two with one DC, but with two or 
more DCs the former leads to cross-DC communication, since a majority of all 
replicas across all DCs has to respond. This obviously leads to increased 
latencies. The same is true, for example, for ONE vs LOCAL_ONE.
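
The difference is easy to see by counting replicas. Below is a small Python 
sketch (the helper names and the dc1/dc2 labels are mine, purely for 
illustration): QUORUM is a majority of the replicas summed over all DCs, while 
LOCAL_QUORUM is a majority of the replicas in the coordinator's DC only.

```python
def quorum(replicas: int) -> int:
    """Majority of the given number of replicas: floor(n / 2) + 1."""
    return replicas // 2 + 1


def required_replicas(rf_per_dc: dict, local_dc: str) -> dict:
    """Replicas that must respond for QUORUM vs LOCAL_QUORUM.

    rf_per_dc maps a DC name to its replication factor (hypothetical names).
    """
    total_rf = sum(rf_per_dc.values())
    return {
        "QUORUM": quorum(total_rf),
        "LOCAL_QUORUM": quorum(rf_per_dc[local_dc]),
    }


one_dc = required_replicas({"dc1": 3}, "dc1")
two_dcs = required_replicas({"dc1": 3, "dc2": 3}, "dc1")

print(one_dc)   # {'QUORUM': 2, 'LOCAL_QUORUM': 2}
print(two_dcs)  # {'QUORUM': 4, 'LOCAL_QUORUM': 2}
```

With two DCs at RF=3 each, QUORUM needs 4 responses, but only 3 replicas live 
in the local DC, so at least one response has to come from the remote DC.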


If you take a look at the manual on how to add a DC to a cluster, you'll find 
cautions there about using QUORUM/LOCAL_QUORUM during the operation. The reason 
is that while data which is supposed to be in the new DC isn't there yet 
(streaming is in progress and hasn't completed), queries will cause blocking 
read repairs.


Regards,

Kyrill


________________________________
From: Murtaza Talwari <mdt_...@hotmail.com>
Sent: Thursday, August 2, 2018 1:22:16 PM
To: user@cassandra.apache.org
Subject: Performance impact of using NetworkTopology with 3 node cassandra 
cluster in One DC


We are using 3 node Cassandra cluster in one data center.



For our keyspaces, as suggested in best practices, we are using 
NetworkTopologyStrategy as the replication strategy with the 
GossipingPropertyFileSnitch.

For read/write consistency we are using QUORUM.



In the majority of cases, users who choose NetworkTopologyStrategy as the 
replication strategy have multiple datacenters configured.

In our case we have only one DataCenter,



  *   Given that, will using NetworkTopologyStrategy as the replication 
strategy cause any performance impact?
  *   As we are using QUORUM for read/write consistency, which takes multiple 
datacenters into account, does QUORUM have any performance impact? Is it OK to 
continue using QUORUM, considering future expansion to more datacenters?



Please suggest.



Regards,
