Hi,

We run a Cassandra cluster with three DCs. We noticed that the network traffic 
generated by running the cluster is significant.

Consider the following simplified IoT scenario:

* time series data from devices in the field is received at Node A
* Node A inserts the data into DC 1
* DC 1 replicates the data within the DC and to the other two DCs

The traffic this produces is significant. The numbers below are based on 
observing the incoming and outgoing traffic on the node level:

* I call the bandwidth for receiving the data on Node A "base bandwidth"
* Inserting into Cassandra (in one DC) takes 2-3 times the base bandwidth
* Replication to each of the other data centres takes 5 times the base bandwidth
* overall we see a “bandwidth amplification” of ~ 13x (3+5+5)
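
To make the arithmetic behind the ~13x explicit, here is a minimal sketch. The 
function and parameter names are made up for illustration, and it uses the 
upper end (3x) of the observed 2-3x range for the local insert:

```python
# Rough "bandwidth amplification" relative to the base bandwidth.
# All factors are the observed multiples listed above, not exact values.
def amplification(local_insert_factor, remote_dc_factor, remote_dc_count):
    """Sum the local insert cost and the replication cost per remote DC."""
    return local_insert_factor + remote_dc_factor * remote_dc_count

# 3x for the insert into DC 1, plus 5x for each of the two other DCs.
total = amplification(local_insert_factor=3, remote_dc_factor=5, remote_dc_count=2)
print(total)  # 13
```

With the lower end of the range (2x for the local insert) the same sum is 12x.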

My questions:

1. Would you consider these factors expected behaviour?
2. Are there ways to reduce the traffic through configuration?

A few additional notes on the setup:

* we use NetworkTopologyStrategy for replication and cassandra-rackdc.properties 
to configure the GossipingPropertyFileSnitch
* internode_compression is set to dc
* inter_dc_tcp_nodelay is set to false

Any help is highly appreciated!

Best Regards
Jens
