Hi, we run a Cassandra cluster with three DCs. We noticed that the network traffic generated by running the cluster is significant.
Consider the following simplified IoT scenario:

* Time series data from devices in the field is received at Node A
* Node A inserts the data into DC 1
* DC 1 replicates the data within the DC and to the other two DCs

The traffic this produces is significant. The numbers below are based on observing the incoming and outgoing traffic at the node level:

* I call the bandwidth for receiving the data on Node A the "base bandwidth"
* Inserting into Cassandra (in one DC) takes 2-3 times the base bandwidth
* Replication to each of the other data centres takes 5 times the base bandwidth
* Overall we see a "bandwidth amplification" of ~13x (3+5+5)

My questions:

1. Would you consider these factors expected behaviour?
2. Are there ways to reduce the traffic through configuration?

A few additional notes on the setup:

* We use NetworkTopologyStrategy for replication and cassandra-rackdc.properties to configure the GossipingPropertyFileSnitch
* internode_compression is set to dc
* inter_dc_tcp_nodelay is set to false

Any help is highly appreciated!

Best Regards
Jens
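
P.S. To make the arithmetic behind the ~13x figure explicit, here is a minimal Python sketch. The constant and function names are purely illustrative and not part of any Cassandra tooling. The factors are the observed ones listed above; the 12x lower bound simply follows from taking 2 instead of 3 for the local DC.

```python
# Observed traffic factors, expressed as multiples of the "base bandwidth"
# (the bandwidth needed to receive the device data on Node A).
LOCAL_DC_FACTOR = (2, 3)   # insert into the local DC: 2-3x (observed range)
REMOTE_DC_FACTOR = 5       # replication to each remote DC: 5x (observed)
REMOTE_DC_COUNT = 2        # three DCs in total, so two remote ones


def amplification(local_factor, remote_factor=REMOTE_DC_FACTOR,
                  remote_dcs=REMOTE_DC_COUNT):
    """Return total cluster traffic as a multiple of the base bandwidth."""
    return local_factor + remote_factor * remote_dcs


# Evaluate both ends of the observed local-DC range.
low, high = (amplification(f) for f in LOCAL_DC_FACTOR)
print(f"bandwidth amplification: {low}x to {high}x")
```

The upper end of that range corresponds to the 3+5+5 sum quoted above.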