So with the holidays hopefully being over, I thought I'd ask again :)

Could someone please help with answers to the two questions:
- Is it reasonable to expect that cross-datacenter node-to-node replication traffic is greater than actual client-to-server traffic that generates this activity? Specifically talking about counter increments.
- Is there anything that can be done to lower the amount of cross-datacenter replication traffic while keeping actual replication going (i.e. we can't afford to not replicate data, but we can afford e.g. delays in replication)?

Best regards,
Sergey

Sergey Olefir wrote
> Hi,
>
> as part of our ongoing tests with Cassandra, we've tried to evaluate the
> amount of traffic generated in client-to-server and server-to-server
> (replication) scenarios.
>
> The results we are getting are surprising.
>
> Our setup:
> - Cassandra 1.1.7.
> - 3 DCs with 2 nodes each.
> - NetworkTopology replication strategy with 2 replicas per DC (so
> basically each node contains the full data set).
> - 100 clients concurrently incrementing counters at a rate of roughly
> 100 per second each (i.e. about 10k increments per second in total).
> Clients perform writes to DC:1 only. Server-to-server traffic measurement
> was done in DC:2.
> - Clients use batches to write to the server (up to 100 increments per
> batch; overall each client writes 1 or 2 batches per second).
>
> Clients are Java-based and access Cassandra via Hector. They run on a
> Windows box.
>
> Traffic measurement for clients (on Windows) was done via Resource Monitor
> and packet capture via Network Monitor. The overall traffic appears to be
> roughly 700KB/sec (kilobytes) for ~10000 increments.
>
> Traffic measurement for server-to-server was done on DC:2 via packet
> capture. This capture specifically included only nodes in other
> datacenters (so no internal DC traffic was captured).
>
> The vast majority of traffic was directed to one node, DC:2-1. DC:2-2
> received about 1/30 of the traffic. I think I've read somewhere that
> Cassandra directs DC-to-DC traffic to one node, so this makes sense.
>
> What is surprising, though, is the amount of traffic. It looks to be
> roughly twice the amount of the total traffic generated by clients, i.e.
> something like 1.5MB/sec (megabytes). Note that this only counts incoming
> traffic.
>
> I've taken a look at some of the captured packets and it looks like
> there's much more service information in DC-to-DC traffic compared to
> client-to-server traffic, although I am by no means certain here.
>
> Overall I have a couple of questions:
> - Is it indeed the case that server-to-server replication traffic can be
> significantly more bloated than client-to-server traffic? Or do I need to
> review my testing methodology?
> - Is there anything that can be done to reduce cross-DC replication
> traffic? Perhaps some compression scheme? Or some delay before replication,
> allowing for possibly more increments to be merged together?
>
> Best regards,
> Sergey
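
To put the quoted figures on a per-increment basis, here is a minimal sketch of the arithmetic. It assumes decimal units (1 KB = 1000 bytes, 1 MB = 1000000 bytes) and that both measurements cover the same ~10000 increments per second; the variable and function names are illustrative only and do not come from any Cassandra tooling.

```python
# Figures taken from the message above; decimal units are an assumption.
INCREMENTS_PER_SEC = 10_000
CLIENT_BYTES_PER_SEC = 700 * 1000          # ~700 KB/sec, clients -> DC:1
CROSS_DC_BYTES_PER_SEC = 1.5 * 1000 ** 2   # ~1.5 MB/sec, incoming at DC:2 only


def bytes_per_increment(bytes_per_sec, increments_per_sec):
    """Average bytes on the wire per counter increment."""
    return bytes_per_sec / increments_per_sec


client_cost = bytes_per_increment(CLIENT_BYTES_PER_SEC, INCREMENTS_PER_SEC)
cross_dc_cost = bytes_per_increment(CROSS_DC_BYTES_PER_SEC, INCREMENTS_PER_SEC)
ratio = cross_dc_cost / client_cost

print(f"client-to-server: {client_cost:.0f} bytes per increment")
print(f"cross-DC (DC:2 incoming): {cross_dc_cost:.0f} bytes per increment")
print(f"ratio: {ratio:.2f}x")
```

Under these assumptions the client side averages about 70 bytes per increment and the incoming cross-DC side about 150 bytes per increment, which matches the "roughly twice" estimate in the message.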