Are all of your machines on identical hardware?  Since those machines are 
sending data somewhere, maybe they are behind in replicating and are 
continuously catching up.

Use a tool like tcpdump to find out where the data is going.
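
As a rough sketch of what to do with such a capture: the Python below sums 
bytes per source/destination host pair. It assumes the one-line-per-packet 
shape that `tcpdump -nn -q` prints for IPv4 TCP traffic; the addresses and 
ports in the sample are made up, and `bytes_per_flow` is a hypothetical 
helper name, not part of any tool.

```python
import re
from collections import Counter

# Assumed line shape from `tcpdump -nn -q` for IPv4 TCP traffic, e.g.
# 13:22:38.000001 IP 10.0.0.13.7000 > 10.0.0.16.7000: tcp 1448
LINE_RE = re.compile(
    r"IP (?P<src>\d+\.\d+\.\d+\.\d+)\.\d+ > "
    r"(?P<dst>\d+\.\d+\.\d+\.\d+)\.\d+: tcp (?P<size>\d+)"
)

def bytes_per_flow(lines):
    """Sum TCP payload bytes for each (source host, destination host) pair."""
    totals = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match:  # skip anything that does not look like the assumed format
            totals[(match["src"], match["dst"])] += int(match["size"])
    return totals

# Hypothetical sample capture; the addresses are made up for illustration.
sample = [
    "13:22:38.000001 IP 10.0.0.13.7000 > 10.0.0.16.7000: tcp 1448",
    "13:22:38.000002 IP 10.0.0.13.7000 > 10.0.0.16.7000: tcp 1448",
    "13:22:38.000003 IP 10.0.0.13.9160 > 10.0.0.4.51234: tcp 400",
]

# Print the heaviest flows first.
for (src, dst), size in bytes_per_flow(sample).most_common():
    print(f"{src} -> {dst}: {size} bytes")
```

Run against a real capture from one of the busy nodes, the top few pairs 
should show which hosts are on the receiving end of the traffic.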

From: Philippe <<>>
Reply-To: "<>" 
Date: Tue, 15 Nov 2011 13:22:38 -0800
To: user <<>>
Subject: Re: Network traffic patterns

Sorry about the previous message, I've enabled keyboard shortcuts on 

I'm trying to understand the network usage I am seeing in my cluster, can 
anyone shed some light?
It's an RF=3, 12-node, cassandra 0.8.6 cluster. repair is performed on each 
node once a week, with a rolling schedule.
The nodes are p13,p14,p15...p24 and are consecutive in that order on the ring. 
Each node is only a cassandra database. I am hitting the cluster from another 
server (p4).

p4 is doing the following with 20 threads in parallel:

 1.  read a lot of data (some columns for hundreds to tens of thousands of 
keys, split into 512-key multigets)
 2.  process the data
 3.  write back a byte array to cassandra (average size is 400 bytes)
 4.  go back to 1

According to my munin graphs, network usage is about as follows. I am not 
surprised at the bias towards p13-p15 as p4 is getting & storing data mainly 
for keys located on one of those nodes.

 *   p4 : 1.5Mb/s in and out
 *   p13-p15 : 15Mb/s in and 80Mb/s out
 *   p16-p24 : 45Mb/s in and 5Mb/s out

What I don't understand is why p4 is only seeing 1.5Mb/s while I see 80Mb/s on 
p13 & p15.

The way I understand this:

 *   p4 makes a multiget to the cluster, electing to use any node in the 
cluster as coordinator (IN traffic to describe the query)
 *   coordinator node replays the query on all 3 replicas (so 3 servers each 
get the IN traffic, mostly p13-p15)
 *   each server replies to coordinator
 *   coordinator chooses matching values and sends back data to p4

So if p13-p15 are outputting 80Mb/s, why am I not seeing 80Mb/s coming into 
p4, which is on the receiving end?
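
For a quick sanity check, the munin figures listed earlier can be tallied 
across the whole cluster. This is only a sketch: the rates are the 
approximate per-node values quoted above, and it assumes p4 and p13-p24 are 
the only hosts exchanging traffic.

```python
# Approximate per-node rates in Mb/s, copied from the munin figures above.
# Each entry is (number of nodes, inbound per node, outbound per node).
groups = {
    "p4": (1, 1.5, 1.5),
    "p13-p15": (3, 15.0, 80.0),
    "p16-p24": (9, 45.0, 5.0),
}

def totals(groups):
    """Return (total inbound, total outbound) in Mb/s over all node groups."""
    total_in = sum(n * rate_in for n, rate_in, _ in groups.values())
    total_out = sum(n * rate_out for n, _, rate_out in groups.values())
    return total_in, total_out

total_in, total_out = totals(groups)
print(f"total in:  {total_in} Mb/s")
print(f"total out: {total_out} Mb/s")
print(f"p4 share of inbound: {groups['p4'][1] / total_in:.1%}")
```

If those were the only hosts involved, total in and total out would have to 
match, so the gap suggests the graph readings are rough. Either way, p4 
accounts for well under 1% of the inbound total, which is consistent with 
most of what leaves p13-p15 going to other Cassandra nodes rather than to p4.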


2011/11/15 Philippe <<>>
I'm trying to understand the network usage I am seeing in my cluster, can 
anyone shed some light?
It's an RF=3, 12-node, cassandra 0.8.6 cluster. The nodes are p13,p14,p15...p24 
and are consecutive in that order on the ring.
Each node is only a cassandra database. I am hitting the cluster from another 
server (p4).

The pattern on p4 is to

 1.  read a lot of data (some columns for hundreds to tens of thousands of 
keys, split into 512-key multigets)
 2.  process the data
 3.  write back a byte array to cassandra (average size is 400 bytes)

p4 reads as
