Hi Todd,

Yes, all equal hardware. Nearly no CPU usage and no memory issues. Repairs are running in tens of minutes, so I don't understand why replication would be backed up.
Any other ideas?

On 17 Nov 2011 02:33, "Todd Burruss" <bburr...@expedia.com> wrote:
> Are all of your machines equal hardware? Since those machines are sending
> data somewhere, maybe they are behind in replicating and are continuously
> catching up?
>
> Use a tool like tcpdump to find out where the data is going
>
> From: Philippe <watche...@gmail.com>
> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Date: Tue, 15 Nov 2011 13:22:38 -0800
> To: user <user@cassandra.apache.org>
> Subject: Re: Network traffic patterns
>
> Sorry about the previous message, I've enabled keyboard shortcuts on
> gmail...*sigh*...
>
> Hello,
> I'm trying to understand the network usage I am seeing in my cluster, can
> anyone shed some light?
> It's an RF=3, 12-node, cassandra 0.8.6 cluster. repair is performed on
> each node once a week, with a rolling schedule.
> The nodes are p13,p14,p15...p24 and are consecutive in that order on the
> ring. Each node is only a cassandra database. I am hitting the cluster from
> another server (p4).
>
> p4 is doing this with 20 threads in parallel
>
>    1. read a lot of data (some columns for hundreds to tens of thousands
>    of keys, split into 512-key multigets)
>    2. process the data
>    3. write back a byte array to cassandra (average size is 400 bytes)
>    4. go back to 1
>
> According to my munin graphs, network usage is about as follows. I am not
> surprised at the bias towards p13-p15 as p4 is getting & storing data
> mainly for keys located on one of those nodes.
>
>    - p4 : 1.5Mb/s in and out
>    - p13-p15 : 15Mb/s in and 80Mb/s out
>    - p16-p24 : 45Mb/s in and 5Mb/s out
>
> What I don't understand is why p4 is only seeing 1.5Mb/s while I see
> 80Mb/s on p13 & p15.
>
> The way I understand this:
>
>    - p4 makes a multiget to the cluster, electing to use any node in the
>    cluster (IN traffic to describe the query)
>    - coordinator node replays the query on all 3 replicas (so 3 servers
>    each get the IN traffic, mostly p13-p15)
>    - each server replies to coordinator
>    - coordinator chooses matching values and sends back data to p4
>
> So if p13-p15 are outputting 80Mb/s why am I not seeing 80Mb/s coming into
> p4 which is on the receiving end?
>
> Thanks
>
> 2011/11/15 Philippe <watche...@gmail.com>
>
>> Hello,
>> I'm trying to understand the network usage I am seeing in my cluster, can
>> anyone shed some light?
>> It's an RF=3, 12-node, cassandra 0.8.6 cluster. The nodes are
>> p13,p14,p15...p24 and are consecutive in that order on the ring.
>> Each node is only a cassandra database. I am hitting the cluster from
>> another server (p4).
>>
>> The pattern on p4 is to
>>
>>    1. read a lot of data (some columns for hundreds to tens of thousands
>>    of keys, split into 512-key multigets)
>>    2. process the data
>>    3. write back a byte array to cassandra (average size is 400 bytes)
>>
>>
>> p4 reads as
>>
>
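
For anyone following the numbers, here is a rough tally of the munin figures quoted above. It is only a sketch. It assumes the quoted rates are per node (the thread says "80Mb/s on p13 & p15", which suggests so) and that the group sizes are 3 nodes for p13-p15 and 9 for p16-p24. The names `GROUPS` and `aggregate` are made up for illustration.

```python
# Approximate per-node rates in Mb/s, as read off the munin graphs in the thread.
# Assumption: each figure applies to every node in its group.
GROUPS = {
    "p4": {"nodes": 1, "in": 1.5, "out": 1.5},
    "p13-p15": {"nodes": 3, "in": 15.0, "out": 80.0},
    "p16-p24": {"nodes": 9, "in": 45.0, "out": 5.0},
}


def aggregate(groups):
    """Return {group: (total_in, total_out)} in Mb/s, summed over the group's nodes."""
    return {
        name: (g["nodes"] * g["in"], g["nodes"] * g["out"])
        for name, g in groups.items()
    }


totals = aggregate(GROUPS)
for name, (total_in, total_out) in totals.items():
    print(f"{name:8s} in={total_in:6.1f} Mb/s  out={total_out:6.1f} Mb/s")
```

Under those assumptions p13-p15 together send about 240 Mb/s while p4 receives only 1.5 Mb/s, so if no other machines are involved, most of that outbound traffic must be going to other cluster nodes rather than to the client. The figures are approximate graph readings, so the in and out totals do not balance exactly.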