I'm using BOP (ByteOrderedPartitioner).

On 20 Nov 2011 at 13:09, "Boris Yen" <yulin...@gmail.com> wrote:
> I am just curious about which partitioner you are using?
>
> On Thu, Nov 17, 2011 at 4:30 PM, Philippe <watche...@gmail.com> wrote:
>
>> Hi Todd
>> Yes, all equal hardware. Nearly no CPU usage and no memory issues.
>> Repairs are running in tens of minutes so I don't understand why
>> replication would be backed up.
>>
>> Any other ideas?
>> On 17 Nov 2011 at 02:33, "Todd Burruss" <bburr...@expedia.com> wrote:
>>
>>> Are all of your machines equal hardware? Since those machines are
>>> sending data somewhere, maybe they are behind in replicating and are
>>> continuously catching up?
>>>
>>> Use a tool like tcpdump to find out where the data is going
>>>
>>> From: Philippe <watche...@gmail.com>
>>> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>>> Date: Tue, 15 Nov 2011 13:22:38 -0800
>>> To: user <user@cassandra.apache.org>
>>> Subject: Re: Network traffic patterns
>>>
>>> Sorry about the previous message, I've enabled keyboard shortcuts on
>>> Gmail...*sigh*...
>>>
>>> Hello,
>>> I'm trying to understand the network usage I am seeing in my cluster,
>>> can anyone shed some light?
>>> It's an RF=3, 12-node, Cassandra 0.8.6 cluster. Repair is performed on
>>> each node once a week, with a rolling schedule.
>>> The nodes are p13,p14,p15...p24 and are consecutive in that order on the
>>> ring. Each node is only a Cassandra database. I am hitting the cluster from
>>> another server (p4).
>>>
>>> p4 is doing this with 20 threads in parallel
>>>
>>>    1. read a lot of data (some columns for hundreds to tens of
>>>    thousands of keys, split into 512-key multigets)
>>>    2. process the data
>>>    3. write back a byte array to Cassandra (average size is 400 bytes)
>>>    4. go back to 1
>>>
>>> According to my munin graphs, network usage is about as follows. I am
>>> not surprised at the bias towards p13-p15 as p4 is getting & storing data
>>> mainly for keys located on one of those nodes.
>>>
>>>    - p4 : 1.5Mb/s in and out
>>>    - p13-p15 : 15Mb/s in and 80Mb/s out
>>>    - p16-p24 : 45Mb/s in and 5Mb/s out
>>>
>>> What I don't understand is why p4 is only seeing 1.5Mb/s while I see
>>> 80Mb/s on p13 & p15.
>>>
>>> The way I understand this:
>>>
>>>    - p4 makes a multiget to the cluster, electing to use any node in
>>>    the cluster (IN traffic describing the query)
>>>    - coordinator node replays the query on all 3 replicas (so 3 servers
>>>    each get the IN traffic, mostly p13-p15)
>>>    - each server replies to coordinator
>>>    - coordinator chooses matching values and sends back data to p4
>>>
>>> So if p13-p15 are outputting 80Mb/s, why am I not seeing 80Mb/s coming
>>> into p4, which is on the receiving end?
>>>
>>> Thanks
>>>
>>> 2011/11/15 Philippe <watche...@gmail.com>
>>>
>>>> Hello,
>>>> I'm trying to understand the network usage I am seeing in my cluster,
>>>> can anyone shed some light?
>>>> It's an RF=3, 12-node, Cassandra 0.8.6 cluster. The nodes are
>>>> p13,p14,p15...p24 and are consecutive in that order on the ring.
>>>> Each node is only a Cassandra database. I am hitting the cluster from
>>>> another server (p4).
>>>>
>>>> The pattern on p4 is to
>>>>
>>>>    1. read a lot of data (some columns for hundreds to tens of
>>>>    thousands of keys, split into 512-key multigets)
>>>>    2. process the data
>>>>    3. write back a byte array to Cassandra (average size is 400 bytes)
>>>>
>>>>
>>>> p4 reads as
>>>>
>>>
>>>
>
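
A rough sanity check on the munin figures quoted above, as a minimal Python sketch. It assumes each figure is per node (the thread does not say so explicitly) and that the numbers are only approximate. The names REPORTED, totals and client_share are made up for illustration.

```python
# Traffic figures as reported in the thread, in Mb/s.
# ASSUMPTION: each figure is per node, not per group.
REPORTED = {
    "p4":      {"nodes": 1, "in": 1.5,  "out": 1.5},
    "p13-p15": {"nodes": 3, "in": 15.0, "out": 80.0},
    "p16-p24": {"nodes": 9, "in": 45.0, "out": 5.0},
}


def totals(groups):
    """Return (total inbound, total outbound) in Mb/s across all hosts."""
    total_in = sum(g["nodes"] * g["in"] for g in groups.values())
    total_out = sum(g["nodes"] * g["out"] for g in groups.values())
    return total_in, total_out


def client_share(groups, client, senders):
    """Upper bound on the fraction of the senders' output the client can receive."""
    sender_out = groups[senders]["nodes"] * groups[senders]["out"]
    client_in = groups[client]["nodes"] * groups[client]["in"]
    return client_in / sender_out


if __name__ == "__main__":
    total_in, total_out = totals(REPORTED)
    print(f"total in:  {total_in} Mb/s")
    print(f"total out: {total_out} Mb/s")
    share = client_share(REPORTED, "p4", "p13-p15")
    print(f"p4 can absorb at most {share:.2%} of what p13-p15 send")
```

Under those assumptions p4 receives 1.5Mb/s in total, so only a tiny fraction of the roughly 240Mb/s leaving p13-p15 can be headed for p4. The rest has to be going to other hosts, which is what a tcpdump capture on one of those nodes should show. The inbound and outbound totals also do not match, which suggests the figures are approximate or that some traffic involves hosts not listed.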