I think an ordered partitioner might cause most of the data to be stored
on only a few nodes, which could contribute to what you saw. Try the random
partitioner if possible.
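
A rough Python sketch of the effect, in case it helps. This is a simplified
model and not Cassandra's actual code: the key names (user000123 style), the
2-byte order-preserving token, and the evenly spaced node tokens are all
assumptions made for illustration, and replication is ignored.

```python
import hashlib
from bisect import bisect_left
from collections import Counter

# Assumption: 12 nodes named as in the thread, with evenly spaced tokens.
NODES = [f"p{i}" for i in range(13, 25)]


def even_tokens(space: int) -> list[int]:
    """Evenly spaced node tokens over a token space of the given size."""
    return [(i + 1) * space // len(NODES) for i in range(len(NODES))]


def owner(token: int, tokens: list[int]) -> str:
    """First node whose token is >= the key's token, wrapping around the ring."""
    return NODES[bisect_left(tokens, token) % len(NODES)]


def ordered_token(key: bytes) -> int:
    # Simplification: an order-preserving token built from the first 2 key bytes.
    return int.from_bytes(key[:2].ljust(2, b"\0"), "big")


def hashed_token(key: bytes) -> int:
    # Hash-based placement: the token is the MD5 digest of the key.
    return int.from_bytes(hashlib.md5(key).digest(), "big")


# Hypothetical keys that share a common prefix, as application keys often do.
keys = [f"user{n:06d}".encode() for n in range(10_000)]

ordered_ring = even_tokens(2**16)
hashed_ring = even_tokens(2**128)

ordered = Counter(owner(ordered_token(k), ordered_ring) for k in keys)
hashed = Counter(owner(hashed_token(k), hashed_ring) for k in keys)

print("ordered:", dict(ordered))
print("hashed: ", dict(hashed))
```

In this model every key starts with the same two bytes, so the ordered
placement puts all 10,000 keys on a single node, while the hashed placement
spreads them across the ring. A real cluster is rarely this extreme, but the
same skew appears whenever keys cluster in a narrow range.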

On Mon, Nov 21, 2011 at 6:53 AM, Philippe <watche...@gmail.com> wrote:

> I'm using BOP.
> On 20 Nov 2011 13:09, "Boris Yen" <yulin...@gmail.com> wrote:
>
>> I am just curious: which partitioner are you using?
>>
>> On Thu, Nov 17, 2011 at 4:30 PM, Philippe <watche...@gmail.com> wrote:
>>
>>> Hi Todd
>>> Yes, all equal hardware. Almost no CPU usage and no memory issues.
>>> Repairs are running in tens of minutes, so I don't understand why
>>> replication would be backed up.
>>>
>>> Any other ideas?
>>> On 17 Nov 2011 02:33, "Todd Burruss" <bburr...@expedia.com> wrote:
>>>
>>>> Are all of your machines equal hardware? Since those machines are
>>>> sending data somewhere, maybe they are behind in replicating and are
>>>> continuously catching up?
>>>>
>>>> Use a tool like tcpdump to find out where the data is going.
>>>>
>>>> From: Philippe <watche...@gmail.com>
>>>> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>>>> Date: Tue, 15 Nov 2011 13:22:38 -0800
>>>> To: user <user@cassandra.apache.org>
>>>> Subject: Re: Network traffic patterns
>>>>
>>>> Sorry about the previous message, I've enabled keyboard shortcuts on
>>>> gmail...*sigh*...
>>>>
>>>> Hello,
>>>> I'm trying to understand the network usage I am seeing in my cluster,
>>>> can anyone shed some light?
>>>> It's an RF=3, 12-node, Cassandra 0.8.6 cluster. Repair is performed on
>>>> each node once a week, on a rolling schedule.
>>>> The nodes are p13,p14,p15...p24 and are consecutive in that order on
>>>> the ring. Each node runs only Cassandra. I am hitting the cluster
>>>> from another server (p4).
>>>>
>>>> p4 is doing the following with 20 threads in parallel:
>>>>
>>>>    1. read a lot of data (some columns for hundreds to tens of
>>>>    thousands of keys, split into 512-key multigets)
>>>>    2. process the data
>>>>    3. write back a byte array to cassandra (average size is 400 bytes)
>>>>    4. go back to 1
>>>>
>>>> According to my munin graphs, network usage is about as follows. I am
>>>> not surprised at the bias towards p13-p15 as p4 is getting & storing data
>>>> mainly for keys located on one of those nodes.
>>>>
>>>>    - p4 : 1.5Mb/s in and out
>>>>    - p13-p15 : 15Mb/s in and 80Mb/s out
>>>>    - p16-p24 : 45Mb/s in and 5Mb/s out
>>>>
>>>> What I don't understand is why p4 is only seeing 1.5Mb/s while I see
>>>> 80Mb/s on p13 & p15.
>>>>
>>>> The way I understand this:
>>>>
>>>>    - p4 sends a multiget to the cluster, using any node in the
>>>>    cluster as coordinator (IN traffic describing the query)
>>>>    - coordinator node replays the query on all 3 replicas (so 3
>>>>    servers each get the IN traffic, mostly p13-p15)
>>>>    - each server replies to coordinator
>>>>    - coordinator chooses matching values and sends back data to p4
>>>>
>>>> So if p13-p15 are outputting 80Mb/s, why am I not seeing 80Mb/s coming
>>>> into p4, which is on the receiving end?
>>>>
>>>> Thanks
>>>>
>>>> 2011/11/15 Philippe <watche...@gmail.com>
>>>>
>>>>> Hello,
>>>>> I'm trying to understand the network usage I am seeing in my cluster,
>>>>> can anyone shed some light?
>>>>> It's an RF=3, 12-node, cassandra 0.8.6 cluster. The nodes are
>>>>> p13,p14,p15...p24 and are consecutive in that order on the ring.
>>>>> Each node is only a cassandra database. I am hitting the cluster from
>>>>> another server (p4).
>>>>>
>>>>> The pattern on p4 is to
>>>>>
>>>>>    1. read a lot of data (some columns for hundreds to tens of
>>>>>    thousands of keys, split into 512-key multigets)
>>>>>    2. process the data
>>>>>    3. write back a byte array to cassandra (average size is 400 bytes)
>>>>>
>>>>>
>>>>> p4 reads as
>>>>>
>>>>
>>>>
>>
