We'll solve #2890, and we should have done it sooner. That being said, a quick question: how do you do your inserts from the clients? Are you evenly distributing the inserts among the nodes, or are you always hitting the same coordinator?
Because, provided the nodes are correctly distributed on the ring, if you distribute the insert (increment) requests across the nodes (again, I'm talking about client requests), you "should" not see the behavior you observe.

--
Sylvain

On Thu, Sep 8, 2011 at 9:48 PM, David Hawthorne <dha...@gmx.3crowd.com> wrote:
> It was exactly due to 2890, and the fact that the first replica is always the
> one with the lowest value IP address. I patched cassandra to pick a random
> node out of the replica set in StorageProxy.java findSuitableEndpoint:
>
>     Random rng = new Random();
>
>     return endpoints.get(rng.nextInt(endpoints.size())); // instead of: return endpoints.get(0);
>
> Now the workload is evenly balanced among all 5 nodes and I'm getting 2.5x the
> inserts/sec throughput.
>
> Here's the behavior I saw, where "disk work" refers to the ReplicateOnWrite
> load of a counter insert:
>
> One node will get RF/n of the disk work. Two nodes will always get 0 disk
> work.
>
> In a 3-node cluster, 1 node gets its disk hit really hard. You get the
> performance of a one-node cluster.
> In a 6-node cluster, 1 node gets hit with 50% of the disk work, giving you
> the performance of a ~2-node cluster.
> In a 10-node cluster, 1 node gets 30% of the disk work, giving you the
> performance of a ~3-node cluster.
>
> I confirmed this behavior with 3-, 4-, and 5-node cluster sizes.
>
>>> On another note, on a 5-node cluster, I'm only seeing 3 nodes with
>>> ReplicateOnWrite Completed tasks in nodetool tpstats output. Is that
>>> normal? I'm using RandomPartitioner...
>>>
>>> Address    DC           Rack   Status  State   Load       Owns    Token
>>>                                                                   136112946768375385385349842972707284580
>>> 10.0.0.57  datacenter1  rack1  Up      Normal  2.26 GB    20.00%  0
>>> 10.0.0.56  datacenter1  rack1  Up      Normal  2.47 GB    20.00%  34028236692093846346337460743176821145
>>> 10.0.0.55  datacenter1  rack1  Up      Normal  2.52 GB    20.00%  68056473384187692692674921486353642290
>>> 10.0.0.54  datacenter1  rack1  Up      Normal  950.97 MB  20.00%  102084710076281539039012382229530463435
>>> 10.0.0.72  datacenter1  rack1  Up      Normal  383.25 MB  20.00%  136112946768375385385349842972707284580
>>>
>>> The nodes with ReplicateOnWrites are the 3 in the middle. The first node
>>> and last node both have a count of 0. This is a clean cluster, and I've
>>> been doing 3k ... 2.5k (decaying performance) inserts/sec for the last 12
>>> hours. The last time this test ran, it went all the way down to 500
>>> inserts/sec before I killed it.
>>
>> Could be due to https://issues.apache.org/jira//browse/CASSANDRA-2890.
>>
>> --
>> Sylvain
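The one-line change David describes (return a random replica instead of the first, lowest-IP one) can be sketched in isolation as below. This is a hedged illustration, not the actual StorageProxy.findSuitableEndpoint code: the class, method signature, and String endpoints are simplified stand-ins for Cassandra's InetAddress-based internals, and production code would reuse a shared Random rather than allocating one per call.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Random;

// Sketch of the replica-selection fix discussed in this thread (CASSANDRA-2890).
// "String" stands in for Cassandra's InetAddress replica entries.
public class SuitableEndpoint {
    private static final Random RNG = new Random();

    // Old behavior: always return the first replica. Since that is the
    // lowest-IP node, it absorbs all of the ReplicateOnWrite disk work.
    static String findSuitableEndpointOld(List<String> endpoints) {
        return endpoints.get(0);
    }

    // Patched behavior: pick a uniformly random replica, spreading the
    // ReplicateOnWrite load across the whole replica set.
    static String findSuitableEndpointPatched(List<String> endpoints) {
        return endpoints.get(RNG.nextInt(endpoints.size()));
    }

    public static void main(String[] args) {
        List<String> replicas = Arrays.asList("10.0.0.54", "10.0.0.55", "10.0.0.56");
        System.out.println(findSuitableEndpointOld(replicas));               // always 10.0.0.54
        System.out.println(findSuitableEndpointPatched(replicas));           // any of the three
    }
}
```

This also matches the RF/n arithmetic quoted above: with RF=3, the single hot node does 3/3 = 100% of the disk work on a 3-node cluster, 3/6 = 50% on 6 nodes, and 3/10 = 30% on 10 nodes.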