Some imbalance is expected and considered normal:

See http://wiki.apache.org/cassandra/VirtualNodes/Balance

As well as

https://issues.apache.org/jira/browse/CASSANDRA-7032
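The linked pages explain why: with random vnode token allocation, two nodes almost never land on an exact 50/50 ownership split. As a hedged illustration (a toy simulation, not Cassandra code; host names and the seed are made up), assigning 256 random Murmur3 tokens per node and measuring each node's share of the ring gives figures close to, but not exactly, 50%:

```python
import random

# Toy sketch of random vnode token allocation on the Murmur3 ring.
RING_MIN, RING_MAX = -2**63, 2**63 - 1   # Murmur3Partitioner token range
RING_SIZE = RING_MAX - RING_MIN + 1

random.seed(42)
tokens = sorted((random.randint(RING_MIN, RING_MAX), node)
                for node in ("host1", "host2")
                for _ in range(256))

# Each token owns the range from the previous token (exclusive) up to
# itself (inclusive); the first token also owns the wrap-around range.
owned = {"host1": 0, "host2": 0}
for i, (tok, node) in enumerate(tokens):
    prev = tokens[i - 1][0]              # wraps to the last token at i == 0
    owned[node] += (tok - prev) % RING_SIZE

for node, size in owned.items():
    print(node, round(100.0 * size / RING_SIZE, 1), "%")
```

Running this for different seeds gives splits like 51/49 or 48/52, matching the small "Owns (effective)" skew reported below.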

Ben Bromhead
Instaclustr | www.instaclustr.com | @instaclustr | +61 415 936 359

On 29 Apr 2014, at 7:30 am, DuyHai Doan <doanduy...@gmail.com> wrote:

> Hello all
> 
>  Some update about the issue.
> 
>  After completely wiping the sstable/commitlog/saved_caches folders and 
> restarting the cluster from scratch, we still see odd figures. After 
> the restart, nodetool status does not show an exact 50% data balance 
> for each node:
> 
> 
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> -- Address  Load Tokens Owns (effective) Host ID Rack
> UN host1 48.57 KB 256 51.6%  d00de0d1-836f-4658-af64-3a12c00f47d6 rack1
> UN host2 48.57 KB 256 48.4%  e9d2505b-7ba7-414c-8b17-af3bbe79ed9c rack1
> 
> 
> As you can see, the % is very close to 50% but not exactly 50%
> 
>  What can explain that? Could it be a network connection issue during 
> the initial token shuffle phase?
> 
> P.S.: both host1 and host2 are supposed to have exactly the same hardware.
> 
> Regards
> 
>  Duy Hai DOAN
> 
> 
> On Thu, Apr 24, 2014 at 11:20 PM, Batranut Bogdan <batra...@yahoo.com> wrote:
> I don't know about Hector, but the DataStax Java driver needs just one IP 
> from the cluster and it will discover the rest of the nodes. Then by 
> default it will do a round robin when sending requests. So if Hector does 
> the same, the pattern will appear again.
> Did you look at the size of the dirs?
> That documentation is for C* 0.8, so it's old, but depending on your boxes 
> you might hit a CPU bottleneck. You might want to google for the write 
> path in Cassandra. According to that, there is not much work to do when 
> writes come in...
> On Friday, April 25, 2014 12:00 AM, DuyHai Doan <doanduy...@gmail.com> wrote:
> I did some experiments.
> 
>  Let's say we have node1 and node2
> 
> First, I configured Hector with node1 & node2 as hosts, and I saw that 
> only node1 had a high CPU load.
> 
> To eliminate the "client connection" issue, I re-tested with only node2 
> provided as the host for Hector. Same pattern: CPU load is above 50% on 
> node1 and below 10% on node2.
> 
> It means that node2 is acting as coordinator and forwarding many 
> write/read requests to node1.
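The forwarding step can be sketched like this (a hedged toy model, not Cassandra internals; the token values and host names are invented): the coordinator hashes the partition key to a token and hands the operation to that token's owner, so a request sent to node2 may still execute on node1.

```python
import bisect

# Toy (token, node) ring, sorted by token.
ring = [(-6 * 10**18, "host1"),
        (1 * 10**18, "host2"),
        (5 * 10**18, "host1")]
tokens = [t for t, _ in ring]

def owner(token):
    """First node whose ring token is >= the key's token, wrapping."""
    i = bisect.bisect_left(tokens, token)
    return ring[i % len(ring)][1]

print(owner(0), owner(2 * 10**18), owner(9 * 10**18))
# → host2 host1 host1
```

With a skewed key distribution (or a skewed toy ring like this one), most operations land on one node regardless of which node coordinates, which would match the observed CPU imbalance.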
> 
>  Why did I look at CPU load and not iostat et al.?
> 
>  Because I have a very write-intensive workload with a read-only-once 
> pattern. I've read here 
> (http://www.datastax.com/docs/0.8/cluster_architecture/cluster_planning) 
> that heavy writes in C* are more CPU-bound, but that info may be outdated 
> and no longer true.
> 
>  Regards
> 
>  Duy Hai DOAN
> 
> 
> On Thu, Apr 24, 2014 at 10:00 PM, Michael Shuler <mich...@pbandjelly.org> 
> wrote:
> On 04/24/2014 10:29 AM, DuyHai Doan wrote:
>   Client used = Hector 1.1-4
>   Default Load Balancing connection policy
>   Both nodes' addresses are provided to Hector, so according to its 
> connection policy, the client should alternate between the two nodes.
> 
> OK, so is only one connection being established to one node for one bulk 
> write operation? Or are multiple connections being made to both nodes and 
> writes performed on both?
> 
> -- 
> Michael
> 