WRT the load imbalance, checking the basics first: have you run cleanup after any token moves? Is repair running regularly? Also, nodes sometimes get a bit bloated from repair and will settle down after compaction.
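If you want to re-run that maintenance pass, something along these lines is roughly what I mean (a sketch only - check nodetool help for the exact flags on 0.7, and I'm assuming the default JMX port 8080):

nodetool -h <node_ip> -p 8080 cleanup
nodetool -h <node_ip> -p 8080 repair
nodetool -h <node_ip> -p 8080 compact

Run against each node in turn: cleanup throws away data the node no longer owns after a token move, repair makes sure it holds the replicas it should, and a major compaction merges away the temporary bloat repair can leave behind.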
Your slightly odd tokens in the MTL DC make it a little tricky to understand what's going on. I want to check whether you've followed the multi-DC token selection guidance here:
http://wiki.apache.org/cassandra/Operations#Token_selection

Background on what can happen in a multi-DC deployment if the tokens are not right:
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Replica-data-distributing-between-racks-td6324819.html

This is what you currently have…

DC: LA
IPLA1   Up  Normal  34.57 GB  11.11%  0
IPLA2   Up  Normal  17.55 GB  11.11%  56713727820156410577229101238628035242
IPLA3   Up  Normal  51.37 GB  11.11%  113427455640312821154458202477256070485

DC: MTL
IPMTL1  Up  Normal  34.43 GB  22.22%  37809151880104273718152734159085356828
IPMTL2  Up  Normal  34.56 GB  22.22%  94522879700260684295381835397713392071
IPMTL3  Up  Normal  34.71 GB  22.22%  151236607520417094872610936636341427313

Using the bump approach you would have:

IPLA1   0
IPLA2   56713727820156410577229101238628035242
IPLA3   113427455640312821154458202477256070484
IPMTL1  1
IPMTL2  56713727820156410577229101238628035243
IPMTL3  113427455640312821154458202477256070485

Using the interleaving approach you would have:

IPLA1   0
IPMTL1  28356863910078205288614550619314017621
IPLA2   56713727820156410577229101238628035242
IPMTL2  85070591730234615865843651857942052863
IPLA3   113427455640312821154458202477256070484
IPMTL3  141784319550391026443072753096570088105

(There is a rough sketch of how to generate these layouts at the very bottom of this mail.)

The current setup gives each node in LA 33% of the LA-local ring, which should be right; just checking.

If cleanup / repair / compaction is all good and you are confident the tokens are right, try poking around with nodetool getendpoints to see which nodes keys are sent to.

Like you, I cannot see anything obvious in NTS that would cause load to be imbalanced if they are all in the same rack.

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 10 Aug 2011, at 11:24, Mina Naguib wrote:

> Hi everyone
>
> I'm observing a very peculiar type of imbalance and I'd appreciate any help
> or ideas to try. This is on cassandra 0.7.8.
>
> The original cluster was 3 machines in the DCMTL, equally balanced at 33.33%
> each and each holding roughly 34G.
>
> Then, I added to it 3 machines in the LA data center. The ring is currently
> as follows (IP addresses redacted for clarity):
>
> Address  Status  State   Load      Owns    Token
>                                            151236607520417094872610936636341427313
> IPLA1    Up      Normal  34.57 GB  11.11%  0
> IPMTL1   Up      Normal  34.43 GB  22.22%  37809151880104273718152734159085356828
> IPLA2    Up      Normal  17.55 GB  11.11%  56713727820156410577229101238628035242
> IPMTL2   Up      Normal  34.56 GB  22.22%  94522879700260684295381835397713392071
> IPLA3    Up      Normal  51.37 GB  11.11%  113427455640312821154458202477256070485
> IPMTL3   Up      Normal  34.71 GB  22.22%  151236607520417094872610936636341427313
>
> The bump in the 3 MTL nodes (22.22%) is in anticipation of 3 more machines in
> yet another data center, but they're not ready yet to join the cluster. Once
> that third DC joins all nodes will be at 11.11%. However, I don't think this
> is related.
>
> The problem I'm currently observing is visible in the LA machines,
> specifically IPLA2 and IPLA3. IPLA2 has 50% the expected volume, and IPLA3
> has 150% the expected volume.
>
> Putting their load side by side shows the peculiar ratio of 2:1:3 between the
> 3 LA nodes:
> 34.57  17.55  51.37
> (the same 2:1:3 ratio is reflected in our internal tools trending
> reads/second and writes/second)
>
> I've tried several iterations of compactions/cleanups to no avail. In terms
> of config this is the main keyspace:
> Replication Strategy: org.apache.cassandra.locator.NetworkTopologyStrategy
> Options: [DCMTL:2, DCLA:2]
> And this is the cassandra-topology.properties file (IPs again redacted for
> clarity):
> IPMTL1:DCMTL:RAC1
> IPMTL2:DCMTL:RAC1
> IPMTL3:DCMTL:RAC1
> IPLA1:DCLA:RAC1
> IPLA2:DCLA:RAC1
> IPLA3:DCLA::RAC1
> IPLON1:DCLON:RAC1
> IPLON2:DCLON:RAC1
> IPLON3:DCLON:RAC1
> # default for unknown nodes
> default=DCBAD:RACBAD
>
> One thing that did occur to me while reading the source code for the
> NetworkTopologyStrategy's calculateNaturalEndpoints is that it prefers
> placing data on different racks. Since all my machines are defined as in the
> same rack, I believe that the 2-pass approach would still yield balanced
> placement.
>
> However, just to test, I modified live the topology file to specify that
> IPLA1, IPLA2 and IPLA3 are in 3 different racks, and sure enough I saw
> immediately that the reads/second and writes/second equalized to expected
> fair volume (I quickly reverted that change).
>
> So, it seems somehow related to rack awareness, but I've been raking my head
> and I can't figure out how/why, or why the three MTL machines are not
> affected the same way.
>
> If the solution is to specify them in different racks and run repair on
> everything, I'm okay with that - but I hate doing that without first
> understanding *why* the current behavior is the way it is.
>
> Any ideas would be hugely appreciated.
>
> Thank you.
>
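PS - in case it helps, here is the rough sketch of the bump and interleave layouts I mentioned above. It is just plain Python (nothing official), the DC names and node counts are placeholders, and the exact values may differ by one or two from the numbers I pasted earlier depending on rounding; the even spacing is what matters.

RING = 2 ** 127  # RandomPartitioner token space

def evenly_spaced(n):
    # n tokens spread evenly around the ring
    return [i * RING // n for i in range(n)]

def bump(dc_sizes):
    # Bump: every DC uses the same evenly spaced layout, offset by its
    # DC index so no two nodes end up with exactly the same token.
    # Assumes dicts keep insertion order (Python 3.7+).
    return {dc: [t + offset for t in evenly_spaced(n)]
            for offset, (dc, n) in enumerate(dc_sizes.items())}

def interleave(dc_sizes):
    # Interleave: spread all nodes evenly around the ring and hand the
    # tokens out to the DCs in round-robin order.
    order = []
    remaining = dict(dc_sizes)
    while any(remaining.values()):
        for dc in dc_sizes:
            if remaining[dc] > 0:
                order.append(dc)
                remaining[dc] -= 1
    layout = {dc: [] for dc in dc_sizes}
    for dc, token in zip(order, evenly_spaced(len(order))):
        layout[dc].append(token)
    return layout

print(bump({"DCLA": 3, "DCMTL": 3}))
print(interleave({"DCLA": 3, "DCMTL": 3}))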