Cool.

-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com
On 11 Aug 2011, at 02:45, Mina Naguib wrote:

> Hi Aaron
>
> Thank you very much for the reply and the pointers to the previous list
> discussions. The second one was particularly telling.
>
> I'm happy to say that the problem is fixed, and it's so trivial it's quite
> embarrassing - but I'll state it here for the sake of the archives.
>
> There was an extra colon in the topology file in the line defining IPLA3.
> It's just as visible in my prod config as it is in my example below ;-)
>
> I'm guessing the parser splits <dc, rack> tuples on (":"), so it probably
> parsed the IPLA3 entry as "DCLA", ":RAC1" (which is different from the
> others on "RAC1"), and so the NTS did its thing distributing evenly between
> racks, and IPLA3 got more of the data and IPLA2 got less.
>
> I've fixed it, and the reads/s and writes/s immediately equalized. I'm now
> doing a round of repairs/compactions/cleanups to equalize the data load as
> well.
>
> Unfortunately it's not easy in Cassandra 0.7.8 to actually see the parsed
> topology state (unlike 0.8's nice ring output, which shows the DC and rack),
> so I'm ashamed to say it took much longer than it should've to troubleshoot.
>
> Thanks for your help.
>
>
> On 2011-08-10, at 5:12 AM, aaron morton wrote:
>
>> WRT the load imbalance, checking the basics: you've run cleanup after any
>> token moves? Repair is running? Also, sometimes nodes get a bit bloated
>> from repair and will settle down with compaction.
>>
>> Your slightly odd tokens in the MTL DC are making it a little tricky to
>> understand what's going on. But I'm trying to check if you've followed the
>> multi-DC token selection here
>> http://wiki.apache.org/cassandra/Operations#Token_selection . Background
>> about what can happen in a multi-DC deployment if the tokens are not right:
>> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Replica-data-distributing-between-racks-td6324819.html
>>
>> This is what you currently have...
>>
>> DC: LA
>> IPLA1   Up  Normal  34.57 GB  11.11%  0
>> IPLA2   Up  Normal  17.55 GB  11.11%  56713727820156410577229101238628035242
>> IPLA3   Up  Normal  51.37 GB  11.11%  113427455640312821154458202477256070485
>>
>> DC: MTL
>> IPMTL1  Up  Normal  34.43 GB  22.22%  37809151880104273718152734159085356828
>> IPMTL2  Up  Normal  34.56 GB  22.22%  94522879700260684295381835397713392071
>> IPMTL3  Up  Normal  34.71 GB  22.22%  151236607520417094872610936636341427313
>>
>> Using the bump approach you would have:
>>
>> IPLA1   0
>> IPLA2   56713727820156410577229101238628035242
>> IPLA3   113427455640312821154458202477256070484
>>
>> IPMTL1  1
>> IPMTL2  56713727820156410577229101238628035243
>> IPMTL3  113427455640312821154458202477256070485
>>
>> Using the interleaving you would have:
>>
>> IPLA1   0
>> IPMTL1  28356863910078205288614550619314017621
>> IPLA2   56713727820156410577229101238628035242
>> IPMTL2  85070591730234615865843651857942052863
>> IPLA3   113427455640312821154458202477256070484
>> IPMTL3  141784319550391026443072753096570088105
>>
>> The current setup in LA gives each node in LA 33% of the LA-local ring,
>> which should be right, just checking.
>>
>> If cleanup / repair / compaction is all good and you are confident the
>> tokens are right, try poking around with nodetool getendpoints to see which
>> nodes keys are sent to. Like you, I cannot see anything obvious in NTS that
>> would cause load to be imbalanced if they are all in the same rack.
>>
>> Cheers
>>
>> -----------------
>> Aaron Morton
>> Freelance Cassandra Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>>
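For reference, a minimal sketch of the token arithmetic behind the two layouts
described above. It assumes the standard RandomPartitioner ring of 2**127 tokens
and 3 nodes per DC; the helper names are made up for illustration, and depending
on rounding the printed values can differ by a token or two from the ones quoted
in the thread:

    # Python sketch of the "bump" and "interleaving" multi-DC token layouts.
    RING = 2 ** 127  # RandomPartitioner token space

    def bumped_tokens(nodes_per_dc, dc_offset):
        # Each DC gets its own evenly spaced sub-ring, shifted by a small
        # per-DC offset so no two nodes share the exact same token.
        return [i * RING // nodes_per_dc + dc_offset for i in range(nodes_per_dc)]

    def interleaved_tokens(dc_index, num_dcs, nodes_per_dc):
        # All nodes share one evenly spaced ring, and the DCs alternate around it.
        total = num_dcs * nodes_per_dc
        return [i * RING // total for i in range(dc_index, total, num_dcs)]

    print("LA  bump:       ", bumped_tokens(3, 0))
    print("MTL bump:       ", bumped_tokens(3, 1))
    print("LA  interleaved:", interleaved_tokens(0, 2, 3))
    print("MTL interleaved:", interleaved_tokens(1, 2, 3))

Either way, each DC ends up with an evenly spaced sub-ring of its own, which is
what keeps ownership balanced within each DC.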
>> On 10 Aug 2011, at 11:24, Mina Naguib wrote:
>>
>>> Hi everyone
>>>
>>> I'm observing a very peculiar type of imbalance and I'd appreciate any help
>>> or ideas to try. This is on Cassandra 0.7.8.
>>>
>>> The original cluster was 3 machines in DCMTL, equally balanced at 33.33%
>>> each and each holding roughly 34G.
>>>
>>> Then I added to it 3 machines in the LA data center. The ring is currently
>>> as follows (IP addresses redacted for clarity):
>>>
>>> Address  Status  State   Load      Owns    Token
>>>                                            151236607520417094872610936636341427313
>>> IPLA1    Up      Normal  34.57 GB  11.11%  0
>>> IPMTL1   Up      Normal  34.43 GB  22.22%  37809151880104273718152734159085356828
>>> IPLA2    Up      Normal  17.55 GB  11.11%  56713727820156410577229101238628035242
>>> IPMTL2   Up      Normal  34.56 GB  22.22%  94522879700260684295381835397713392071
>>> IPLA3    Up      Normal  51.37 GB  11.11%  113427455640312821154458202477256070485
>>> IPMTL3   Up      Normal  34.71 GB  22.22%  151236607520417094872610936636341427313
>>>
>>> The bump in the 3 MTL nodes (22.22%) is in anticipation of 3 more machines
>>> in yet another data center, but they're not ready yet to join the cluster.
>>> Once that third DC joins, all nodes will be at 11.11%. However, I don't
>>> think this is related.
>>>
>>> The problem I'm currently observing is visible in the LA machines,
>>> specifically IPLA2 and IPLA3. IPLA2 has 50% of the expected volume, and
>>> IPLA3 has 150% of the expected volume.
>>>
>>> Putting their load side by side shows the peculiar ratio of 2:1:3 between
>>> the 3 LA nodes:
>>>
>>> 34.57    17.55    51.37
>>>
>>> (the same 2:1:3 ratio is reflected in our internal tools trending
>>> reads/second and writes/second)
>>>
>>> I've tried several iterations of compactions/cleanups to no avail. In
>>> terms of config, this is the main keyspace:
>>>
>>> Replication Strategy: org.apache.cassandra.locator.NetworkTopologyStrategy
>>> Options: [DCMTL:2, DCLA:2]
>>>
>>> And this is the cassandra-topology.properties file (IPs again redacted for
>>> clarity):
>>>
>>> IPMTL1:DCMTL:RAC1
>>> IPMTL2:DCMTL:RAC1
>>> IPMTL3:DCMTL:RAC1
>>> IPLA1:DCLA:RAC1
>>> IPLA2:DCLA:RAC1
>>> IPLA3:DCLA::RAC1
>>> IPLON1:DCLON:RAC1
>>> IPLON2:DCLON:RAC1
>>> IPLON3:DCLON:RAC1
>>> # default for unknown nodes
>>> default=DCBAD:RACBAD
>>>
>>> One thing that did occur to me while reading the source code for
>>> NetworkTopologyStrategy's calculateNaturalEndpoints is that it prefers
>>> placing data on different racks. Since all my machines are defined as being
>>> in the same rack, I believe the 2-pass approach should still yield balanced
>>> placement.
>>>
>>> However, just to test, I modified the topology file live to specify that
>>> IPLA1, IPLA2 and IPLA3 are in 3 different racks, and sure enough I saw
>>> immediately that the reads/second and writes/second equalized to the
>>> expected fair volume (I quickly reverted that change).
>>>
>>> So it seems somehow related to rack awareness, but I've been racking my
>>> brain and I can't figure out how/why, or why the three MTL machines are not
>>> affected the same way.
>>>
>>> If the solution is to specify them in different racks and run repair on
>>> everything, I'm okay with that - but I hate doing that without first
>>> understanding *why* the current behavior is the way it is.
>>>
>>> Any ideas would be hugely appreciated.
>>>
>>> Thank you.
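As a footnote for the archives, the stray colon Mina found can be reproduced
with a toy parser. This is only an illustration of the splitting behaviour Mina
guessed at above, not Cassandra's actual snitch code, and the NODE:DC:RACK
layout simply mirrors the redacted listing in the thread:

    # Toy illustration: how one extra ':' silently changes the parsed rack.
    def parse_topology_line(line):
        # Assume each entry maps a node to "DC:RACK" and the parser splits on ':'.
        node, dc_rack = line.split(":", 1)   # illustrative format: NODE:DC:RACK
        dc, rack = dc_rack.split(":", 1)
        return node, dc, rack

    for line in ["IPLA1:DCLA:RAC1", "IPLA2:DCLA:RAC1", "IPLA3:DCLA::RAC1"]:
        node, dc, rack = parse_topology_line(line)
        print(node, "-> dc", repr(dc), "rack", repr(rack))

    # IPLA1 -> dc 'DCLA' rack 'RAC1'
    # IPLA2 -> dc 'DCLA' rack 'RAC1'
    # IPLA3 -> dc 'DCLA' rack ':RAC1'   <- effectively a different rack

If IPLA3 really is seen as a rack of its own, the rack-preferring pass in
NetworkTopologyStrategy also explains the 2:1:3 ratio: with DCLA:2 and ring
order IPLA1, IPLA2, IPLA3, keys whose first LA replica is IPLA1 or IPLA2 skip
their same-rack neighbour and take IPLA3 as the second replica, while keys
starting at IPLA3 fall through to IPLA1. That gives IPLA1, IPLA2 and IPLA3
roughly 2/3, 1/3 and all of the DC's data respectively - the 2:1:3 load
observed.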