Hi everyone,

I've been struggling to get the data volume ("load") to equalize across 
a balanced cluster, and I'm not sure what else to try.

Background: This was originally a 5-node cluster.  We re-balanced the 3 faster 
machines across the ring and decommissioned the 2 older ones.  We also 
upgraded Cassandra a few times, from 0.7.4 through 0.7.5 and 0.7.6-2 to 0.7.7.  
The ring currently looks like this:

Address         Status State   Load            Owns    Token
                                                       151236607520417094872610936636341427313
xx.xx.x.105     Up     Normal  41.98 GB        33.33%  37809151880104273718152734159085356828
xx.xx.x.107     Up     Normal  59.4 GB         33.33%  94522879700260684295381835397713392071
xx.xx.x.18      Up     Normal  74.65 GB        33.33%  151236607520417094872610936636341427313
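
As a sanity check on the token spacing, here is a small Python sketch that recomputes each node's share of the ring from the tokens listed above. It assumes RandomPartitioner (token space 0 to 2**127), which is not stated in this post but is consistent with the 33.33% shown in the Owns column; the names RING_SIZE and ownership are only illustrative.

```python
from fractions import Fraction

# Assumption: RandomPartitioner, whose tokens live in [0, 2**127).
RING_SIZE = 2 ** 127

# Tokens copied from the ring output above.
tokens = {
    "xx.xx.x.105": 37809151880104273718152734159085356828,
    "xx.xx.x.107": 94522879700260684295381835397713392071,
    "xx.xx.x.18": 151236607520417094872610936636341427313,
}


def ownership(tokens, ring_size=RING_SIZE):
    """Return each node's fraction of the ring.

    A node owns the range from the previous node's token (exclusive)
    up to its own token (inclusive), wrapping around the ring.
    """
    ordered = sorted(tokens.items(), key=lambda kv: kv[1])
    shares = {}
    for i, (node, token) in enumerate(ordered):
        prev_token = ordered[i - 1][1]  # index -1 wraps to the last node
        width = (token - prev_token) % ring_size
        shares[node] = Fraction(width, ring_size)
    return shares


shares = ownership(tokens)
for node, share in shares.items():
    print(f"{node:<14} {float(share) * 100:.2f}%")
```

The three ranges come out equal to within one token, so the tokens themselves do not appear to explain the difference in load.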

What I've tried so far:
        1. Running repair on each node (sequentially, of course).
        2. Running cleanup on the largest node (.18), hoping it would shed 
           unneeded data.

The repairs helped a bit by slightly increasing the load on the first 2 
machines, but the cleanup on the 3rd failed to reduce its data volume.

So, at this point, I'm out of ideas.  In terms of tpstats metrics, each of the 
3 nodes is serving roughly the same volume of ReadStage and MutationStage, so 
they're balanced in that respect.  However, I'm concerned about the imbalance 
of the data load (24% / 34% / 42%) and about being unable to equalize it.
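
For reference, the percentages above are each node's share of the total load reported in the ring output. A minimal Python sketch of that arithmetic (the variable names are only illustrative):

```python
# Load per node in GB, copied from the ring output.
loads_gb = {
    "xx.xx.x.105": 41.98,
    "xx.xx.x.107": 59.4,
    "xx.xx.x.18": 74.65,
}

total_gb = sum(loads_gb.values())

# Each node's share of the total on-disk load, as a percentage.
load_share = {node: 100 * gb / total_gb for node, gb in loads_gb.items()}

for node, pct in load_share.items():
    print(f"{node:<14} {loads_gb[node]:>6.2f} GB  {pct:5.1f}%")
```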

For the record, there's only 1 keyspace of meaningful data in the cluster, with 
the following schema settings:
Keyspace: ZZZZZZ:
  Replication Strategy: org.apache.cassandra.locator.NetworkTopologyStrategy
    Options: [DCMTL:2]
  Column Families:
    ColumnFamily: AAAAAAAAAA
      default_validation_class: org.apache.cassandra.db.marshal.UTF8Type
      Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
      Row cache size / save period in seconds: 256000.0/0
      Key cache size / save period in seconds: 200000.0/14400
      Memtable thresholds: 0.88125/1440/188 (millions of ops/minutes/MB)
      GC grace seconds: 864000
      Compaction min/max thresholds: 4/32
      Read repair chance: 0.1
      Built indexes: []
    ColumnFamily: BBBBB (Super)
      default_validation_class: org.apache.cassandra.db.marshal.UTF8Type
      Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type/org.apache.cassandra.db.marshal.UTF8Type
      Row cache size / save period in seconds: 75000.0/0
      Key cache size / save period in seconds: 200000.0/14400
      Memtable thresholds: 0.88125/1440/188 (millions of ops/minutes/MB)
      GC grace seconds: 864000
      Compaction min/max thresholds: 4/32
      Read repair chance: 0.25
      Built indexes: []

Any tips or ideas to help equalize the nodes' load would be highly 
appreciated.  If this is normal behaviour and I shouldn't be trying too hard to 
equalize it, I'd appreciate any notes/links explaining why.

Thank you.
