Hi everyone,
I've been lurking in the #cassandra IRC channel lately looking for help
on this, but wanted to try the mailing list as well.
We have 3 nodes, and last week it was suggested that I run 'nodetool
move' to reset the token values on our 3 nodes, since they had been
randomly assigned when the nodes were first started.
Our ReplicationFactor is set to 2, and we're using the RandomPartitioner.
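For anyone curious, the evenly spaced tokens we moved to can be generated
with the usual recipe (a quick sketch; RandomPartitioner's token range is
0 to 2**127, and integer truncation can land one off from a rounded value):

```python
# Sketch: evenly spaced RandomPartitioner tokens for an N-node ring.
# Token i = i * 2**127 // N, since RandomPartitioner's range is 0..2**127.
def balanced_tokens(n):
    return [i * (2 ** 127) // n for i in range(n)]

for token in balanced_tokens(3):
    print(token)
```

Each node then gets one of these values via 'nodetool move <token>'.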
Our row min/max/mean values are mostly the same, with only small
discrepancies, but for one of our column families the live/total disk
space varies widely across nodes. We expected that after issuing a
"nodetool move" and "nodetool cleanup" on each node (waiting for each
to complete before repeating on the next node), the data load would be
more balanced, giving us approximately the same disk load on all 3 nodes.
nodetool ring:
Address         Status  Load     Range                                     Ring
                                 113427455640312821154458202477256070485
10.242.142.16   Up      2.66 GB  0                                         |<--|
10.242.142.111  Up      1.24 GB  56713727820156410577229101238628035243   |   |
10.242.6.242    Up      5.86 GB  113427455640312821154458202477256070485  |-->|
Here is the disk space used (live/total) and the row size min/max/mean on
each node for the largest ColumnFamily of the roughly dozen CFs we have
(all CFs are defined on every node; I added commas for readability):
Node 1:
Column Family: UserGameshareData
Space used (live): 1,992,054,724
Space used (total): 1,992,054,724
Compacted row minimum size: 307
Compacted row maximum size: 123,498
Compacted row mean size: 1,409
Node 2:
Column Family: UserGameshareData
Space used (live): 782,974,992
Space used (total): 806,168,719
Compacted row minimum size: 306
Compacted row maximum size: 88,379
Compacted row mean size: 1,405
Node 3:
Column Family: UserGameshareData
Space used (live): 4,435,011,932
Space used (total): 4,435,011,932
Compacted row minimum size: 306
Compacted row maximum size: 68,636
Compacted row mean size: 1,387
We also have a cron job running every night that runs a 'nodetool
cleanup' on one of the nodes (day of year % number of nodes).
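For clarity, the node-picking logic in that cron job is essentially this
(sketched in Python; the host list is just our three nodes from the ring
output above, and the final command line is illustrative):

```python
# Sketch of the nightly cron job's node selection:
# day-of-year modulo node count picks which node gets 'nodetool cleanup'.
import datetime

NODES = ["10.242.142.16", "10.242.142.111", "10.242.6.242"]

def pick_node(day_of_year, nodes=NODES):
    return nodes[day_of_year % len(nodes)]

today = datetime.date.today().timetuple().tm_yday
print("nodetool cleanup would run on", pick_node(today))
```

So over any three-day stretch, each node gets cleaned up exactly once.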
I'm happy to share any additional information that would help us balance
things a little better. I had also debated setting our ReplicationFactor
to 3 to see whether the data would eventually be copied to every node.
Cheers,
Ian Douglas, Armor Games