Hi everyone,

I've been lurking in the #cassandra IRC channel lately looking for help on this, but wanted to try the mailing list as well.

We have 3 nodes, and last week it was suggested that I run 'nodetool move' to reset the token values on the 3 nodes, because they were assigned randomly when the nodes were first started.

Our ReplicationFactor is set to 2, and we're using the Random partitioner.
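For reference, here's the little calculation I used to pick the target tokens for 'nodetool move' (this assumes the usual RandomPartitioner formula token_i = i * 2**127 / N; please correct me if that's not right):

```python
# Evenly spaced tokens for RandomPartitioner, whose token space is [0, 2**127).
# Assumed formula: token_i = i * 2**127 // N for node i of N.
def balanced_tokens(num_nodes):
    ring_size = 2 ** 127
    return [i * ring_size // num_nodes for i in range(num_nodes)]

for i, token in enumerate(balanced_tokens(3)):
    print(f"node {i}: {token}")
```

The tokens in our ring below match this (give or take rounding on the middle node), so I believe the moves themselves took effect.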

Our row min/max/mean sizes are mostly the same across nodes, with only small discrepancies, but for one of our column families the live/total disk space varies quite a bit. We expected that after issuing "nodetool move" and "nodetool cleanup" on each node (waiting for each to complete before repeating on the next node) the data load would be more balanced, giving us approximately the same disk load on all 3 nodes.

nodetool ring:
Address         Status  Load     Range                                    Ring
                                 113427455640312821154458202477256070485
10.242.142.16   Up      2.66 GB  0                                        |<--|
10.242.142.111  Up      1.24 GB  56713727820156410577229101238628035243   |   |
10.242.6.242    Up      5.86 GB  113427455640312821154458202477256070485  |-->|

Disk space used (live/total) and row size min/max/mean on each node for the largest ColumnFamily (of about a dozen CFs, all defined on every node). I added commas for readability:

Node 1:
        Column Family: UserGameshareData
        Space used (live): 1,992,054,724
        Space used (total): 1,992,054,724
        Compacted row minimum size: 307
        Compacted row maximum size: 123,498
        Compacted row mean size: 1,409

Node 2:
        Column Family: UserGameshareData
        Space used (live): 782,974,992
        Space used (total): 806,168,719
        Compacted row minimum size: 306
        Compacted row maximum size: 88,379
        Compacted row mean size: 1,405

Node 3:
        Column Family: UserGameshareData
        Space used (live): 4,435,011,932
        Space used (total): 4,435,011,932
        Compacted row minimum size: 306
        Compacted row maximum size: 68,636
        Compacted row mean size: 1,387


We also have a cron job running every night that runs a 'nodetool cleanup' on one of the nodes, rotating by (day of year % number of nodes).
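In case the rotation logic matters, here's a sketch of how the cron job decides which node cleans up each night (NODE_INDEX is a per-host value we assign ourselves, 0 through 2; the helper name is mine):

```python
import datetime

# Each node runs this nightly; only the node whose index matches
# (day of year % number of nodes) actually performs the cleanup.
def should_cleanup(node_index, num_nodes, today=None):
    today = today or datetime.date.today()
    day_of_year = today.timetuple().tm_yday  # 1..366
    return day_of_year % num_nodes == node_index

if should_cleanup(node_index=0, num_nodes=3):
    # subprocess.run(["nodetool", "cleanup"]) in the real job
    pass
```

So each node gets cleaned up every third day; I'd be curious whether that interacts badly with the move/rebalance.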

I'm happy to share any additional information that would help us balance things a little better. I've also debated raising our ReplicationFactor to 3 to see whether the data eventually gets copied to every node.

Cheers,
Ian Douglas, Armor Games
