On Wed, Feb 6, 2013 at 2:02 PM, <stephen.m.thomp...@wellsfargo.com> wrote:
> Thanks Aaron. I ran the cassandra-shuffle job and did a rebuild and compact
> on each of the nodes.
>
> [root@Config3482VM1 apache-cassandra-1.2.1]# bin/nodetool status
> Datacenter: 28
> ==============
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address        Load       Tokens  Owns (effective)  Host ID                               Rack
> UN  10.28.205.125  1.7 GB     255     33.7%             3daab184-61f0-49a0-b076-863f10bc8c6c  205
> UN  10.28.205.126  591.44 MB  256     99.9%             55bbd4b1-8036-4e32-b975-c073a7f0f47f  205
> UN  10.28.205.127  112.28 MB  257     66.4%             d240c91f-4901-40ad-bd66-d374a0ccf0b9  205

Sorry, I have to ask: is this the complete output? Have you perhaps
sanitized it in some way?

It seems like there is some piece of missing context here. Can you tell us:

* Is this a cluster that was upgraded to virtual nodes (that would
  include a 1.2.x cluster initialized with one token per node, and
  num_tokens set after the fact)? If so, what did the initial token map
  look like?
* Was initial_token used at any point along the way (either to supply a
  single token, or a csv list of them), on any or all of the nodes in this
  cluster, at any time?
* One data center (28)? One rack (205)? Three nodes?
* How many keyspaces, and what are the replication strategies?
* What does the full output of `nodetool ring' look like now? Can you
  attach it?

> So this is a little better. At least node 3 has some content, but they are
> still far from balanced. If I understand this correctly, this is the
> distribution I would expect if the tokens were set at 15/5/1 rather than
> equal. As configured, I would expect roughly equal amounts of data on each
> node. Is that right? Do you have any suggestions for what I can look at to
> get there?

Shuffle should only be required if you started out with
1-token-per-node. In that case, your existing ranges are evenly
divided num_tokens ways, and so should be exceptionally consistent
with one another (assuming of course that the existing ranges were
evenly sized). The shuffle op merely "shuffles" the ranges you have
to (random) other nodes in the cluster.

If this cluster were started from scratch with num_tokens = 256, then
a total of 768 tokens would have been randomly generated from within
the murmur3 hash-space. Random assignment isn't perfect, but with 768
tokens (256 per node), it should work out to be reasonably close on
average.

TL;DR What Aaron Said(tm)

In the absence of rack/dc aware replication, your allocation is suspicious.
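
A minimal sketch of the random-assignment argument above, assuming
replication factor 1 and tokens drawn uniformly from the Murmur3 token
range (-2^63 to 2^63 - 1). `simulate_ownership` is a hypothetical helper
written for illustration, not a Cassandra API, and it does not model how
Cassandra actually generates tokens:

```python
import random

# Murmur3Partitioner tokens are signed 64-bit integers.
TOKEN_MIN = -2**63
TOKEN_MAX = 2**63 - 1
RING_SIZE = 2**64


def simulate_ownership(nodes=3, num_tokens=256, seed=None):
    """Return the fraction of the ring owned by each node (RF=1 assumed)."""
    rng = random.Random(seed)
    # Every node picks num_tokens tokens uniformly at random.
    ring = sorted(
        (rng.randint(TOKEN_MIN, TOKEN_MAX), node)
        for node in range(nodes)
        for _ in range(num_tokens)
    )
    owned = [0] * nodes
    # A token owns the range from the previous token (exclusive) up to
    # itself (inclusive); the first token's range wraps around the ring.
    prev = ring[-1][0] - RING_SIZE
    for token, node in ring:
        owned[node] += token - prev
        prev = token
    return [o / RING_SIZE for o in owned]


if __name__ == "__main__":
    for node, share in enumerate(simulate_ownership()):
        print(f"node {node}: {share:.1%}")
```

Under these assumptions each of three nodes should land within a few
percentage points of one third, which is the sense in which 768 random
tokens come out "reasonably close on average".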

> I have about 11M rows of data in this keyspace and none of them are
> exceptionally long … it’s data pulled from Oracle and didn’t include any
> BLOB, etc.

[ ... ]

> From: aaron morton [mailto:aa...@thelastpickle.com]
> Sent: Tuesday, February 05, 2013 3:41 PM
> To: user@cassandra.apache.org
> Subject: Re: unbalanced ring
>
> Use nodetool status with vnodes
> http://www.datastax.com/dev/blog/upgrading-an-existing-cluster-to-vnodes
>
> The different load can be caused by rack affinity, are all the nodes in the
> same rack ? Another simple check is have you created some very big rows?
>
> On 6/02/2013, at 8:40 AM, stephen.m.thomp...@wellsfargo.com wrote:
>
> So I have three nodes in a ring in one data center. My configuration has
> num_tokens: 256 set and initial_token commented out. When I look at the
> ring, it shows me all of the token ranges of course, and basically identical
> data for each range on each node. Here is the Cliff’s Notes version of what
> I see:
>
> [root@Config3482VM2 apache-cassandra-1.2.0]# bin/nodetool ring
>
> Datacenter: 28
> ==========
> Replicas: 1
>
> Address        Rack  Status  State   Load      Owns     Token
>                                                         9187343239835811839
> 10.28.205.125  205   Up      Normal  2.85 GB   33.69%   -3026347817059713363
> 10.28.205.125  205   Up      Normal  2.85 GB   33.69%   -3026276684526453414
> 10.28.205.125  205   Up      Normal  2.85 GB   33.69%   -3026205551993193465
> (etc)
> 10.28.205.126  205   Up      Normal  1.15 GB   100.00%  -9187343239835811840
> 10.28.205.126  205   Up      Normal  1.15 GB   100.00%  -9151314442816847872
> 10.28.205.126  205   Up      Normal  1.15 GB   100.00%  -9115285645797883904
> (etc)
> 10.28.205.127  205   Up      Normal  69.13 KB  66.30%   -9223372036854775808
> 10.28.205.127  205   Up      Normal  69.13 KB  66.30%   36028797018963967
> 10.28.205.127  205   Up      Normal  69.13 KB  66.30%   72057594037927935
> (etc)
>
> So at this point I have a number of questions. The biggest question is of
> Load. Why does the .125 node have 2.85 GB, .126 has 1.15 GB, and .127 has
> only 0.000069 GB? These boxes are all comparable and all configured
> identically.
>
> partitioner: org.apache.cassandra.dht.Murmur3Partitioner
>
> I’m sorry to ask so many questions – I’m having a hard time finding
> documentation that explains this stuff.

-- 
Eric Evans
Acunu | http://www.acunu.com | @acunu