Hi Enrico,

This is a classic chicken-and-egg problem with the allocate_tokens_for_keyspace setting.
The allocate_tokens_for_keyspace setting uses the replication factor of a keyspace in the target DC to calculate the token allocation when a node is added to the cluster for the first time. However, nodes need to be added to the new DC before the keyspace can be replicated to it. Herein lies the problem: we are unable to use allocate_tokens_for_keyspace unless the keyspace is already replicated to the new DC. In addition, as soon as you change the keyspace replication to include the new DC, new data will start to be written to it.

To work around this issue you will need to do the following. Sketches of the commands and configuration for the main steps appear after this list.

1. Decommission all the nodes in *dcNew*, one at a time.
2. Once all the *dcNew* nodes are decommissioned, wipe the contents of the *commitlog*, *data*, *saved_caches*, and *hints* directories on these nodes.
3. Make the first node to be added to *dcNew* a seed node. Set the seed list of the first node to its own IP address plus the IP addresses of the other seed nodes in the cluster.
4. Set the *initial_token* setting for the first node. You can calculate the values using the algorithm in my blog post: https://thelastpickle.com/blog/2019/02/21/set-up-a-cluster-with-even-token-distribution.html. For convenience I have calculated them: *-9223372036854775808,-4611686018427387904,0,4611686018427387904*. Note: remove the *allocate_tokens_for_keyspace* setting from the *cassandra.yaml* file for this (seed) node.
5. Check that no other node in the cluster is assigned any of the four tokens specified above. If another node in the cluster is assigned one of those tokens, increment the conflicting token by one until no node in the cluster is assigned that token value. The idea is to make sure these four tokens are unique to the node.
6. Add the seed node to the cluster. Make sure it is listed in *dcNew* by checking nodetool status.
7. Create a dummy keyspace in *dcNew* that has a replication factor of 2.
8. Set the *allocate_tokens_for_keyspace* value to the name of the dummy keyspace for the other two nodes you want to add to *dcNew*. Note: remove the *initial_token* setting for these other nodes.
9. Set *auto_bootstrap* to *false* for the other two nodes you want to add to *dcNew*.
10. Add the other two nodes to the cluster, one at a time.
11. If you are happy with the distribution, copy the data to *dcNew* by running a rebuild.
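For steps 1 and 2, a minimal sketch of the shell commands involved. The directory paths are an assumption based on a default package install; check *data_file_directories* and the related settings in your *cassandra.yaml* for the real locations:

    # Step 1: on each dcNew node, one at a time:
    nodetool decommission

    # Step 2: once a node is decommissioned, stop Cassandra and wipe its state.
    # Paths below assume a default package install.
    sudo service cassandra stop
    sudo rm -rf /var/lib/cassandra/commitlog/* \
                /var/lib/cassandra/data/* \
                /var/lib/cassandra/saved_caches/* \
                /var/lib/cassandra/hints/*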
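The *initial_token* values in step 4 come from dividing the Murmur3Partitioner token range (-2^63 to 2^63 - 1) evenly into num_tokens slices, as described in the blog post. A quick Python sketch that reproduces the four values:

    # Evenly spaced Murmur3 tokens for a single node with num_tokens = 4.
    num_tokens = 4
    tokens = [str(i * (2**64 // num_tokens) - 2**63) for i in range(num_tokens)]
    print(",".join(tokens))
    # -9223372036854775808,-4611686018427387904,0,4611686018427387904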
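Putting steps 3 and 4 together, the relevant *cassandra.yaml* fragment on the first (seed) *dcNew* node would look something like the following. The IP addresses are placeholders for this node's own address and the existing seed nodes:

    num_tokens: 4
    initial_token: -9223372036854775808,-4611686018427387904,0,4611686018427387904
    seed_provider:
        - class_name: org.apache.cassandra.locator.SimpleSeedProvider
          parameters:
              # this node's own IP first, then the other seeds in the cluster
              - seeds: "10.1.0.1,10.0.0.1,10.0.0.2"
    # note: no allocate_tokens_for_keyspace on this node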
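For step 5, nodetool ring lists every token assigned in the cluster, so the conflict check can be sketched as below. An empty result means the four tokens are free; the pattern for token 0 can produce false matches on other columns, so eyeball the Token column for that one:

    nodetool ring | grep -w -e '-9223372036854775808' \
                            -e '-4611686018427387904' \
                            -e '0' \
                            -e '4611686018427387904'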
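The dummy keyspace in step 7 only needs to exist with the right replication in *dcNew*; the name below is just an example:

    CREATE KEYSPACE dummy_allocation WITH replication = {
        'class': 'NetworkTopologyStrategy',
        'dcNew': 2
    };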
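For steps 8 and 9, the matching *cassandra.yaml* fragment on each of the other two *dcNew* nodes, again assuming the example keyspace name above:

    num_tokens: 4
    allocate_tokens_for_keyspace: dummy_allocation
    auto_bootstrap: false
    # note: no initial_token on these nodes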
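Finally, for step 11, once you have altered the real keyspaces to replicate to *dcNew* (using the RFs you described), run a rebuild on each *dcNew* node with the old DC as the streaming source:

    -- in cqlsh, for example:
    ALTER KEYSPACE myBiggestKeyspace WITH replication = {
        'class': 'NetworkTopologyStrategy', 'dcOld': 3, 'dcNew': 2 };

    # then, on each dcNew node:
    nodetool rebuild -- dcOld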
Hope this helps.

Regards,
Anthony

On Fri, 29 Nov 2019 at 02:08, Enrico Cavallin <cavallin.enr...@gmail.com> wrote:

> Hi all,
> I have an old datacenter with 4 nodes and 256 tokens each. I am now
> starting a new datacenter with 3 nodes, with num_tokens=4 and
> allocate_tokens_for_keyspace=myBiggestKeyspace on each node. Both DCs
> run Cassandra 3.11.x.
>
> myBiggestKeyspace has RF=3 in dcOld and RF=2 in dcNew. Now dcNew is
> very unbalanced; keyspaces with RF=2 in both DCs have the same problem.
> Did I miss something, or do I face strong limitations with a low
> num_tokens even with allocate_tokens_for_keyspace? Any suggestions on
> how to mitigate it?
>
> # nodetool status myBiggestKeyspace
> Datacenter: dcOld
> =================
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address  Load        Tokens  Owns (effective)  Host ID                               Rack
> UN  x.x.x.x  515.83 GiB  256     76.2%             fc462eb2-752f-4d26-aae3-84cb9c977b8a  rack1
> UN  x.x.x.x  504.09 GiB  256     72.7%             d7af8685-ba95-4854-a220-bc52dc242e9c  rack1
> UN  x.x.x.x  507.50 GiB  256     74.6%             b3a4d3d1-e87d-468b-a7d9-3c104e219536  rack1
> UN  x.x.x.x  490.81 GiB  256     76.5%             41e80c5b-e4e3-46f6-a16f-c784c0132dbc  rack1
>
> Datacenter: dcNew
> =================
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address  Load        Tokens  Owns (effective)  Host ID                               Rack
> UN  x.x.x.x  145.47 KiB  4       56.3%             7d089351-077f-4c36-a2f5-007682f9c215  rack1
> UN  x.x.x.x  122.51 KiB  4       55.5%             625dafcb-0822-4c8b-8551-5350c528907a  rack1
> UN  x.x.x.x  127.53 KiB  4       88.2%             c64c0ce4-2f85-4323-b0ba-71d70b8e6fbf  rack1
>
> Thanks,
> -- ec